Lecture Notes in Bioinformatics
4689
Edited by S. Istrail, P. Pevzner, and M. Waterman
Editorial Board: A. Apostolico, S. Brunak, M. Gelfand, T. Lengauer, S. Miyano, G. Myers, M.-F. Sagot, D. Sankoff, R. Shamir, T. Speed, M. Vingron, W. Wong
Subseries of Lecture Notes in Computer Science
Kang Li Xin Li George William Irwin Guosen He (Eds.)
Life System Modeling and Simulation International Conference, LSMS 2007 Shanghai, China, September 14-17, 2007 Proceedings
Series Editors
Sorin Istrail, Brown University, Providence, RI, USA
Pavel Pevzner, University of California, San Diego, CA, USA
Michael Waterman, University of Southern California, Los Angeles, CA, USA

Volume Editors
Kang Li
George William Irwin
Queen's University Belfast
School of Electronics, Electrical Engineering and Computer Science
Ashby Building, Stranmillis Road, BT9 5AH Belfast, UK
E-mail: {K.Li, g.irwin}@ee.qub.ac.uk

Xin Li
Guosen He
Shanghai University, School of Mechatronics and Automation, China
E-mail: {su_xinli, guosenhe}@yahoo.com.cn
Library of Congress Control Number: 2007933845
CR Subject Classification (1998): F.2.2, F.2, E.1, G.1, I.2, J.3
LNCS Sublibrary: SL 8 – Bioinformatics
ISSN 1865-0929
ISBN-10 3-540-74770-2 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-74770-3 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2007 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12118502 06/3180 543210
Preface
The International Conference on Life System Modeling and Simulation (LSMS) was formed to bring together international researchers and practitioners in the field of life system modeling and simulation as well as life system-inspired theory and methodology. The concept of a life system is quite broad. It covers both micro and macro components, ranging from cells, tissues and organs across to organisms and ecological niches. These interact and evolve to produce an overall complex system whose behavior is difficult to comprehend and predict. The arrival of the 21st century has been marked by a resurgence of research interest both in arriving at a systems-level understanding of biology and in applying such knowledge in complex real-world applications. Consequently, computational methods and intelligence in systems biology, as well as bio-inspired computational intelligence, have emerged as key drivers for new computational methods. For this reason, papers dealing with theory, techniques and real-world applications relating to these two themes were especially solicited.

Building on the success of a previous workshop in 2004, the 2007 International Conference on Life System Modeling and Simulation (LSMS 2007) was held in Shanghai, China, September 14–17, 2007. The conference was jointly organized by Shanghai University and Queen's University Belfast, together with the Life System Modeling and Simulation Special Interest Committee of the Chinese Association for System Simulation. The conference program offered the delegates keynote addresses, panel discussions, special sessions and poster presentations, in addition to a series of social functions to enable networking and future research collaboration.

LSMS 2007 received a total of 1,383 full paper submissions from 21 countries. All these papers went through a rigorous peer-review procedure, including both pre-review and formal refereeing. Based on the referee reports, the Program Committee finally selected 333 good-quality papers for presentation at the conference, from which 147 were subsequently selected and recommended for publication by Springer in one volume of Lecture Notes in Computer Science (LNCS) and one volume of Lecture Notes in Bioinformatics (LNBI). This particular volume of Lecture Notes in Bioinformatics (LNBI) includes 63 papers covering 8 relevant topics.

The organizers of LSMS 2007 would like to acknowledge the enormous contributions made by the following: the Advisory Committee and Steering Committee for their guidance and advice, the Program Committee and the numerous referees worldwide for their efforts in reviewing and soliciting the papers, and the Publication Committee for their editorial work. We would also like to thank Alfred Hofmann, from Springer, for his support and guidance. Particular thanks are of course due to all the authors, as without their high-quality submissions and presentations the LSMS 2007 conference would not have been possible. Finally, we would like to express our gratitude to our sponsor, the Chinese Association for System Simulation, and a number of technical co-sponsors: the IEEE United Kingdom and Republic of Ireland Section, the IEEE CASS Life Science Systems and Applications Technical Committee, the IEEE CIS Singapore Chapter, the
IEEE Shanghai Section for their technical co-sponsorship, and the journal Systems and Synthetic Biology (Springer) for its financial sponsorship. The support of the Intelligent Systems and Control group at Queen's University Belfast, Fudan University, the Shanghai Institute for Biological Sciences, the Chinese Academy of Sciences, the Shanghai Association for System Simulation, the Shanghai Association for Automation, the Shanghai Association for Instrument and Control, the Shanghai Rising-Star Association, the Shanghai International Culture Association and the Shanghai Medical Instruments Trade Association is also acknowledged.
June 2007
Bohu Li Guosen He Mitsuo Umezu Min Wang Minrui Fei George W. Irwin Kang Li Luonan Chen Shiwei Ma
LSMS 2007 Organization
Advisory Committee Panos J. Antsaklis, USA Aike Guo, China Huosheng Hu, UK Iven Mareels, Australia Shuzhi Sam Ge, Singapore Yishan Wang, China Zhenghai Xu, China Xiangsun Zhang, China Mengchu Zhou, USA
John L. Casti, Austria Roland Hetzer, Germany Okyay Kaynak, Turkey Kwang-Hyun Park, Korea Eduardo Sontag, USA Paul Werbos, USA Hao Ying, USA Guoping Zhao, China
Joseph Sylvester Chang, Singapore Tom Heskes, Netherlands
Kwang-Hyun Cho, Korea
Seung Kee Han, Korea
Yan Hong, HK China
Fengju Kang, China Yixue Li, China Sean McLoone, Ireland Dhar Pawan, Singapore
Young J Kim, Korea Zaozhen Liu, China David McMillen, Canada Chen Kay Tan, Singapore
Stephen Thompson, UK Tianyuan Xiao, China Tianshou Zhou, China
Svetha Venkatesh, Australia Jianxin Xu, Singapore Quanmin Zhu, UK
Jenn-Kang Hwang, Taiwan China Gang Li, UK Zengrong Liu, China Yi Pan, USA Kok Kiong Tan, Singapore YuguoWeng, Germany Wu Zhang, China
Kazuyuki Aihara, Japan Zongji Chen, China Alfred Hofmann, Germany Frank L. Lewis, USA Xiaoyuan Peng, China Steve Thompson, UK Stephen Wong, USA Minlian Zhang, China Yufan Zheng, Australia
Steering Committee
Honorary Chairs Bohu Li, China Guosen He, China Mitsuo Umezu, Japan
General Chairs Min Wang, China Minrui Fei, China George W. Irwin, UK
International Program Committee IPC Chairs Kang Li, UK Luonan Chen, Japan IPC Local Chairs Luis Antonio Aguirre, Brazil Xingsheng Gu, China WanQuan Liu, Australia T.C. Yang, UK
Yongsheng Ding, China
Orazio Giustolisi, Italy
Pheng-Ann Heng, HK, China Zhijian Song, China Jun Zhang, USA
Nicolas Langlois, France Shu Wang, Singapore
IPC Members Akira Amano, Japan Ming Chen, China Xiaochun Cheng, UK Patrick Connally, UK Huijun Gao, China Ning Gu, China Liqun Han, China Guangbin Huang, Singapore Ping Jiang, UK
Vitoantonio Bevilacqua, Italy Zengqiang Chen, China Minsen Chiu, Singapore Rogers Eric, UK Xiaozhi Gao, Finland Weihua Gui, China Jiehuan He, China Sunan Huang, Singapore
Weidong Cai, Australia Wushan Cheng, China Sally Clift, UK Haiping Fang, China Zhinian Gao, China Lingzhong Guo, UK Liangjian Hu, China Peter Hung, Ireland
Prashant Joshi, Austria
Abderrafiaa Koukam, France Keun-Woo Lee, Korea Jun Li, Singapore Xiaoou Li, Mexico Guoqiang Liu, China Junfeng Liu, USA Zuhong Lu, China Kezhi Mao, Singapore Carlo Meloni, Italy Manamanni Noureddine, France Girijesh Prasad, UK
Xuecheng Lai, Singapore
Tetsuya J Kobayashi, Japan Ziqiang Lang, UK
Raymond Lee, UK Shaoyuan Li, China Yunfeng Li, China Han Liu, China Mandan Liu, China Guido Maione, Italy Marco Mastrovito, Italy Zbigniew Mrozek, Poland Philip Ogunbona, Australia
Donghai Li, China Wanqing Li, Australia Paolo Lino, Italy Julian Liu, UK Wei Lou, China Fenglou Mao, USA Marion McAfee, UK Antonio Neme, Mexico Jianxun Peng, UK
Yixian Qin, USA
Wei Ren, China
Qiguo Rong, China Ziqiang Sun, China Nigel G Ternan, UK Bing Wang, UK Ruiqi Wang, Japan Xiuying Wang, Australia Guihua Wen, China Lingyun Wu, China Qingguo Xie, China Jun Yang, Singapore Ansheng Yu, China Jingqi Yuan, China Jun Zhang, USA Cishen Zhang, Singapore Yisheng Zhu, China
Da Ruan, Belgium Sanjay Swarup, Singapore Shanbao Tong, China Jihong Wang, UK Ruisheng Wang, Japan Yong Wang, Japan
Chenxi Shao, China Shin-ya Takane, Japan Gabriel Vasilescu, France Ning Wang, China Xingcheng Wang, China Zhuping Wang, Singapore
Peter A. Wieringa, Netherlands Xiaofeng Wu, China Meihua Xu, China Tao Yang, USA Weichuan Yu, HK China Dong Yue, China Yi Zhang, China Xingming Zhao, Japan
Guangqiang Wu, China Hong Xia, UK Zhenyuan Xu, China Maurice Yolles, UK Wen Yu, Mexico Zhoumo Zeng, China Zuren Zhang, China Huiyu Zhou, UK
Secretary General Shiwei Ma, China Ping Zhang, China
Co-Secretary-General Li Jia, China Qun Niu, China Banghua Yang, China
Lixiong Li, China Yang Song, China
Publication Chairs Xin Li, China Sanjay Swarup, Singapore
Special Session Chair Hai Lin, Singapore
Xin Li, China Ling Wang, China
Organizing Committee OC Chairs Jian Wang, China Yunjie Wu, China Zengrong Liu, China Yuemei Tan, China OC Co-chairs Tingzhang Liu, China Shiwei Ma, China Weiyi Wang, China Xiaojin Zhu, China OC Members Jian Fan, China Zhihua Li, China Zhongjie Wang, China
Weiyan Hou, China Hai Lin, Singapore Lisheng Wei, China
Aimei Huang, China Xin Sun, China Xiaolei Xia, UK
Reviewers Jean-Francois Arnold Xiaojuan Ban Leonora Bianchi Mauro Birattari Ruifeng Bo Jiajun Bu Dongsheng Che Fei Chen Feng Chen Guochu chen Hang Chen Mingdeng Chen Lijuan Chen Zengqiang Chen Cheng Cheng Guojian Cheng Jin Cheng Maurizio Cirrincione Patrick Connally Marco Cortellino
Jean-Charles Creput Shigang Cui Dan Diaper Chaoyang Dong Guangbo Dong Shuhai Fan Lingshen Fang Dongqing Feng Hailin Feng Zhanshen Feng Cheng Heng Fua Jie Gao Padhraig Gormley Jinhong Gu Lan Guo Qinglin Guo Yecai Guo Yu Guo Dong-Han Ham Zhang Hong
Aimin Hou Yuexian Hou Jiangting Hu Qingxi Hu Wenbin Hu Xianfeng Huang Christian Huyck George W. Irwin Yubin Ji Li Jian Shaohua Jiang Guangxu Jin Hailong Jin Xinsheng Ke Mohammad Khalil Yohei Koyama Salah Laghrouche Usik Lee Chi-Sing Leung Gun Li
Honglei Li Kan Li Kang Li Ning Li Xie Li Yanbo Li Yanyan Li Zhonghua Li Xiao Liang Xiaomei Lin Binghan Liu Chunan Liu Hongwei Liu Junfang Liu Lifang Liu Renren Liu Wanquan Liu Weidong Liu Xiaobing Liu Xiaojie Liu Xuxun Liu Yumin Liu Zhen Liu Zhiping Liu Xuyang Lou Tao Lu Dajie Luo Fei Luo Suhuai Luo Baoshan Ma Meng Ma Xiaoqi Ma Quentin Mair Xiong Men Zhongchun Mi Claude Moog Jin Nan Jose Negrete Xiangfei Nie Xuemei Ning Dongxiao Niu Jingchang Pan Paolo Pannarale Konstantinos Pataridis Jianxun Peng Son Lam Phung Xiaogang Qi
Chaoyong Qin peng qin Zhaohui Qin Lipeng Qiu Yuqing Qiu Yi Qu Qingan Ren Didier Ridienger Giuseppe Romanazzi R Sanchez Jesus Savage Ssang-Hee Seo Tao Shang Zichang Shangguan Chenxi Shao JeongYon Shim Chiyu Shu Yunxing Shu Vincent Sircoulomb Anping Song Chunxia Song Guanhua Song Yuantao Song Yan Su Yuheng Su Suixiulin Shibao Sun Wei Sun Da Tang Pey Yuen Tao Shen Tao Keng Peng Tee Jingwen Tian Han Thanh Trung Callaghan Vic Ping Wan Hongjie Wang Kundong Wang Lei Wang Lin Wang Qing Wang Qingjiang Wang Ruisheng Wang Shuda Wang Tong Wang Xiaolei Wang Xuesong Wang
Ying Wang Zhelong Wang Zhongjie Wang Hualiang Wei Liang Wei Guihua Wen Qianyong Weng Xiangtao Wo Minghui Wu Shihong Wu Ting Wu Xiaoqin Wu Xintao Wu Yunna Wu Zikai Wu Chengyi Xia Linying Xiang Xiaolei Xia Yougang Xiao Jiang Xie Jun Xie Xiaohui Xie Lining Xing Guangning Xu Jing Xu Xiangmin Xu Xuesong Xu Yufa Xu Zhiwen Xu Qinghai Yang Jin Yang Xin Yang Yinhua Yang Zhengquan Yang Xiaoling Ye Changming Yin Fengqin Yu Xiaoyi Yu Xuelian Yu Guili Yuan Lulai Yuan Zhuzhi Yuan Peng Zan Yanjun Zeng Chengy Zhang Kai Zhang Kui Zhang
Hongjuan Zhang Hua Zhang Jianxiong Zhang Limin Zhang Lin Zhang Ran Zhang Xiaoguang Zhang Xing Zhang
Haibin Zhao Shuguang Zhao Yi Zhao Yifan Zhao Yong Zhao Xiao Zheng Yu Zheng Hongfang Zhou
Huiyu Zhou Qihai Zhou Yuren Zhou Qingsheng Zhu Xinglong Zhu Zhengye Zhu Xiaojie Zong
Table of Contents
The First Section: Modeling and Simulation of Societies and Collective Behavior Phase Synchronization of Circadian Oscillators Induced by a Light-Dark Cycle and Extracellular Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ying Li, Jianbao Zhang, and Zengrong Liu
1
Detecting RNA Sequences Using Two-Stage SVM Classifier . . . . . . . . . . . Xiaoou Li and Kang Li
8
Frequency Synchronization of a Set of Cells Coupled by Quorum Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jianbao Zhang, Zengrong Liu, Ying Li, and Luonan Chen
21
A Stochastic Model for Prevention and Control of HIV/AIDS Transmission Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Min Xu, Yongsheng Ding, and Liangjian Hu
28
Simulation of Artificial Life of Bee’s Behaviors . . . . . . . . . . . . . . . . . . . . . . . Bin Wu, Hongying Zhang, and Xia Ni
38
Hybrid Processing and Time-Frequency Analysis of ECG Signal . . . . . . . Ping Zhang, Chengyuan Tu, Xiaoyang Li, and Yanjun Zeng
46
Robust Stability of Human Balance Keeping . . . . . . . . . . . . . . . . . . . . . . . . Minrui Fei, Lisheng Wei, and Taicheng Yang
58
Modelling Pervasive Environments Using Bespoke and Commercial Game-Based Simulators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marc Davies, Vic Callaghan, and Liping Shen
67
The Research of Artificial Animal’s Behavior Memory Based on Cognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiaojuan Ban, Shurong Ning, Jing Shi, and Dongmei Ai
78
The Second Section: Computational Methods and Intelligence in Biomechanical Systems, Tissue Engineering and Clinical Bioengineering How to Ensure Safety Factors in the Development of Artificial Heart: Verified by the Usage of “Modeling and Simulation” Technology . . . . . . . Mitsuo Umezu
88
Parametric-Expression-Based Construction of Interior Features for Tissue Engineering Scaffold with Defect Bone . . . . . . . . . . . . . . . . . . . . . . . Chunxiang Dai, Qingxi Hu, and Minglun Fang
97
Computation of Uniaxial Modulus of the Normal and Degenerated Articular Cartilage Using Inhomogeneous Triphasic Model . . . . . . . . . . . . Haijun Niu, Qing Wang, Yongping Zheng, Fang Pu, Yubo Fan, and Deyu Li
104
Effect of the Plantar Ligaments Injury on the Longitudinal Arch Height of the Human Foot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yunfeng Yang, Guangrong Yu, Wenxin Niu, Jiaqian Zhou, Yanxi Chen, Feng Yuan, and Zuquan Ding
111
Internet Living Broadcast of Medical Video Stream . . . . . . . . . . . . . . . . . . Shejiao Li, Bo Li, and Fan Zhang
120
Predicting Syndrome by NEI Specifications: A Comparison of Five Data Mining Algorithms in Coronary Heart Disease . . . . . . . . . . . . . . . . . . Jianxin Chen, Guangcheng Xi, Yanwei Xing, Jing Chen, and Jie Wang
129
Application of Image Processing and Finite Element Analysis in Bionic Scaffolds' Design Optimizing and Fabrication . . . . . . . . . . . . . . . . . . . . . . . . Liulan Lin, Huicun Zhang, Yuan Yao, Aili Tong, Qingxi Hu, and Minglun Fang
136
The Mechanical Properties of Bone Tissue Engineering Scaffold Fabricating Via Selective Laser Sintering . . . . . . . . . . . . . . . . . . . . . . . . . . . . Liulan Lin, Aili Tong, Huicun Zhang, Qingxi Hu, and Minglun Fang
146
The Third Section: Computational Intelligence in Bioinformatics and Biometrics Informational Structure of Agrobacterium Tumefaciens C58 Genome . . . Zhihua Liu and Xiao Sun
153
Feature Extraction for Cancer Classification Using Kernel-Based Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shutao Li and Chen Liao
162
A New Hybrid Approach to Predict Subcellular Localization by Incorporating Protein Evolutionary Conservation Information . . . . . . . . . ShaoWu Zhang, YunLong Zhang, JunHui Li, HuiFeng Yang, YongMei Cheng, and GuoPing Zhou
172
Support Vector Machine for Prediction of DNA-Binding Domains in Protein-DNA Complexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jiansheng Wu, Hongtao Wu, Hongde Liu, Haoyan Zhou, and Xiao Sun
180
Feature Extraction for Mass Spectrometry Data . . . . . . . . . . . . . . . . . . . . . Yihui Liu
188
An Improved Algorithm on Detecting Transcription and Translation Motif in Archaeal Genomic Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Minghui Wu, Xian Chen, Fanwei Zhu, and Jing Ying
197
Constructing Structural Alignment of RNA Sequences by Detecting and Assessing Conserved Stems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiaoyong Fang, Zhigang Luo, Bo Yuan, Zhenghua Wang, and Fan Ding
208
Iris Verification Using Wavelet Moments and Neural Network . . . . . . . . . . Zhiqiang Ma, Miao Qi, Haifeng Kang, Shuhua Wang, and Jun Kong
218
Comprehensive Fuzzy Evaluation Model for Body Physical Exercise Risk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yizhi Wu, Yongsheng Ding, and Hongan Xu
227
The Fourth Section: Brain Stimulation, Neural Dynamics and Neural Interfacing The Effect of Map Information on Brain Activation During a Driving Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tao Shang, Shuoyu Wang, and Shengnan Zhang
236
Worm 5: Pseudo-organics Computer and Natural Live System . . . . . . . . . Yick Kuen Lee and Ying Ying Lee
246
Comparisons of Chemical Synapses and Gap Junctions in the Stochastic Dynamics of Coupled Neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jiang Wang, Xiumin Li, and Dong Feng
254
Distinguish Different Acupuncture Manipulations by Using Idea of ISI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jiang Wang, Wenjie Si, Limei Zhong, and Feng Dong
264
The Study on Internet-Based Face Recognition System Using PCA and MMD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jong-Min Kim
274
Simulation of Virtual Human’s Mental State in Behavior Animation . . . . Zhen Liu
284
Hemodynamic Analysis of Cerebral Aneurysm and Stenosed Carotid Bifurcation Using Computational Fluid Dynamics Technique . . . . . . . . . . Yi Qian, Tetsuji Harada, Koichi Fukui, Mitsuo Umezu, Hiroyuki Takao, and Yuichi Murayama
292
Active/Inactive Emotional Switching for Thinking Chain Extraction by Type Matching from RAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . JeongYon Shim
300
Pattern Recognition for Brain-Computer Interfaces by Combining Support Vector Machine with Adaptive Genetic Algorithm . . . . . . . . . . . . Banghua Yang, Shiwei Ma, and Zhihua Li
307
The Fifth Section: Biological and Biomedical Data Integration, Mining and Visualization Improved Locally Linear Embedding by Cognitive Geometry . . . . . . . . . . Guihua Wen, Lijun Jiang, and Jun Wen
317
Predicting the Free Calcium Oxide Content on the Basis of Rough Sets, Neural Networks and Data Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yunxing Shu, Shiwei Yun, and Bo Ge
326
Classification of Single Trial EEG Based on Cloud Model for Brain-Computer Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shaobin Li and Chenxi Shao
335
The Modified Self-organizing Fuzzy Neural Network Model for Adaptability Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zuohua Miao, Hong Xu, and Xianhua Wang
344
Predicting Functional Protein-Protein Interactions Based on Computational Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Luwen Zhang and Wu Zhang
354
The Chaos Model Analysis Based on Time-Varying Fractal Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jianrong Hou, Dan Huang, and Hui Zhao
364
Bi-hierarchy Medical Image Registration Based on Steerable Pyramid Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiuying Wang and David Feng
370
The Sixth Section: Computational Methods and Intelligence in Organism Modeling and Biochemical Networks and Regulation A Multiagent Quantum Evolutionary Algorithm for Global Numerical Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chaoyong Qin, Jianguo Zheng, and Jiyu Lai
380
Developing and Optimizing a Finite Element Model of Phalange Using CT Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qingxi Hu, Quan Zhang, and Yuan Yao
390
Reverse Engineering Methodology in Broken Skull Surface Model Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Luyue Ju, Gaojian Zhong, and Xia Liu
399
Identification and Application of Nonlinear Rheological Characteristics of Oilseed Based on Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . Xiao Zheng, Guoxiang Lin, Dongping He, Jingzhou Wang, and Yan You
406
Prediction of Death Rate of Breast Cancer Induced from Average Microelement Absorption with Neural Network . . . . . . . . . . . . . . . . . . . . . . Shouju Li, Jizhe Wang, Yingxi Liu, and Xiuzhen Sun
414
An Adaptive Classifier Based on Artificial Immune Network . . . . . . . . . . . Zhiguo Li, Jiang Zhong, Yong Feng, and ZhongFu Wu
422
Investigation of a Hydrodynamic Performance of a Ventricular Assist Device After Its Long-Term Use in Clinical Application . . . . . . . . . . . . . . . Yuma Kokuzawa, Tomohiro Shima, Masateru Furusato, Kazuhiko Ito, Takashi Tanaka, Toshihiro Igarashi, Tomohiro Nishinaka, Kiyotaka Iwasaki, and Mitsuo Umezu
429
The Seventh Section: Computational Methods and Intelligence in Modeling of Molecular, Cellular, Multi-cellular Behavior and Design of Synthetic Biological Systems QSAR and Molecular Docking Study of a Series of Combretastatin Analogues Tubulin Inhibitors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yubin Ji, Ran Tian, and Wenhan Lin
436
A Software Method to Model and Fabricate the Defective Bone Repair Bioscaffold Using in Tissue Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qingxi Hu, Hongfei Yang, and Yuan Yao
445
Using Qualitative Technology for Modeling the Process of Virus Infection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hailin Feng and Chenxi Shao
453
AOC-by-Self-discovery Modeling and Simulation for HIV . . . . . . . . . . . . . . Chunxiao Zhao, Ning Zhong, and Ying Hao
462
A Simulation Study on the Encoding Mechanism of Retinal Ganglion Cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chao-Feng Cai, Pei-Ji Liang, and Pu-Ming Zhang
470
Modelling the MAPK Signalling Pathway Using a Two-Stage Identification Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Padhraig Gormley, Kang Li, and George W. Irwin
480
The Eighth Section: Others Design and Path Planning for a Remote-Brained Service Robot . . . . . . . . Shigang Cui, Xuelian Xu, Zhengguang Lian, Li Zhao, and Zhigang Bing
492
Adaptive Fuzzy Sliding Mode Control of the Model of Aneurysms of the Circle of Willis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Peijun Ju, Guocai Liu, Li Tian, and Wei Zhang
501
Particle Swarm Optimization Applied to Image Vector Quantization . . . . Xubing Zhang, Zequn Guan, and Tianhong Gan
507
Face Detection Based on BPNN and Wavelet Invariant Moment in Video Surveillance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hongji Lin and Zhengchun Ye
516
Efficient Topological Reconstruction for Medical Model Based on Mesh Simplification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chunxiang Dai, Ying Jiang, Qingxi Hu, Yuan Yao, and Hongfei Yang
526
Repetitive Motion Planning of Redundant Robots Based on LVI-Based Primal-Dual Neural Network and PUMA560 Example . . . . . . . . . . . . . . . . Yunong Zhang, Xuanjiao Lv, Zhonghua Li, and Zhi Yang
536
Tensile Test to Ensure a Safety of Cannula Connection in Clinical Ventricular Assist Device (VAD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Takashi Tanaka, Tomohiro Shima, Masateru Furusato, Yuma Kokuzawa, Kazuhiko Ito, Kiyotaka Iwasaki, Yi Qian, and Mitsuo Umezu
546
A Reproduction of Inflow Restriction in the Mock Circulatory System to Evaluate a Hydrodynamic Performance of a Ventricular Assist Device in Practical Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Masateru Furusato, Tomohiro Shima, Yuma Kokuzawa, Kazuhiko Ito, Takashi Tanaka, Kiyotaka Iwasaki, Yi Qian, Mitsuo Umezu, ZhiKun Yan, and Ling Zhu
553
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
559
Phase Synchronization of Circadian Oscillators Induced by a Light-Dark Cycle and Extracellular Noise

Ying Li¹, Jianbao Zhang¹, and Zengrong Liu²

¹ Department of Mathematics, Shanghai University, 200444 Shanghai, China
[email protected]
² Institute of Systems Biology, Shanghai University, 200444 Shanghai, China
[email protected]
Abstract. In mammals, the master circadian pacemaker is considered to be the suprachiasmatic nucleus (SCN) of the hypothalamus. Individual cellular clocks in the SCN, the circadian center, are integrated into a stable and robust pacemaker with a period length of about 24 hours, which is remarkably accurate at timing biological events despite the randomness of the underlying biochemical reactions. In this paper, we study the effect of the light-dark (LD) cycle and environmental noise on the daily rhythms of mammals and give some numerical analysis. The results show that environmental noise promotes phase synchronization, but it cannot by itself bring the oscillators to phase synchronization with a period of 24-h. On the contrary, in the presence of environmental noise the threshold of the light strength needed to bring the oscillators to phase synchronization with a period of 24-h is larger than in the case without noise.
1 Introduction
Circadian rhythms are observed in the physiology of mammals and other higher organisms. In mammals, physiological and behavioral circadian rhythms are controlled by a pacemaker located in the suprachiasmatic nucleus (SCN) of the hypothalamus [1,2]. The SCN consists of 16,000 neurons arranged in a symmetric bilateral structure, and it is generally believed that each isolated SCN neuron behaves as an oscillator by itself. It has been shown that isolated individual neurons are able to produce circadian oscillations, with periods ranging from 20 to 28 hours [3,4]. Daily rhythms in behavior, physiology and metabolism are controlled by endogenous circadian clocks. At the heart of these clocks is a circadian oscillator that keeps circadian time. There are many factors that entrain circadian oscillators, such as intercellular coupling, a 24-h LD cycle, intercellular and extracellular noise, and the structure of the SCN. In this article, we mainly analyze the effects of a 24-h LD cycle and environmental noise. Firstly, a mathematical model describing the behavior of a population of SCN neurons is presented. The single cell oscillator is described by a
three-variable model similar to the widely used Goodwin model, which was able to simulate physiological oscillations on the basis of a negative feedback. This model, based on the negative feedback loop, accounts for the core molecular mechanism leading to self-sustained oscillations of clock genes. Then, under the conditions for a periodic solution, we show that cells whose models have different parameter values have different individual periods. Our attention is mainly focused on the analysis and comparison of the effects of the LD cycle and the noise on the circadian oscillator from a numerical viewpoint. The main result of this paper is that environmental noise promotes phase synchronization but increases, compared with the noise-free case, the threshold of the light strength needed to synchronize the oscillators to 24-h; the 24-h LD cycle plays the crucial role in entraining the 24-h clocks.
2 Model of Self-sustained Oscillation in a SCN Neuron
To simulate circadian oscillations in single mammalian cells, we resort to a three-variable model based on the Goodwin oscillator [5]. In this model, a clock gene mRNA (X) produces a clock protein (Y) which, in turn, activates a transcriptional inhibitor (Z). The latter inhibits the transcription of the clock gene, closing a negative feedback loop. In circadian clocks, protein degradation is controlled by phosphorylation, ubiquitination and proteasomal degradation, and thus it is reasonable to assume Michaelian kinetics. Here, we adopt the following model for an individual cell:

$$\begin{cases} \dot{X} = v_1 \dfrac{K_1^n}{K_1^n + Z^n} - v_2 \dfrac{X}{K_2 + X}, \\[4pt] \dot{Y} = k_3 X - v_4 \dfrac{Y}{K_4 + Y}, \\[4pt] \dot{Z} = k_5 Y - v_6 \dfrac{Z}{K_6 + Z}, \end{cases} \tag{1}$$

where $v_1, v_2, v_4, v_6, K_1, K_2, K_4, K_6, k_3, k_5$ are parameters. In this version, self-sustained oscillation can be obtained for a Hill coefficient of $n = 4$. The variable X represents the mRNA concentration of a clock gene, per or cry; Y is the resulting protein, PER or CRY; and Z is the active protein or the nuclear form of the protein (the inhibitor). This model is closely related to those proposed by Ruoff and Rensing [7], Leloup and co-workers [6], or Ruoff and co-workers [8] for the circadian clock in Neurospora. In [10], we analyzed the dynamics of model (1) and gave sufficient conditions for it to be a self-sustained oscillator.

Now, we consider the multi-cell system under the effect of a 24-h LD cycle and environmental noise. The evolution equations for the population composed of N oscillators (denoted by $i = 1, 2, \ldots, N$) are then written as

$$\begin{cases} \dot{X}_i = v_1 \dfrac{K_1^n}{K_1^n + Z_i^n} - v_2 \dfrac{X_i}{K_2 + X_i} + L + D\xi_i, \\[4pt] \dot{Y}_i = k_3 X_i - v_4 \dfrac{Y_i}{K_4 + Y_i}, \\[4pt] \dot{Z}_i = k_5 Y_i - v_6 \dfrac{Z_i}{K_6 + Z_i}. \end{cases} \tag{2}$$

Here L is a square-wave function which reflects the effect of the LD cycle: the term L switches from $L = 0$ in the dark phase to $L = L_0$ in the light phase. That is to say,
$$L(t) = \begin{cases} L_0, & t \in [24k,\; 24k + 12), \\ 0, & t \in [24k + 12,\; 24(k+1)), \end{cases} \tag{3}$$
where k is a natural number. The parameter D describes the strength of the noise. The terms $\xi_i$, called extracellular noises, represent external noises originating outside the cells due to environmental perturbations; they are assumed to be independent Gaussian white noises with zero mean, $\langle \xi_i(t) \rangle = 0$, and covariances $\langle \xi_i(t)\,\xi_j(t') \rangle = \delta_{ij}\,\delta(t - t')$.
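For readers wishing to reproduce this kind of simulation, the following is a minimal numerical sketch of Eqs. (2)–(3), integrating the additive noise term with a simple Euler–Maruyama step. The parameter values are illustrative placeholders only (the experiments in this paper take their values from Ref. [9]); setting N = 1 and L0 = D = 0 recovers the single-cell model (1).

```python
import numpy as np

# Illustrative placeholder parameters; the paper takes its values from Ref. [9].
v1, v2, v4, v6 = 0.7, 0.35, 0.35, 0.35
K1, K2, K4, K6 = 1.0, 1.0, 1.0, 1.0
k3, k5, n_hill = 0.7, 0.7, 4

N, L0, D = 20, 0.02, 0.1        # number of cells, light strength, noise strength
dt, T = 0.01, 24.0 * 100        # integration step and horizon, in hours

def light(t):
    """Square-wave LD forcing of Eq. (3): L0 during 12 h of light, 0 during 12 h of dark."""
    return L0 if (t % 24.0) < 12.0 else 0.0

rng = np.random.default_rng(0)
X, Y, Z = (rng.uniform(0.1, 0.5, N) for _ in range(3))
traj = np.empty((int(T / dt), N))

for s in range(traj.shape[0]):
    t = s * dt
    dX = v1 * K1**n_hill / (K1**n_hill + Z**n_hill) - v2 * X / (K2 + X) + light(t)
    dY = k3 * X - v4 * Y / (K4 + Y)
    dZ = k5 * Y - v6 * Z / (K6 + Z)
    # Euler-Maruyama: drift * dt plus noise scaled by sqrt(dt), acting on X only (Eq. 2)
    X = X + dX * dt + D * np.sqrt(dt) * rng.standard_normal(N)
    Y = Y + dY * dt
    Z = Z + dZ * dt
    traj[s] = X
```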
3 The Main Results
In paper [10], we proved theoretically that the oscillators can reach phase synchronization with a period of 24-h under the effect of the 24-h LD cycle, provided the strength of the light is large enough. Here we investigate the effect of the environmental noise. When L0 = D = 0, system (2) reduces exactly to the self-sustained oscillators of the individual cells. We consider 20 cells, that is to say, N = 20; the parameter values are taken from Ref. [9]. From Fig. 1 we can see that the twenty self-sustained oscillators have different periods, ranging from 20-h to 30-h. Now we consider the effect of environmental noise, that is to say, D ≠ 0. The simulation shows that the extracellular noise promotes phase synchronization (see Fig. 3). But at the same time, the noise increases the threshold of L0 at which the oscillators reach phase synchronization with a period of 24-h, compared to the case without noise, which is verified by Figs. 3–6. Fig. 3 shows that the noise brings the oscillators to phase synchronization, but their periods are not 24-h. When the light is added as well, with L0 = 0.01 the oscillators cannot reach phase synchronization with a period of 24-h, while they can without noise.
Fig. 1. The time evolution of variable X for 20 oscillators, when L0 = D = 0
Fig. 2. The time evolution of L and variable X for 20 oscillators, when L0 = 0.02 and D = 0
Fig. 3. The time evolution of variable X for 20 oscillators, when L0 = 0 and D = 0.1
With the increase of L0, the periods of these synchronized oscillators approach 24-h gradually; only when L0 ≥ 0.2, with noise present, do these oscillators reach phase synchronization with a period of 24-h. That is to say, the threshold is increased greatly. When only the 24-h LD cycle is applied, we know from paper [10] that as long as L0 is large enough these oscillators reach phase synchronization with periods of exactly 24-h; our simulation results show that phase synchronization with a period of 24-h is obtained once L0 ≥ 0.015 (see Fig. 2).
Fig. 4. The time evolution of L and variable X for 20 oscillators, when L0 = 0.01 and D = 0.1
Fig. 5. The time evolution of L and variable X for 20 oscillators, when L0 = 0.1 and D = 0.1
From Figs. 1–6 we obtain a summary result: both the extracellular noise and the 24-h LD cycle can accelerate the phase synchrony of oscillators with different individual periods. But only the 24-h LD cycle can entrain the 24-h circadian clocks of the population of cells, and the extracellular noise increases the threshold of the light strength needed to bring the oscillators to phase synchronization with a period of 24-h.
Fig. 6. The time evolution of L and variable X for 20 oscillators, when L0 = 0.2 and D = 0.1
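The paper does not state how phase synchronization is assessed in Figs. 1–6; one standard diagnostic (an assumption on our part, not necessarily the authors' procedure) is to extract each cell's instantaneous phase via a Hilbert transform and track the Kuramoto order parameter, as sketched below for a trajectory array such as `traj` produced above.

```python
import numpy as np
from scipy.signal import hilbert

def kuramoto_r(traj):
    """Kuramoto order parameter r(t) from oscillator traces of shape (steps, N).

    r stays near 1 when the N oscillators are phase synchronized and drops
    toward 0 when their phases are spread out.
    """
    analytic = hilbert(traj - traj.mean(axis=0), axis=0)   # analytic signal per cell
    phases = np.angle(analytic)                            # instantaneous phases
    return np.abs(np.exp(1j * phases).mean(axis=1))        # |mean phasor| per time step

# e.g. a value of kuramoto_r(traj)[-1000:].mean() close to 1 indicates phase synchrony
```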
4 Discussion and Conclusions
In this paper, we have introduced a molecular model for the regulatory network underlying the circadian oscillations in the SCN and analyzed the effects of the 24-h LD cycle and extracellular noise numerically. From these numerical simulations we can see that the 24-h LD cycle plays a crucial role in entraining the circadian clocks to 24-h, which accords with the biological experiments. The results of this paper establish a quantitative basis for understanding the essential cooperative dynamics. The effects of intercellular coupling and of the SCN structure will be discussed in later work.

Acknowledgments. This research is supported by the NNSF of China (Grant: 70431002) and the Innovation Foundation of Shanghai University for Postgraduates, to which we express our special thanks.
References

1. Reppert, S.M., Weaver, D.R.: Coordination of circadian timing in mammals. Nature 418, 935–941 (2002)
2. Moore, R.Y., Speh, J.C., Leak, R.K.: Suprachiasmatic nucleus organization. Cell Tissue Res. 309, 89–98 (2002)
3. Welsh, D.K., Logothetis, D.E., Meister, M., Reppert, S.M.: Individual neurons dissociated from rat suprachiasmatic nucleus express independently phased circadian firing rhythms. Neuron 14, 697–706 (1995)
4. Honma, S., Nakamura, W., Shirakawa, T., Honma, K.: Diversity in the circadian periods of single neurons of the rat suprachiasmatic nucleus depends on nuclear structure and intrinsic period. Neurosci. Lett. 358, 173–176 (2004)
5. Goodwin, B.C.: Oscillatory behavior in enzymatic control processes. Adv. Enzyme Regul. 3, 425–438 (1965)
6. Leloup, J.C., Gonze, D., Goldbeter, A.: Limit cycle models for circadian rhythms based on transcriptional regulation in Drosophila and Neurospora. J. Biol. Rhythms 14, 433–448 (1999)
7. Ruoff, P., Rensing, L.: The temperature-compensated Goodwin model simulates many circadian clock properties. J. Theor. Biol. 179, 275–285 (1996)
8. Ruoff, P., Vinsjevik, M., Monnerjahn, C., Rensing, L.: The Goodwin model: simulating the effect of light pulses on the circadian sporulation rhythm of Neurospora crassa. J. Theor. Biol. 209, 29–42 (2001)
9. Gonze, D., Bernard, S., Waltermann, C., Kramer, A., Herzel, H.: Spontaneous synchronization of coupled circadian oscillators. Biophysical J. 89, 120–129 (2005)
10. Li, Y., Zhang, J.B., Liu, Z.R.: Circadian oscillators and phase synchronization under a light-dark cycle. Int. J. Nonlinear Science 1(3), 131–138 (2006)
Detecting RNA Sequences Using Two-Stage SVM Classifier

Xiaoou Li¹ and Kang Li²

¹ Departamento de Computación, CINVESTAV-IPN, A.P. 14-740, Av. IPN 2508, México D.F., 07360, México
² School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Ashby Building, Stranmillis Road, Belfast, BT9 5AH, UK
[email protected]
Abstract. RNA sequence detection is time-consuming because of the huge data set sizes involved. Although SVM has proved to be useful, normal SVM is not suitable for the classification of large data sets because of its high training complexity. A two-stage SVM classification approach is introduced for fast classification of large data sets. Experimental results on several RNA sequence detection tasks demonstrate that the proposed approach is promising for such applications.
1 Introduction
RNA plays many important biological roles other than as a transient carrier of amino acid sequence information [14]. It catalyzes peptide bond formation, participates in protein localization, serves in immunity, catalyzes intron splicing and RNA degradation, and serves in dosage compensation. It is also an essential subunit in telomeres, guides RNA modification, controls development, and has an abundance of other regulatory functions [29]. Non-coding RNAs (ncRNAs) are transcripts that have function without being translated to protein [12]. The number of known ncRNAs is growing quickly, and their significance had been severely underestimated in classic models of cellular processes. It is desirable to develop high-throughput methods for the discovery of novel ncRNAs, for greater biological understanding and for discovering candidate drug targets. However, novel ncRNAs are difficult to detect in conventional biochemical screens. They are frequently short, often not polyadenylated, and might only be expressed under specific cellular conditions. Experimental screens have found many ncRNAs, but have demonstrated that no single screen is capable of discovering all known ncRNAs for an organism. A more effective approach, demonstrated in previous studies [2,28], may be to first detect ncRNA candidates computationally and then verify them biochemically. Considering the number of available whole genome sequences, SVM can be applied to a large and diverse data set, and has massive potential for novel ncRNA discovery [21,27]. However, long training
time is needed. Therefore, it is impossible to repeat the SVM classification on an updated data set in an acceptable time when new data are included in the data set frequently or continuously. Many researchers have tried to find methods for applying SVM classification to large data sets. Generally, these methods can be divided into two types: 1) modifying the SVM algorithm so that it can be applied to large data sets, and 2) selecting representative training data from a large data set so that a conventional SVM can handle them. For the first type, a standard projected conjugate gradient (PCG) chunking algorithm can scale somewhere between linear and cubic in the training set size [8,16]. Sequential Minimal Optimization (SMO) is a fast method to train SVM [23,7]. Training an SVM requires the solution of a QP optimization problem; SMO breaks this large QP problem into a series of smallest possible QP problems and is faster than PCG chunking. [10] introduced a parallel optimization step where block diagonal matrices are used to approximate the original kernel matrix so that the SVM classification can be split into hundreds of subproblems. A recursive and computationally superior mechanism, referred to as adaptive recursive partitioning, was proposed in [17], where the data are recursively subdivided into smaller subsets. Genetic programming is able to deal with large data sets that do not fit in main memory [11]. Neural network techniques can also be applied to SVM to simplify the training process [15]. For the second type, clustering has proved to be an effective method to collaborate with SVM on classifying large data sets, for example hierarchical clustering [31,1], k-means clustering [4] and parallel clustering [7]. Clustering-based methods can reduce the computational burden of SVM, but the clustering algorithms themselves are still complicated for large data sets. Rocchio bundling is a statistics-based data reduction method [25]. The Bayesian committee machine has also been reported to train SVM on large data sets, where the large data set is divided into m subsets of the same size and m models are derived from the individual sets [26]; however, it has a higher error rate than normal SVM, and the sparse property does not hold. Falling into the second type of SVM classification methods for large data sets, a two-stage SVM classification approach was proposed in our previous work [4,5,18]. First, we select representative training data from the original data set using the results of clustering, and these selected data are used to train the first stage SVM. Note that the first stage SVM is not precise enough because of the great reduction of the original data set, so we use a second stage SVM to refine the classification. The support vectors obtained in the first stage SVM are used to select data for the second stage SVM by recovering their cluster-mates (we call this process de-clustering). Finally, the second stage SVM is applied to those de-clustered data. Our experimental results show that the accuracy obtained by our approach is very close to that of classic SVM methods, while the training time is significantly shorter. Furthermore, the proposed approach can be applied to huge data sets regardless of their dimensionality. In this paper, we apply our approach to several RNA sequence data sets. The rest of the paper is organized as follows: Section II introduces our two-
stage SVM classifier. Section III shows the experimental results on RNA sequence detection, with comparisons against other well-known classifiers. The conclusion is given in Section IV.
2 Two-Stage SVM Classifier
By the sparse property of SVM, data samples which are not support vectors do not contribute to the optimal hyperplane. Input data which are far away from the decision hyperplane should be eliminated, while data which are possibly support vectors should be retained. In this paper, we select the cluster centers and the data of mix-labeled clusters as training data for the first stage SVM. We believe these data are the most useful and representative ones in a large data set for finding support vectors. Note that the training data set in the first stage SVM classification is only a small percentage of the original data. The data of the clusters near the hyperplane are not used in full for training the SVM, since we only select the cluster centers. This may affect the classification precision, i.e., the obtained decision hyperplane may not be precise enough; however, at least it gives us a reference on the data distribution. According to the above analysis, we make the following modifications to the training data set of the first stage SVM: 1) remove the data far from the hyperplane from the training data set, because they will not contribute to finding the support vectors; 2) retain the data of the mix-labeled clusters, since they are more likely to be support vectors; 3) additionally, add the data of the clusters whose centers are support vectors of the first stage SVM. In general, our approach consists of the four steps shown in Figure 1: 1) data selection, 2) the first stage SVM classification, 3) de-clustering, 4) the second stage SVM classification. The following subsections give a detailed explanation of each step.
2.1 Selecting Training Data
The goal of clustering is to separate a finite number of unlabeled items into a finite and discrete set of "natural" hidden data structures, such that items in the same cluster are more similar to each other, while those in different clusters tend to be dissimilar, according to a certain measure of similarity or proximity. A large number of clustering methods have been developed, e.g., squared-error-based k-means [3], fuzzy C-means [22], and kernel-based clustering [13]. In our experience, fuzzy C-means clustering, minimum enclosing ball (MEB) clustering and random selection have proved very effective for selecting training data for the first stage SVM [4,5,18]. Let $l$ be the cluster number; then the process of clustering is to find $l$ partitions (or clusters) $\Omega_i$ of the input data set X, $i = 1, \ldots, l$, $l < n$, with $\Omega_i \neq \emptyset$ and $\cup_{i=1}^{l} \Omega_i = X$.
Fig. 1. Two-stage SVM classification
Note that the data in a cluster may have same label (positive or negative) or different labels (both positive and negative). The obtained clusters can be classified into three types: 1) clusters with only positive labeled data, denoted by Ω + , i.e., Ω + = {∪Ωi | y = +1}; 2) clusters with only negative labeled data, denoted by Ω − , i.e., Ω − = {∪Ωi | y = −1}; 3) clusters with both positive and negative labeled data (or mix-labeled), denoted by Ωm , i.e., Ωm = {∪Ωi | y = ±1}. Figure 2 (a) illustrates the clusters after clustering, where the clusters with only red points are positive labeled (Ω + ), the clusters with green points are negative labeled (Ω − ) , and clusters A and B are mix-labeled (Ωm ).
Fig. 2. Data selection: (a) Clusters (b) The first stage SVM
We select not only the centers of the clusters but also all the data of mix-labeled clusters as training data in the first SVM classification stage. If we denote the sets of the centers of the clusters in $\Omega^+$ and $\Omega^-$ by $C^+$ and $C^-$ respectively, i.e., $C^+ = \{\cup C_i \mid y = +1\}$ (positive labeled centers) and $C^- = \{\cup C_i \mid y = -1\}$ (negative labeled centers), then the selected data to be used in the first stage SVM classification are the union of $C^+$, $C^-$ and $\Omega_m$, i.e., $C^+ \cup C^- \cup \Omega_m$. In Figure 2 (b), the red points belong to $C^+$ and the green points belong to $C^-$. It is clear that the data in Figure 2 (b) are all cluster centers except the data in the mix-labeled clusters A and B. A sketch of this selection step follows.
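The minimal sketch below uses scikit-learn's k-means purely for illustration (the experiments in this paper use FCM clustering, MEB clustering or random selection for this step) and returns the reduced first-stage training set C+ ∪ C− ∪ Ωm together with the bookkeeping needed later for de-clustering.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_training_data(X, y, n_clusters):
    """First-stage selection: centers of pure clusters plus all points of
    mix-labeled clusters (labels y in {-1, +1})."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    sel_X, sel_y = [], []
    mixed = np.zeros(n_clusters, dtype=bool)
    for i in range(n_clusters):
        members = km.labels_ == i
        labels = np.unique(y[members])
        if labels.size == 1:                 # pure cluster: keep only its center
            sel_X.append(km.cluster_centers_[i])
            sel_y.append(labels[0])
        else:                                # mix-labeled cluster: keep every point
            mixed[i] = True
            sel_X.extend(X[members])
            sel_y.extend(y[members])
    return np.asarray(sel_X), np.asarray(sel_y), km, mixed

# X_sel, y_sel, km, mixed = select_training_data(X, y, n_clusters=400)
```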
2.2 The First Stage SVM Classification
We consider binary classification. Let (X, Y) be the training pattern set,

$$X = \{x_1, \cdots, x_n\}, \quad Y = \{y_1, \cdots, y_n\}, \quad y_i = \pm 1, \quad x_i = (x_{i1}, \ldots, x_{ip})^T \in \mathbb{R}^p \tag{1}$$
The training task of SVM classification is to find, from the input X and the output Y, the optimal hyperplane which maximizes the margin between the classes. That is, training the SVM amounts to solving the following quadratic programming problem (the primal problem):

$$\min_{w,b} J(w) = \frac{1}{2} w^T w + c \sum_{k=1}^{n} \xi_k \quad \text{subject to: } y_k \left( w^T \varphi(x_k) + b \right) \ge 1 - \xi_k \tag{2}$$
where the $\xi_k > 0$, $k = 1, \cdots, n$, are slack variables that tolerate mis-classifications, $c > 0$, $w_k$ is the distance from $x_k$ to the hyperplane $w^T \varphi(x_k) + b = 0$, and $\varphi(x_k)$ is a nonlinear function. The kernel which satisfies the Mercer condition [9] is $K(x_k, x_i) = \varphi(x_k)^T \varphi(x_i)$. Problem (2) is equivalent to the following quadratic programming problem, a dual problem with Lagrangian multipliers $\alpha_k \ge 0$:

$$\max_{\alpha} J(\alpha) = -\frac{1}{2} \sum_{k,j=1}^{n} y_k y_j K(x_k, x_j)\, \alpha_k \alpha_j + \sum_{k=1}^{n} \alpha_k \quad \text{subject to: } \sum_{k=1}^{n} \alpha_k y_k = 0, \quad 0 \le \alpha_k \le c \tag{3}$$
Many solutions of (3) are zero, i.e., $\alpha_k = 0$, so the solution vector is sparse, and the sum is taken only over the non-zero $\alpha_k$. An $x_i$ which corresponds to a non-zero $\alpha_i$ is called a support vector (SV). Let V be the index set of the SVs; then the optimal hyperplane is

$$\sum_{k \in V} \alpha_k y_k K(x_k, x) + b = 0 \tag{4}$$
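To make the structure of (3) concrete, a small problem can be handed directly to a generic QP solver. The sketch below (using cvxopt with a linear kernel, purely for illustration; the method in this paper relies on SMO rather than a generic solver) builds the standard matrices for (3) in cvxopt's minimization form.

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_dual(X, y, c=1.0):
    """Solve the dual QP (3) with a linear kernel K(x_k, x_j) = x_k . x_j.

    cvxopt minimizes (1/2) a'Pa + q'a subject to Ga <= h and Aa = b, so (3)
    is passed with P = (y y') * K, q = -1, box constraints 0 <= a <= c,
    and the equality constraint y'a = 0.
    """
    n = len(y)
    K = X @ X.T
    P = matrix(np.outer(y, y) * K + 1e-8 * np.eye(n))   # tiny ridge for stability
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))
    h = matrix(np.hstack([np.zeros(n), c * np.ones(n)]))
    A = matrix(y.reshape(1, -1).astype(float))
    b = matrix(0.0)
    solvers.options['show_progress'] = False
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])
    return alpha    # indices with alpha > ~1e-6 form the SV index set V of Eq. (4)
```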
The resulting classifier is

$$y(x) = \operatorname{sign}\Big( \sum_{k \in V} \alpha_k y_k K(x_k, x) + b \Big),$$
where b is determined by the Kuhn–Tucker conditions. Sequential minimal optimization (SMO) breaks the large QP problem into a series of smallest possible QP problems [23]. These small QP problems can be solved analytically, which avoids using a time-consuming numerical QP optimization as an inner loop. The memory required by SMO is linear in the training set size, which allows SMO to handle very large training sets [16]. A requirement in (3) is $\sum_{i=1}^{l} \alpha_i y_i = 0$; it is enforced throughout the iterations and implies that
the smallest number of multipliers that can be optimized at each step is two. At each step SMO chooses two elements $\alpha_i$ and $\alpha_j$ to jointly optimize and finds the optimal values for these two parameters while all others are fixed. The choice of the two points is determined by a heuristic algorithm, while the optimization of the two multipliers is performed analytically. Experimentally, the performance of SMO is very good despite needing more iterations to converge: each iteration uses few operations, so the algorithm exhibits an overall speedup. Besides convergence time, SMO has other important features; for example, it does not need to store the kernel matrix in memory, and it is fairly easy to implement [23]. In the first stage SVM classification, we use SVM classification with the SMO algorithm to get the decision hyperplane. Here, the training data set is $C^+ \cup C^- \cup \Omega_m$, which was obtained in the last subsection. Figure 2 (b) shows the results of the first stage SVM classification, and a code sketch is given below.
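In practice this stage can be implemented with any SMO-based package. The sketch below uses scikit-learn's SVC (backed by a LIBSVM/SMO-type solver) on the reduced set returned by `select_training_data`, and then verifies the classifier formula above by recomputing sign(Σ αk yk K(xk, x) + b) from the fitted dual coefficients; `X_new` stands for arbitrary query points and is an assumed variable.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

gamma = 0.5                                   # illustrative kernel width
clf1 = SVC(C=1.0, kernel='rbf', gamma=gamma).fit(X_sel, y_sel)

# SVC stores alpha_k * y_k in dual_coef_ and b in intercept_, so the decision
# values of the classifier formula can be recomputed by hand and cross-checked:
K = rbf_kernel(X_new, clf1.support_vectors_, gamma=gamma)
by_hand = K @ clf1.dual_coef_.ravel() + clf1.intercept_[0]
assert np.allclose(by_hand, clf1.decision_function(X_new))
labels = np.sign(by_hand)                     # y(x) = sign(sum_k alpha_k y_k K + b)
```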
2.3 De-clustering
We propose to recover data into the training data set by including the data of the clusters whose centers are support vectors of the first stage SVM; we call this process de-clustering. In this way, more of the original data near the hyperplane can be found. The de-clustering results for the support vectors in Figure 2 (b) are shown in Figure 3 (a). The de-clustering process not only overcomes the drawback that only a small part of the original data near the support vectors is trained on, but also enlarges the training data set of the second stage SVM, which is good for improving the accuracy.
2.4 The Second Stage SVM Classification
Taking the recovered data as the new training data set, we again use SVM classification with the SMO algorithm to get the final decision hyperplane

$$\sum_{k \in V_2} y_k \alpha^{*}_{2,k} K(x_k, x) + b^{*}_2 = 0 \tag{5}$$
where V2 is the index set of the support vectors in the second stage. Generally, the hyperplane (4) is close to the hyperplane (5).
Fig. 3. (a) De-clustering (b) The second stage SVM
In the second stage SVM, we use the following two types of data as training data: 1) the data of the clusters whose centers are support vectors, i.e., $\cup_{C_i \in V} \{\Omega_i\}$, where V is the support vector set of the first stage SVM; and 2) the data of the mix-labeled clusters, i.e., $\Omega_m$. Therefore, the training data set is $\cup_{C_i \in V} \{\Omega_i\} \cup \Omega_m$. Figure 3 (b) illustrates the second stage SVM classification results. One can observe that the two hyperplanes in Figure 2 (b) and Figure 3 (b) are different but similar. A sketch of the de-clustering and retraining steps follows.
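Continuing the sketch, de-clustering recovers the members of every cluster whose center became a first-stage support vector and merges them with the mix-labeled clusters before retraining. Matching support vectors back to cluster centers is done here by a nearest-center test, which is one simple way to implement the bookkeeping; `km` and `mixed` come from `select_training_data` above.

```python
import numpy as np
from sklearn.svm import SVC

def decluster(X, y, km, mixed, clf1, tol=1e-8):
    """Second-stage training set: members of clusters whose center is a
    first-stage support vector, plus all mix-labeled clusters."""
    keep = mixed.copy()
    for sv in clf1.support_vectors_:
        d = np.linalg.norm(km.cluster_centers_ - sv, axis=1)
        if d.min() < tol:              # this support vector is a cluster center
            keep[d.argmin()] = True
    mask = keep[km.labels_]            # expand the per-cluster flags to all points
    return X[mask], y[mask]

X2, y2 = decluster(X, y, km, mixed, clf1)
clf2 = SVC(C=1.0, kernel='rbf', gamma=0.5).fit(X2, y2)   # second-stage SVM of Eq. (5)
```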
3 RNA Sequence Detection
We use three case studies to demonstrate the two-stage SVM classification approach introduced in the last section. The first example shows the necessity of the second stage SVM by comparing the accuracy of both stages. The second example is not a large data set, but it shows that training time and accuracy can be improved by adjusting the cluster number. The third example is a real large data set, for which we make a complete comparison with several well-known algorithms as well as between our two-stage SVM variants with different clustering methods.

Example 1. The training data are available at www.ghastlyfop.com/blog/tag_index_svm.html. To train the SVM classifier, a training set containing every possible sequence pairing was built. This resulted in 475,865 rRNA and 114,481 tRNA sequence pairs. The input data were computed for every sequence pair in the resulting training set of 486,201 data points. Each record has 8 attributes with continuous values between 0 and 1. In [27], an SVM-based method was proposed to predict the common structure of two RNA sequences on the basis of minimizing the folding free energy change. The total free energy change of an input sequence pair can either be compared with the total free energy changes of a set of control sequence pairs, or be used in combination with sequence length and nucleotide frequencies as input to a classification support vector machine.
Fig. 4. The first stage SVM classification on the RNA sequences data set used in [27] with 10^3 data
In our experiments, we obtained 12 clusters from 1,000 original data points using FCM clustering. Then, in the first stage SVM, 113 training data points, comprising the cluster centers and the data of mix-labeled clusters, were obtained using the data selection process introduced in Section II, and we got 23 support vectors. Figure 4 shows the result of the first stage SVM. Following the de-clustering technique, 210 data points were recovered as training data for the second stage SVM. In the second stage SVM, we got 61 support vectors; see Figure 5. Table 1 shows the comparisons of training time and accuracy between our two SVM stages. The training time of our two-stage SVM and LIBSVM is compared first. For training 10^3 data points, our classifier needs 67 seconds while LIBSVM needs about 100 seconds. For training 10^4 data points, our classifier needs 76 seconds while LIBSVM needs about 1,000 seconds. And for 486,201 data points, our classifier needs only 279 seconds, while LIBSVM would need a very long time, which is not reported in [27] (we guess it may be around 10^5 seconds). On the other hand, there is almost no difference between their accuracies. This implies that our approach has a great advantage in training time. Then, the accuracy of the first stage SVM and the two-stage SVM is compared. From the figures and Table 1, it is obvious that the accuracy of the two-stage SVM is much better than that of the first stage SVM. This shows that the two stages are necessary.

Example 2. 3mer Dataset. The original work on string kernels – kernel functions defined on the set of sequences from an alphabet S rather than on a vector space [9] – came from the field of computational biology and was motivated by algorithms for aligning DNA and protein sequences. The recently presented k-spectrum (gap-free k-gram) kernel and the (k,m) mismatch kernel provide an alternative model for string kernels for biological sequences, and were designed, in particular, for the application of SVM
Fig. 5. The two-stage SVM classification on the RNA sequence data set used in [27] with 10^3 data points
These kernels use counts of common occurrences of short length-k subsequences, called k-mers, rather than notions of pairwise sequence alignment, as the basis for sequence comparison. The k-mer idea still captures a biologically motivated model of sequence similarity, in that sequences that diverge through evolution are still likely to contain short subsequences that match or almost match. We use an SVM to classify proteins into homologous (evolutionarily similar) groups based on sequence data, in order to understand the structure and functions of proteins. The 3mer data set has 2,000 data points, and each record has 84 attributes with continuous values between 0 and 1. The data set contains 1,000 positive and 1,000 negative sequences and is available at noble.gs.washington.edu/proj/hs/. In [21], the spectrum kernel was used as a feature set representing the distribution of every possible k-mer in an RNA sequence. The value of each feature is the number of times that particular feature appears in the sequence divided by the number of times any feature of the same length appears in the sequence. In our experiments, we used MEB clustering to select data for the first-stage SVM, and k = 3 (i.e., 3mers, 2mers and 1mers are our features) to train our two-stage SVM classifier. Table 2 shows the accuracy and training time of our classifier and LIBSVM; the accuracies are almost the same. There is also not much difference in training time, because the data set contains only 2,000 data points. However, we ran experiments with cluster numbers (l) of 400 and 100. We can see that, when we use fewer clusters, the training time is shorter too, since the training data size is smaller, but we get worse accuracy.
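For concreteness, the following sketch computes the normalized spectrum features described above for k = 1, 2, 3, which yields exactly the 84 attributes of the 3mer data set (4 + 16 + 64 k-mers over the DNA alphabet). The feature definition follows the sentence above; the function name and alphabet choice are our own illustrative assumptions.

from itertools import product

ALPHABET = "ACGT"

def spectrum_features(seq, ks=(1, 2, 3)):
    feats = []
    for k in ks:
        kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
        counts = {m: 0 for m in kmers}
        total = 0
        for i in range(len(seq) - k + 1):
            m = seq[i:i + k]
            if m in counts:
                counts[m] += 1
                total += 1
        # value = occurrences of this k-mer / occurrences of any k-mer of that length
        feats.extend(counts[m] / total if total else 0.0 for m in kmers)
    return feats

print(len(spectrum_features("ACGTACGGT")))   # 84 features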
Table 1. Accuracy and training time on the RNA sequence data set in [27]

Data set size    First-stage SVM       Two-stage SVM        LIBSVM
                 T (s)    Acc (%)      T (s)    Acc (%)     T (s)      Acc (%)
10^3             31       67.2         76       88.9        10^2       87.4
10^4             70       76.9         159      92.7        10^3       92.6
486,201          124      81.12        279      98.4        ~10^5 (?)  98.3
Table 2. Accuracy and training time on the RNA sequence data set in [21]

Two-stage SVM                         LIBSVM
#       t (s)    Acc (%)    l         #       t (s)    Acc (%)
2000    17.18    75.9       400       2000    8.71     73.15
2000    7.81     71.7       100       —       —        —
Example 3. This RNA data set is available at http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1570369#top as Supplementary Material (additional file 7). The data set consists of 23,605 data points; each record has 8 attributes with continuous values between 0 and 1. The data set contains 3,919 ncRNAs and 19,686 negative sequences. We used sizes 500, 1,000, 2,500, 5,000, 10,000 and 23,605 in our experiments. Experiments were run with MEB two-stage, RS two-stage, SMO, LIBSVM and simple SVM. Table 3 shows our experimental results on different data sizes with MEB two-stage and RS two-stage, and Table 4 compares our approach with the other algorithms. In Tables 3 and 4, the notation is as follows: "#" is the data size; "t" is the training time of the whole classification, which includes the time of clustering, the first-stage SVM training, de-clustering and the second-stage SVM training; "Acc" is the accuracy; "l" is the number of clusters used in the experiment; "TrD2" is the number of training data for the second-stage SVM; "SV1" is the number of support vectors obtained in the first-stage SVM; "SV2" is the number of support vectors obtained in the second stage.

For example, in the experiment on 10,000 data points, we sectioned the data into 650 clusters using MEB clustering and random selection. In the first-stage classification of MEB two-stage, we got 199 support vectors. Following the de-clustering technique, 862 data points were recovered as training data for the second-stage SVM, which is much less than the original data size of 10,000. In the second-stage SVM, 282 support vectors were obtained. From Table 3, we can also see that MEB two-stage has slightly better accuracy than random-selection (RS) two-stage, while its training time is longer.

Table 4 compares the training time and accuracy of our two-stage classification with some other SVM algorithms, including SMO, simple SVM and LIBSVM. For example, to classify 5,000 data points, LIBSVM is the fastest and SMO has the best accuracy; our two approaches do not outperform them,
although their time and accuracy are similar. However, to classify 23,605 data points, simple SVM and SMO have no better accuracy than the others, but their training time is tremendously longer. Compared with our two approaches, LIBSVM takes almost double the training time of MEB two-stage and almost 7 times that of RS two-stage, although it has the same accuracy as ours. This experiment implies that our approach has a great advantage on large data sets, since it reaches the same accuracy as the other algorithms in a much shorter training time.

Table 3. Two-stage SVM classification results on the RNA sequence data set

MEB two-stage
#        t        Acc     l       SV1     TrD2     SV2
500      4.71     85.3    350     87      397      168
1000     5.90     86.2    400     108     463      162
2500     15.56    86.3    450     124     529      209
5000     26.56    86.7    500     149     656      227
10000    69.26    86.9    650     199     862      282
23605    174.5    88.5    1500    278     1307     416

RS two-stage
#        t        Acc     l       SV1     TrD2     SV2
500      4.07     85.3    350     88      421      172
1000     4.37     85.7    400     97      453      153
2500     11.2     86.5    450     132     581      221
5000     15.8     86.1    500     146     637      211
10000    30.2     86.5    650     187     875      278
23605    65.7     88.3    1500    257     1275     381
Table 4. Training time and accuracy on the RNA sequence data set

        MEB two-stage    RS two-stage     LIBSVM           SMO              Simple SVM
#       t       Acc      t       Acc      t       Acc      t        Acc     t       Acc
500     4.71    85.3     4.07    85.3     0.37    86       1.56     87.7    2.78    86.7
1000    5.90    86.2     4.37    85.7     0.72    87.2     3.54     88.3    8.18    87.1
2500    15.56   86.3     11.21   86.5     3.06    87.4     4.20     87.7    561.3   88.1
5000    26.56   86.7     15.79   86.1     12.53   87.6     212.43   88.8    —       —
10000   69.26   87.9     30.22   86.5     48.38   88.2     1122.5   89.6    —       —
23605   174.5   88.2     65.7    88.3     298.3   88.6     —        —       —       —
4 Conclusions and Discussions
Our two-stage SVM classification approach is much faster than other SVM classifiers, without loss of accuracy, when the data set is sufficiently large. The experiments on biological data sets in this work show that our approach is well suited to classifying large biological data sets. Additionally, another promising application to machine learning in genomics is under study.
References

1. Awad, M., Khan, L., Bastani, F., Yen, I.L.: An Effective Support Vector Machine (SVM) Performance Using Hierarchical Clustering. In: Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence, pp. 663–667. IEEE Computer Society Press, Los Alamitos (2004)
2. Axmann, I.M., Kensche, P., Vogel, J., Kohl, S., Herzel, H., Hess, W.R.: Identification of cyanobacterial non-coding RNAs by comparative genome analysis. Genome Biol. 6, R73 (2005)
3. Babu, G., Murty, M.: A near-optimal initial seed value selection in K-means algorithm using a genetic algorithm. Pattern Recognit. Lett. 14, 763–769 (1993)
4. Cervantes, J., Li, X., Yu, W.: Support Vector Machine Classification Based on Fuzzy Clustering for Large Data Sets. In: Gelbukh, A., Reyes-Garcia, C.A. (eds.) MICAI 2006. LNCS (LNAI), vol. 4293, pp. 572–582. Springer, Heidelberg (2006)
5. Cervantes, J., Li, X., Yu, W., Li, K.: Support vector machine classification for large data sets via minimum enclosing ball clustering. Neurocomputing (accepted for publication)
6. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm
7. Chen, P.H., Fan, R.E., Lin, C.J.: A Study on SMO-Type Decomposition Methods for Support Vector Machines. IEEE Trans. Neural Networks 17, 893–908 (2006)
8. Collobert, R., Bengio, S.: SVMTorch: Support vector machines for large regression problems. Journal of Machine Learning Research 1, 143–160 (2001)
9. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge (2000)
10. Dong, J.X., Krzyzak, A., Suen, C.Y.: Fast SVM Training Algorithm with Decomposition on Very Large Data Sets. IEEE Trans. Pattern Analysis and Machine Intelligence 27, 603–618 (2005)
11. Folino, G., Pizzuti, C., Spezzano, G.: GP Ensembles for Large-Scale Data Classification. IEEE Trans. Evol. Comput. 10, 604–616 (2006)
12. Griffiths-Jones, S., Moxon, S., Marshall, M., Khanna, A., Eddy, S.R., Bateman, A.: RFAM: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33, 121–124 (2005)
13. Girolami, M.: Mercer kernel based clustering in feature space. IEEE Trans. Neural Networks 13, 780–784 (2002)
14. Hansen, J.L., Schmeing, T.M., Moore, P.B., Steitz, T.A.: Structural insights into peptide bond formation. Proc. Natl. Acad. Sci. 99, 11670–11675 (2002)
15. Huang, G.B., Mao, K.Z., Siew, C.K., Huang, D.S.: Fast Modular Network Implementation for Support Vector Machines. IEEE Trans. Neural Networks (2006)
16. Joachims, T.: Making large-scale support vector machine learning practical. In: Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge (1998)
17. Kim, S.W., Oommen, B.J.: Enhancing Prototype Reduction Schemes with Recursion: A Method Applicable for Large Data Sets. IEEE Trans. Syst. Man, Cybern. B 34, 1184–1397 (2004)
18. Li, X., Cervantes, J., Yu, W.: Two Stage SVM Classification for Large Data Sets via Randomly Reducing and Recovering Training Data. In: IEEE International Conference on Systems, Man, and Cybernetics, Montreal, Canada (2007)
19. Lin, C.T., Yeh, C.M., Liang, S.F., Chung, J.F., Kumar, N.: Support-Vector-Based Fuzzy Neural Network for Pattern Classification. IEEE Trans. Fuzzy Syst. 14, 31–41 (2006)
20. Mavroforakis, M.E., Theodoridis, S.: A Geometric Approach to Support Vector Machine (SVM) Classification. IEEE Trans. Neural Networks 17, 671–682 (2006)
21. Noble, W.S., Kuehn, S., Thurman, R., Yu, M., Stamatoyannopoulos, J.: Predicting the in vivo signature of human gene regulatory sequences. Bioinformatics 21, 338–343 (2005)
22. Pal, N., Bezdek, J.: On cluster validity for the fuzzy c-means model. IEEE Trans. Fuzzy Syst. 3, 370–379 (1995)
23. Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, MA (1998)
24. Prokhorov, D.: IJCNN 2001 neural network competition. Ford Research Laboratory (2001), http://www.geocities.com/ijcnn/nnc_ijcnn01.pdf
25. Shih, L., Rennie, J.D.M., Chang, Y., Karger, D.R.: Text Bundling: Statistics-based Data Reduction. In: Proc. of the Twentieth Int. Conf. on Machine Learning, Washington, DC (2003)
26. Tresp, V.: A Bayesian Committee Machine. Neural Computation 12, 2719–2741 (2000)
27. Uzilov, A.V., Keegan, J.M., Mathews, D.H.: Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics 7, 173 (2006)
28. Washietl, S., Hofacker, I.L., Lukasser, M., Huttenhofer, A., Stadler, P.F.: Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat. Biotechnol. 23, 1383–1390 (2005)
29. Weilbacher, T., Suzuki, K., Dubey, A.K., Wang, X., Gudapaty, S., Morozov, I., Baker, C.S., Georgellis, D., Babitzke, P., Romeo, T.: A novel sRNA component of the carbon storage regulatory system of Escherichia coli. Mol. Microbiol. 48, 657–670 (2003)
30. Xu, R., Wunsch II, D.: Survey of Clustering Algorithms. IEEE Trans. Neural Networks 16, 645–678 (2005)
31. Yu, H., Yang, J., Han, J.: Classifying Large Data Sets Using SVMs with Hierarchical Clusters. In: Proc. of the 9th ACM SIGKDD (2003)
Frequency Synchronization of a Set of Cells Coupled by Quorum Sensing

Jianbao Zhang², Zengrong Liu¹, Ying Li², and Luonan Chen¹

¹ Institute of Systems Biology, Shanghai University, Shanghai 200444, China
[email protected], [email protected]
² College of Sciences, Shanghai University, Shanghai 200444, China
Abstract. Collective behavior of a set of cells coupled by quorum sensing is a hot topic in biology. Noting the potential applications of frequency synchronization, this paper studies frequency synchronization of a set of cells with different frequencies coupled by quorum sensing. By the phase reduction method, the multicell system is transformed into a phase equation, which can be studied by the master stability function method. Sufficient conditions for frequency synchronization of the multicell system are obtained under two general hypotheses. Numerical simulations confirm the validity of the results.
1 Introduction
Recently, collective dynamics in populations of cells communicating with each other through intercellular signaling have attracted much attention from many fields of biology, and much research has been carried out [1,2,3]. From a biological viewpoint, this type of collective dynamics is produced by intercellular signaling among identical and unreliable components; bacteria display various social behaviors and cellular differentiations, such as quorum sensing in gram-positive and gram-negative strains, because of intercellular communication [4,5,6]. In order to study the theoretical mechanism of such phenomena, many interesting studies have been carried out, such as Ref. [2]. Most of these studies are based on the theory of synchronization, and two hypotheses are adopted for the convenience of mathematical analysis [2,7]: (1) the network consists of identical cells, in other words, all individual cells have identical parameters; (2) the quasi-steady-state approximation of the network holds. Under these two hypotheses, the authors give conditions for complete synchronization; without them, theoretical study is very difficult. Fortunately, there are many different synchronization states, such as frequency synchronization, which is more important than complete synchronization in biology. Plenty of phenomena, such as the circadian rhythms of mammals [3], verify the importance of frequency synchronization. Therefore, we try to deduce
Supported by National Natural Science Foundation of China (70431002,10672093).
the theoretical mechanism of frequency synchronization caused by intercellular signaling without the two hypotheses mentioned above. In this direction, we first review advances in phase synchronization. In order to study phase synchronization of a set of limit-cycle oscillators, some researchers proposed the phase reduction method [8,9,10]. Initially, the method transformed an oscillator under a small perturbation into a compact dynamical equation for its phase. Kuramoto then considered a network consisting of n coupled subsystems with intrinsic frequencies ω_i and obtained the famous Kuramoto model [12]. Later [13], phase synchronization was studied in random sparse complex networks of small-world and scale-free topologies with long-range decaying couplings. Recently, populations of identical limit-cycle oscillators driven by common additive noise [14] and uncoupled limit-cycle oscillators under random impulses [15] have also been studied. These results show that the phase reduction method is a valid tool for studying phase synchronization. In our view, the more important collective dynamics in multicell systems or multicellular structures is frequency synchronization. This paper studies frequency synchronization of cells communicating with each other by intercellular signaling, through the method mentioned above.
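As a small illustration of the phase-reduction idea, the following sketch integrates the classical Kuramoto model dφ_i/dt = ω_i + (κ/n) Σ_j sin(φ_j − φ_i) with forward Euler; the coupling strength and frequency spread are illustrative choices, not values from this paper.

import numpy as np

n, kappa, dt, steps = 50, 1.5, 0.01, 20000
rng = np.random.default_rng(1)
omega = rng.normal(1.0, 0.1, n)            # heterogeneous intrinsic frequencies
phi = rng.uniform(0, 2 * np.pi, n)

for _ in range(steps):
    # mean over j of sin(phi_j - phi_i), for every oscillator i
    coupling = np.sin(phi[None, :] - phi[:, None]).mean(axis=1)
    phi += dt * (omega + kappa * coupling)

# order parameter r close to 1 indicates synchronization
r = abs(np.exp(1j * phi).mean())
print(f"order parameter r = {r:.3f}")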
2 A Synthetic Multicellular Network
Now we introduce a synthetic gene network in Escherichia coli [7] in which all cells communicate with each other by signaling molecules based on the quorum-sensing mechanism. The repressilator is a network of three genes, a, b, c, whose products inhibit the transcription of each other in a cyclic way. Specifically (see Fig. 1), gene c expresses protein C, which inhibits transcription of gene a. The product of gene a inhibits transcription of gene b, whose protein product B in turn inhibits expression of gene c, completing the cycle. These bacteria exhibit cell-to-cell communication through a mechanism that makes use of two proteins. The first (LuxI) synthesizes a small molecule known as an autoinducer (AI), which can diffuse freely through the cell membrane. When a second protein (LuxR) binds to this molecule, the resulting complex activates transcription of various genes, including some coding for light-producing enzymes. The scheme of the network is shown in Figure 1, and a detailed description is provided by Garcia-Ojalvo and others (2004). The dynamics of genes a, b, c and proteins A, B, C are given respectively by

  da_i(t)/dt = −d_{1i} a_i(t) + α_C / (μ_C + C_i^m(t)),
  db_i(t)/dt = −d_{2i} b_i(t) + α_A / (μ_A + A_i^m(t)),
  dc_i(t)/dt = −d_{3i} c_i(t) + α_B / (μ_B + B_i^m(t)) + α_S S_i(t) / (μ_S + S_i(t)),      (1)
  dA_i(t)/dt = −d_{4i} A_i(t) + β_a a_i(t),
  dB_i(t)/dt = −d_{5i} B_i(t) + β_b b_i(t),
  dC_i(t)/dt = −d_{6i} C_i(t) + β_c c_i(t),
Fig. 1. Scheme of the repressilators communicating with each other through the autoinducer (AI), which can diffuse freely through the cell membrane
  dS_i(t)/dt = −d_s S_i(t) + β_s A_i(t),      (2)

where a_i, b_i, and c_i are the concentrations of mRNA transcribed from genes a, b, and c in cell i, respectively; the concentrations of the corresponding proteins are represented by A_i, B_i, and C_i. The concentration of AI inside each cell is denoted by S_i. α_A, α_B, and α_C are the dimensionless transcription rates in the absence of repressor. α_S is the maximal contribution to gene c transcription in the presence of saturating amounts of AI. The parameters β_a, β_b and β_c are the translation rates of the proteins from the mRNAs, and β_s is the synthesis rate of AI. m is the Hill coefficient, and the d_{ji} are the respective dimensionless degradation rates of the mRNAs and proteins of genes a, b, and c in cell i. Considering that AI can diffuse freely through the cell membrane, and denoting by η_s and η_e the diffusion rates of AI inward and outward through the cell membrane, the dynamics of AI can be rewritten as follows:

  dS_i(t)/dt = −d_s S_i(t) + β_s A_i(t) − η_s (S_i(t) − S_e(t)),
  dS_e(t)/dt = −d_e S_e(t) + (η_e/n) Σ_{j=1}^{n} (S_j(t) − S_e(t)).      (3)
The individual cells (1)-(2) thus communicate with each other through the signaling molecules and form the biological network (1)-(3). One can regard the signaling molecule in the extracellular environment as the (n+1)-th node, so that the n cells communicate with the (n+1)-th node and constitute a star-type network. We can then apply the methods for studying frequency synchronization of oscillator networks to the multicell system (1)-(3). The dynamics of most biological systems are periodic. For example, it can be concluded from Fig. 2 of Ref. [2] that system (1)-(2) exhibits limit-cycle oscillations in a wide region of the parameter d_s space. Denoting x_i = (a_i, b_i, c_i, A_i, B_i, C_i, S_i)^T, it is therefore reasonable to assume that each individual cell in the network exhibits limit-cycle oscillations. Let x_i^0(t) denote the stable periodic solution of an individual cell and O_i the orbit corresponding to x_i^0(t). In the light of the phase reduction method, we can define a phase φ_i on the orbit O_i such that φ̇_i(x_i^0) = ω_i. Extending the definition of φ_i to the absorbing domain D_i of the orbit O_i, which is essentially the same as the asymptotic phase [9], one obtains

  φ̇_i(x_i) = ω_i,  x_i ∈ D_i.      (4)
Now consider the network (1)-(3). The exact expression of S_e is easily obtained as

  S_e(t) = S_e0 e^{−(d_e+η_e)t} + (η_e/n) Σ_{j=1}^{n} ∫_0^t S_j(s) e^{(d_e+η_e)(s−t)} ds.

Based on equation (3) and φ̇_i(x_i) = grad_{x_i} φ_i · ẋ_i, one gets

  φ̇_i = ω_i + η_s grad_{S_i} φ_i { S_e0 e^{−(d_e+η_e)t} + (η_e/n) Σ_{j=1}^{n} ∫_0^t S_j(φ_j(s)) e^{(d_e+η_e)(s−t)} ds − S_i(φ_i(t)) }.      (5)

It is reasonable to assume that the following hypothesis holds, especially for a periodic function S_j(φ_j(t)).

(H1) Denoting by grad_{S_i} φ_i the gradient of φ_i along S_i, there holds

  grad_{S_i} φ_i ∫_0^t S_j(φ_j(s)) e^{(d_e+η_e)(s−t)} ds = G_ij(φ_i, φ_j) ∫_0^t e^{(d_e+η_e)(s−t)} ds.      (6)
Motivated by Ref. [16], we can further assume that hypothesis (H2) holds; a numerical example illustrating the rationality of this hypothesis has been given there.

(H2) [16] There exist n + 1 constants φ_1^0, ..., φ_n^0, ω such that the following conditions hold.
1. Denoting ϕ_i = ωt + φ_i^0 and Δω_i = ω_i − ω, there holds

  Δω_i + (η_q/n) Σ_{j=1}^{n} G_ij(ϕ_i, ϕ_j) − η H_i(ϕ_i) = 0.      (7)

2. G_ij(φ_i, φ_j) is differentiable and G_ij^y(φ_i, φ_j)|_{φ=ϕ} = −G_ij^x(φ_i, φ_j)|_{φ=ϕ}, where G_ij^x(φ_i, φ_j) = ∂G_ij(φ_i, φ_j)/∂φ_i and G_ij^y(φ_i, φ_j) = ∂G_ij(φ_i, φ_j)/∂φ_j.

One then obtains the following result.
Theorem 1. Under hypotheses (H1) and (H2), system (1)-(3) realizes collective rhythms if q is large enough.

The detailed derivation and proof are similar to our recent work [16]. Although the conditions of the theorem depend on the initial values of system (1)-(3), once some initial values are verified to satisfy Theorem 1, the theorem still holds when the initial values deviate slightly. Therefore, Theorem 1 has many practical applications in biology.
3 Numerical Simulations
For the system introduced in Section 2, we cannot give explicit expressions of G_ij(ϕ_i, ϕ_j) and H_i(φ_i), but we can presume that they satisfy hypotheses (H1) and (H2) because of the generality of the two hypotheses. We then consider the multicell system (1)-(3) consisting of 6 cells with the initial values and parameters in Table 1 and Table 2. It has been shown that the parameters β_a, β_b, β_c and d_{4i}, d_{5i}, d_{6i} affect the oscillation frequency most markedly, so all the repressilators oscillate at different frequencies because of the different values of β and d. Figure 2 shows that frequency synchronization can be realized, which confirms the validity of the results.

Table 1. The initial values of the 6-cell system (1)-(3)

n      1       2       3       4       5       6
a_i0   8.2803  9.1756  1.1308  8.1213  9.0826  1.5638
b_i0   1.2212  7.6267  7.2180  6.5164  7.5402  6.6316
c_i0   8.8349  2.7216  4.1943  2.1299  0.3560  0.8116
A_i0   8.5057  3.4020  4.6615  9.1376  2.2858  8.6204
B_i0   6.5662  8.9118  4.8814  9.9265  3.7333  5.3138
C_i0   1.8132  5.0194  4.2219  6.6043  6.7365  9.5733
S_i0   1.9187  1.1122  5.6505  9.6917  0.2374  8.7022
S_e0   0.2688
Table 2. The parameters of the 6-cell system (1)-(3)

n      1       2       3       4       5       6
α_Ai   2.9823  3.7146  3.8071  2.6168  2.8173  2.8836
β_i    0.1302  0.1301  0.1303  0.1301  0.1304  0.1301
d_i    0.5008  0.5004  0.5012  0.5004  0.5015  0.5004
others: α_B = α_C = 1.96, α_S = 1, β_s = 0.018, d_s = 0.016, μ_A = μ_B = μ_C = 0.2, m = 4, η_s = 0.4
The numerical simulations are consistent with previous research on system (1)-(3), which shows that a network of coupled repressilators with identical parameters can realize complete synchronization when Q = η_e/(η_e + d_e) is large enough.
Fig. 2. Time evolution of a_i (tetR in the i-th cell) of the multicell system (1)-(3) consisting of 6 cells with different frequencies. Panel (a) shows that the 6 cells oscillate at different frequencies when t ∈ [0, 150]; panel (b) shows that they oscillate at an identical frequency but with different amplitudes when t ∈ [1650, 1800].
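For readers who want to reproduce a simulation of this kind, the following sketch integrates system (1)-(3) with scipy for 6 heterogeneous cells. The values of d_e and η_e, the value of μ_S, and the initial conditions are illustrative assumptions; the per-cell α_Ai, β_i, d_i and the remaining constants follow Table 2 (Table 1 lists the exact initial values used in the paper).

import numpy as np
from scipy.integrate import solve_ivp

n, m = 6, 4
# Per-cell parameters from Table 2; remaining constants from its "others" row.
aA = np.array([2.9823, 3.7146, 3.8071, 2.6168, 2.8173, 2.8836])
beta = np.array([0.1302, 0.1301, 0.1303, 0.1301, 0.1304, 0.1301])
d = np.array([0.5008, 0.5004, 0.5012, 0.5004, 0.5015, 0.5004])
aB = aC = 1.96
aS, bs, ds, eta_s = 1.0, 0.018, 0.016, 0.4
muA = muB = muC = muS = 0.2            # mu_S = 0.2 is an assumption
de, eta_e = 0.5, 2.0                    # not given in Table 2; assumed, so Q = eta_e/(eta_e+de) = 0.8

def rhs(t, u):
    a, b, c, A, B, C, S = u[:-1].reshape(7, n)
    Se = u[-1]
    da = -d * a + aC / (muC + C**m)
    db = -d * b + aA / (muA + A**m)
    dc = -d * c + aB / (muB + B**m) + aS * S / (muS + S)
    dA = -d * A + beta * a
    dB = -d * B + beta * b
    dC = -d * C + beta * c
    dS = -ds * S + bs * A - eta_s * (S - Se)          # Eq. (3), intracellular AI
    dSe = -de * Se + eta_e * np.mean(S - Se)          # Eq. (3), extracellular AI
    return np.concatenate([da, db, dc, dA, dB, dC, dS, [dSe]])

rng = np.random.default_rng(0)
u0 = np.concatenate([rng.uniform(0, 10, 7 * n), [0.2688]])
sol = solve_ivp(rhs, (0, 1800), u0, max_step=0.5)
print(sol.y[:n, -1])     # final tetR mRNA levels a_i of the 6 cells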
References

1. Wang, R., Jing, Z., Chen, L.: Modelling periodic oscillation in gene regulatory networks by cyclic feedback networks. Bull. Math. Biol. 67, 339–367 (2004)
2. Wang, R., Chen, L.: Synchronizing Genetic Oscillators by Signaling Molecules. Journal of Biological Rhythms 20, 257–269 (2005)
3. Li, Y., Zhang, J., Liu, Z.: Circadian Oscillators and Phase Synchronization under a Light-Dark Cycle. International Journal of Nonlinear Science 1, 131–138 (2006)
4. Taga, M.E., Bassler, B.L.: Chemical communication among bacteria. PNAS 100, 14549–14554 (2003)
5. Weiss, R., Knight, T.F.: Engineering communications for microbial robotics. DNA 6, 13–17 (2000)
6. Chen, L., Wang, R., Zhou, T., Aihara, K.: Noise-induced cooperative behavior in a multi-cell system. Bioinformatics 21, 51–62 (2005)
7. Garcia-Ojalvo, J., Elowitz, M., Strogatz, S.H.: Modeling a synthetic multicellular clock: repressilators coupled by quorum sensing. PNAS 101, 10955–10960 (2004)
8. Guckenheimer, J.: Isochrons and phaseless sets. J. Math. Biol. 1, 259–273 (1975)
9. Kuramoto, Y.: Chemical Oscillations, Waves and Turbulence. Springer, New York (1984)
10. Winfree, A.T.: Biological Rhythms and the Behavior of Populations of Coupled Oscillators. J. Theoret. Biol. 16, 15–42 (1967)
11. Izhikevich, E.M.: Phase equations for relaxation oscillators. SIAM Journal on Applied Mathematics 60, 1789–1804 (2000)
12. Strogatz, S.H.: From Kuramoto to Crawford: Exploring the onset of synchronization in populations of coupled oscillators. Physica D 143, 1–20 (2000)
13. Li, X.: Phase synchronization in complex networks with decayed long-range interactions. Physica D 223, 242–247 (2006)
14. Teramae, J., Tanaka, D.: Robustness of the noise-induced phase synchronization in a general class of limit cycle oscillators. Physical Review Letters 93, 204103 (2004)
15. Nakao, H., Arai, K., Nagai, K., Tsubo, Y., Kuramoto, Y.: Synchrony of limit-cycle oscillators induced by random external impulses. Phys. Rev. E 72, 026220 (2005)
16. Zhang, J., Liu, Z., Li, Y.: An approach to analyze phase synchronization in oscillator networks with weak coupling. Chinese Physics Letters 24(6) (2007)
17. Pecora, L.M., Carroll, T.L.: Master Stability Functions for Synchronized Coupled Systems. Physical Review Letters 80, 2109 (1998)
18. Garcia-Ojalvo, J., Elowitz, M.B., Strogatz, S.H.: Modeling a synthetic multicellular clock: Repressilators coupled by quorum sensing. PNAS 101, 10955–10960 (2004)
A Stochastic Model for Prevention and Control of HIV/AIDS Transmission Dynamics

Min Xu¹, Yongsheng Ding¹,², and Liangjian Hu³

¹ Glorious Sun School of Business and Management, Donghua University, Shanghai 200051, China
² College of Information Sciences and Technology, Donghua University, Shanghai 201620, China
[email protected]
³ Department of Applied Mathematics, Donghua University, Shanghai 201620, China
Abstract. In this paper, we first present a stochastic model of the proportion of the population infected with HIV against the total population, and prove the existence and uniqueness of its solution. Through computer simulation, we forecast this proportion over the course of the AIDS epidemic in China in the next 20 years. In particular, we study the control index of the transmission rate β to obtain its effect on the epidemic trend of AIDS when it fluctuates. We then present a strategy for adjusting β to reach a given control aim, based on analysis of the mean value and variance of the proportion.
1 Introduction

The AIDS epidemic is spreading very fast according to reports from the United Nations: the number of people infected with HIV increased from 35 million in 2001 to 38 million in 2003. According to a report from the Ministry of Health of the People's Republic of China, the number of people infected with HIV was 135,630 at the end of September 2005. More and more researchers therefore focus on the prevention and control of AIDS, and most of them study the spread of HIV using ordinary differential equations. Haynatzka [1] studied the spread of AIDS under interactive transmission. Castillo [2] formulated a group of ordinary differential equations and studied the role of incubation in the dynamics of AIDS. Blythe and Anderson [3-5] studied various models, including heterogeneous populations among whom AIDS is spreading. Jacquez et al. [6-7] presented a compartmental model of a homosexual population for AIDS spreading. Greenhalgh et al. [8] discussed a two-group model for the susceptible and infected populations. However, the above models do not consider the stochastic attributes of the transmission course of AIDS. Roberts [9] formulated a stochastic differential equation model for a fatal epidemic. In this paper, we extend that model to AIDS and prove the existence and uniqueness of the solution.
In our model, the control index of the transmission rate β changes from a constant in the deterministic model to a stochastic process, because of environmental factors. Correspondingly, the future proportion of the population infected with HIV against the total population also becomes a stochastic process. In the deterministic model, a given control aim can be reached only if β is a suitable constant; it cannot be reached with 100% probability when an environmental disturbance of a certain intensity exists. The stochastic model therefore describes the transmission course more realistically. Based on the formulation of the stochastic model, we analyze the proportion of the population infected with HIV against the total population in the next few years under an environmental disturbance of a certain intensity, and we present a method for adjusting β to reach a given control aim.
2 Stochastic Differential Equation Model of AIDS Transmission

2.1 The Stochastic Differential Equation Model

When the infection rate of the disease is a constant, the proportion of the population infected with HIV against the total population satisfies the ordinary differential equation

  dZ/dt = (p − 1)BZ + (βC − α)(1 − Z)Z,      (1)
where Z is the proportion of the population infected with HIV against the total population; B is the birth rate, independent of the total population; C is the contact rate between individuals; α is the increase of the death rate due to AIDS; p is the vertical transmission probability, which lies between 0 and 1; and β is the constant transmission rate. Because of environmental effects, β should be a stochastic process. We regard it as a Gaussian white noise β_0 + ρη(t) and substitute it into Eq. (1). We then get the stochastic differential equation (SDE) model of AIDS transmission

  dZ = F(Z)dt + G(Z)dW,      (2)

where

  F(Z) = (p − 1)BZ + (β_0 C − α)(1 − Z)Z,      (3)
  G(Z) = ρC(1 − Z)Z,      (4)

E(η(t)) = 0, D(η(t)) = 1, β_0 is the average transmission rate, and ρ is the intensity of the
environmental disturbance. Z is then a stochastic process reflecting the fluctuation of the proportion under the environmental disturbance.

2.2 Existence and Uniqueness of the Solution

In the following, we prove that the solution of the SDE model (2) exists and is unique. Here we always assume that W_t is a Brownian motion on the probability space (Ω, F, P) and that F_t is its natural σ-algebra.

Theorem 1 [10] (existence and uniqueness). Suppose X_t satisfies the n-dimensional SDE dX_t = b(X_t, t)dt + σ(X_t, t)dW_t with coefficients

  b(x, t) = (b_1(x, t), b_2(x, t), ..., b_n(x, t))^T,      (5)
  σ(x, t) = (σ_ij(x, t)),  1 ≤ i, j ≤ n,      (6)

and initial condition

  X_0 = x_0,  x_0 ∈ R^n.      (7)

Assume that b(x, t) and σ(x, t) are a continuous vector-valued function and a continuous matrix-valued function on (x, t) ∈ R^n × [0, T], respectively, and that they satisfy the Lipschitz and growth conditions

  |b(x, t) − b(x̄, t)| ≤ c_* |x − x̄|,
  |σ(x, t) − σ(x̄, t)| ≤ c_* |x − x̄|,
  |b(x, t)| ≤ c(1 + |x|),      (8)
  |σ(x, t)| ≤ c(1 + |x|),

where c_* and c are constants. Then for any x_0 ∈ R^n a unique strong solution of (5)-(7) exists on (Ω, F, F_t, P).
Theorem 2. Assume that p, B, β_0, α and C are positive real numbers. Then for any initial condition Z_0 (0 < Z_0 < 1), there is a unique solution to Eq. (2).

Proof. We need only prove that the coefficients satisfy the conditions of Theorem 1. Here

  b(z, t) = (p − 1)Bz + (β_0 C − α)(1 − z)z,      (9)
  σ(z, t) = ρC(1 − z)z.      (10)
First we verify the Lipschitz conditions of Theorem 1. We have

  b(z, t) − b(z̄, t) = (p − 1)B(z − z̄) + (β_0 C − α)[(z − z̄)(1 − (z + z̄))].      (11)

By the triangle inequality, we get

  |b(z, t) − b(z̄, t)| ≤ |(p − 1)B(z − z̄)| + |(β_0 C − α)(z − z̄)(1 − (z + z̄))|.      (12)

Noting that z represents the proportion of the population infected with HIV against the total population, we have |z| ≤ 1 and |z̄| ≤ 1. Then we get

  |b(z, t) − b(z̄, t)| ≤ 2 max(|(p − 1)B|, 3|β_0 C − α|) |z − z̄|.      (13)

In addition,

  |σ(z, t) − σ(z̄, t)| ≤ 3ρC |z − z̄|.      (14)

So if we let

  c_* = max(2 max(|(p − 1)B|, 3|β_0 C − α|), 3ρC),      (15)

then the first and second conditions are proved. Next we prove that the third and fourth conditions are satisfied:

  |b(z, t)| ≤ (|(p − 1)B| + |β_0 C − α|)(1 + |z|),      (16)
  |σ(z, t)| = |ρC(1 − z)z| ≤ ρC(1 + |z|).      (17)

Then we put

  c = max(|(p − 1)B| + |β_0 C − α|, ρC),      (18)

and the third and fourth conditions are proved. As a result, the solution of Eq. (2) exists and is unique according to Theorem 1.
3 Strategy for AIDS Prevention and Control

With the SDE (2) we can predict the proportion of the population infected with HIV against the total population in the next few years under an environmental disturbance of a given intensity. Compared with the deterministic model, it shows the trend of the fluctuation. Since only a few SDEs have explicit solutions [11-15], we cannot obtain the explicit solution of Eq. (2), so we simulate the proportion by means of numerical solution. Furthermore, we analyze how to adjust β to reach a given control aim under an environmental disturbance of a given intensity.

3.1 Numerical Solution of the SDE

Numerical solution methods include the Euler method, Milstein's method, the Runge-Kutta method and so on [16, 17]. We approximate the sample trajectory node by node through a partition of the time interval, taking the stochastic increments into account. Using the Euler method, we obtain the iteration

  Z(j, k) = Z(j − 1, k) + F(Z(j − 1, k))(t_j − t_{j−1}) + G(Z(j − 1, k))[W(j) − W(j − 1)],      (19)
  j = 1, 2, ..., M;  k = 1, 2, ..., K,      (20)

where

  F(Z(j − 1, k)) = (p − 1)BZ(j − 1, k) + (β_0 C − α)[1 − Z(j − 1, k)]Z(j − 1, k),      (21)
  G(Z(j − 1, k)) = ρC(1 − Z(j − 1, k))Z(j − 1, k).      (22)
Here j and k index the time node and the trajectory, respectively; M and K are the numbers of time nodes and sample trajectories. According to the expressions above and the initial condition, for every given trajectory we can compute Z(1, k), Z(2, k), ..., Z(M, k), the values of the proportion of the population infected with HIV against the total population at the different time nodes. Furthermore, the mean value of this process is

  Z̄(j) = (1/K) Σ_{k=1}^{K} Z(j, k),  1 ≤ j ≤ M,      (23)

and the variance is

  S²(j) = (1/(K − 1)) Σ_{k=1}^{K} (Z(j, k) − Z̄(j))² = (1/(K − 1)) [Σ_{k=1}^{K} Z(j, k)² − K Z̄(j)²],  1 ≤ j ≤ M.      (24)
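A minimal sketch of the Euler scheme (19)-(22) together with the statistics (23)-(24) is given below. The parameters p, B, α, C and Z_0 follow Section 4; β_0 and ρ are illustrative here (in particular, the β_0 used below is not on the same scale as the β = 10 fitted in Section 4), and sub-annual steps are used for numerical stability, so the output demonstrates the scheme rather than reproducing the paper's forecast.

import numpy as np

def simulate(beta0, rho, years=20, steps_per_year=12, K=100, Z0=1.25e-5,
             p=0.35, B=18.3556e-3, alpha=1.0, C=0.5256, seed=0):
    dt = 1.0 / steps_per_year
    rng = np.random.default_rng(seed)
    Z = np.full(K, Z0)
    for _ in range(years * steps_per_year):
        F = (p - 1) * B * Z + (beta0 * C - alpha) * (1 - Z) * Z   # drift, Eq. (21)
        G = rho * C * (1 - Z) * Z                                  # diffusion, Eq. (22)
        Z = Z + F * dt + G * rng.normal(0.0, np.sqrt(dt), K)       # Euler step, Eq. (19)
        Z = np.clip(Z, 0.0, 1.0)          # keep the proportion inside [0, 1]
    return Z                              # end points Z(M, k), k = 1..K

Z_end = simulate(beta0=2.0, rho=0.5)
print("mean of Z(M, k):", Z_end.mean())            # Eq. (23) at j = M
print("variance of Z(M, k):", Z_end.var(ddof=1))   # Eq. (24) at j = M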
3.2 Adjusting the Control Index of the Infection Rate β

Let t_1 be the beginning year and t_2 the end year in the deterministic model. For constant β, the proportion Z(t) of the population infected with HIV against the total population on [t_1, t_2] is a single curve. So the control aim can be reached in the deterministic model if we take γ as the given control aim, where γ is a little more than Z(t_2). In fact, however, β becomes a stochastic process because of the environmental disturbance, and correspondingly Z(t) also becomes a stochastic process, which we write as Z_t to avoid confusion with the deterministic model. We then get a series of curves Z(j, 1), Z(j, 2), ..., Z(j, K) representing the proportion of the population infected with HIV against the total population on [t_1, t_2]. These curves show the probabilistic character of the process. In other words, Z_{t_2}, the proportion at time t_2, has K realizations Z(M, 1), Z(M, 2), ..., Z(M, K) corresponding to the end points of the K trajectories. So we can compute the number of trajectories that reach the aim at time t_2, which we denote H(β_0) since it depends only on β_0 once the intensity ρ of the disturbance is fixed:

  H(β_0) = Σ_{l∈L} 1,  where L = {l : 1 ≤ l ≤ K, Z(M, l) ≤ γ}.      (25)

Then we can compute the probability P of reaching the aim,

  P = H(β_0)/K.      (26)
Obviously, P equals 100% in the deterministic model, while it may reach only 50% or less under stochastic disturbance. In order to ensure that P
reaches a larger value, we must decrease the average transmission rate β_0 when the intensity of the disturbance is given.
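The search just described can be written down directly: simulate the K end points for a candidate β_0, count H(β_0) by Eq. (25), and step β_0 down until P in Eq. (26) is acceptable. The sketch below reuses the simulate function from the previous sketch; γ and the candidate β_0 values are illustrative placeholders mirroring the procedure, not the paper's scale.

gamma = 3e-5                               # illustrative control aim
for beta0 in (2.0, 1.9, 1.8):              # step beta0 down until P is acceptable
    Z_end = simulate(beta0=beta0, rho=0.5, seed=1)
    H = int((Z_end <= gamma).sum())        # H(beta0), Eq. (25)
    P = H / len(Z_end)                     # Eq. (26)
    print(f"beta0 = {beta0}: P = {P:.2f}")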
4 Strategy for AIDS Prevention and Control

Plenty of investigations demonstrate that 1995 is a very important year in the history of the AIDS epidemic and its control: the number of people infected with HIV in China exceeded 15,000 that year according to statistics from the Ministry of Health of the People's Republic of China. We estimate the value of β based on the statistics from 1995 to 2005, take 1995 as the starting point, and use the model to forecast the trend of the proportion of the population infected with HIV against the total population over the next 20 years.

According to statistics from the Chinese Center for Disease Control and Prevention, the number of people infected with HIV was about 15,000 and the total population was 1,207.78 million in China in 1995, so the proportion of the population infected with HIV against the total population was about 1.25 × 10^−5 in 1995. The vertical transmission probability from mother to child is 35%, and the birth rate B is about 18.3556 × 10^−3 according to the same source. Generally speaking, people die one year after developing the symptoms of AIDS; therefore α equals 1, and C equals 0.5256 [9,18]. Furthermore, we postulate that these parameters stay the same during the whole period. We then forecast the proportion of the population infected with HIV against the total population with Eq. (2), putting M = 20 and K = 100 in Eqs. (19)-(20). First, we find the value of β that makes the deterministic model fit the number of people infected with HIV from 1995 to 2005. By computation, we determine that β is 10.
Fig. 1. The proportion of the population infected with HIV against the total population in the deterministic case from 1995 to 2015 (line 1 represents the simulation result, line 2 the actual data)
In Fig. 1, line 1 represents the simulation result from the deterministic model, while line 2 represents the actual data from 1995 to 2005. From it, we can see that the proportion of the population infected with HIV against the total population will reach about 6 × 10^−4 in 2015. However, we must use the stochastic model (2), because the transmission rate of AIDS is disturbed by environmental noise. Fig. 2 shows the fluctuation of the proportion of the population infected with HIV against the total population from 1995 to 2015 when ρ is 0.5. We conclude that the proportion exceeds 6 × 10^−4 under some circumstances because of the environmental disturbance.
Fig. 2. The proportion of the population infected with HIV from 1995 to 2015, where ρ = 0.5 and the number of sample trajectories is 10
We postulate that the control aim is to stay below 6 × 10^−4 in the next 20 years. This aim is reached in the deterministic model when β equals 10. But the probability of reaching it is not as high as expected when the control index β_0 equals
10.

Table 1. The values of the end point of every trajectory when β_0 is 9 and 10, respectively

Value (× 10^−3)   k=1    k=2    k=3    k=4    k=5    k=6    k=7    k=8    k=9    k=10
β_0 = 9           0.34   0.31   0.37   0.33   0.34   0.35   0.42   0.51   0.38   0.49
β_0 = 10          0.71   0.61   0.54   0.63   0.56   0.62   0.42   0.70   0.63   0.39
We compute that this probability equals about 40% by applying the numerical method to the stochastic model. Consequently, we must decrease β_0 in order to ensure that the proportion does not exceed 6 × 10^−4 with probability 95%. Using numerical simulation and stepping β_0 down from 9.9 to 9.8 to 9.7, etc., we find that β_0 should be set to 9.
Fig. 3 shows the fluctuation of the proportion of the population infected with HIV from 1995 to 2015 when ρ is the same but the average transmission rate β_0 equals 9. Fig. 4 shows the mean value of the proportion of the population infected with HIV from 1995 to 2015 when ρ equals 0.5 and β_0 equals 9 and 10, respectively.
Fig. 3. The proportion of the population infected with HIV from 1995 to 2015, where ρ is 0.5 and β_0 is 9. The number of sample trajectories is 10.
Fig. 4. The mean value of the proportion of the population infected with HIV from 1995 to 2015, where ρ is 0.5 and β_0 is 9 and 10, respectively
it will fall below 4×10
-4.
β0
is 10, while
36
M. Xu, Y. Ding, and L. Hu
5 Conclusions Based on the formulation of a stochastic model of the proportion of the population infected with HIV against the total population, we first prove its solution is existent and unique. we analyses the effect of the control index of the transmission rate on the proportion of the population based on the estimation of the distribution of the proportion from 1995 to 2015 in China with the application of stochastic model. Acknowledgements. This work was supported in part by Program for New Century Excellent Talents in University from Ministry of Education of China (No. NCET-04415), the Cultivation Fund of the Key Scientific and Technical Innovation Project from Ministry of Education of China (No. 706024), International Science Cooperation Foundation of Shanghai (No. 061307041), and Specialized Research Fund for the Doctoral Program of Higher Education from Ministry of Education of China (No. 20060255006).
References 1. Haynatzka, V.R., Gani, J., Rachevn, S.T.: The spread of AIDS among interactive transmission. Biosystems 73(3), 157–161 (2004) 2. Castillo, C.C., et al.: The role of long incubation periodic in the dynamics of acquired immunodeficiency syndrome—Single population models. Math. Biol. 7, 373–398 (1989) 3. Blythe, S.P., Anderson, R.M.: Distributed incubation and infections periods in models of transmission dynamics of human immunodeficiency virus (HIV). IMA J. Math. Appl. Med. Biol. 1–19 (1988) 4. Blythe, S.P., Anderson, R.M.: Variable infectiousness in HIV transmission models. IMA. Math. Appl. Med. Biol. 5, 181–200 (1988) 5. Blythe, S.P., Anderson, R.M.: Heterogeneous sexual activity models of HIV transmission in male homosexual populations. IMA J. Math. Appl. Med. Biol. 5, 237–260 (1988) 6. Jacquez, J.A., Simon, C.P., Koopman, J.S.: Structured mixing: Heterogeneous mixing by the definition of activity groups. In: Castillo-Chavez, C. (ed.) Mathematical and Statistical Approaches to AIDS Epidemiology. Lecture Notes in Biomath., pp. 301–315. Springer, Heidelberg (1989) 7. Jacquez, J.A., Simon, C.P., Koopman, J.S.: The reproduction number in deterministic models of contagious disease. Comments Theor. Biol. 2, 159–209 (1988) 8. Greenhalgh, D., Doyle, M., Lewis, F.: A mathematical model of AIDS and condom use. IMA J. Math. Appl. Med. Biol. 18, 225–262 (2001) 9. Roberts, M.G., Saha, A.K.: The asymptotic behavior of a logistic epidemic model with stochastic disease transmission. Applied Mathematics Letters 12, 37–41 (1999) 10. Friedman, A.: Stochastic Differential Equations and Their Applications. Academic Press, New York (1976) 11. Okesendel, B.: Stochastic Differential Equations. Springer, Heidelberg (1985) 12. Kloeden, P.E., Platen, E.: Numerical solution of stochastic differential equations. Springer, Heidelberg (1992) 13. Saito, Y., Higham, T.: Stability analysis of numerical scheme for stochastic differential equations. SIAM, Numer. Anal. 33, 2254–2267 (1996)
14. Higham, D.J.: Mean-square and asymptotic stability of the stochastic theta method. SIAM J. Numer. Anal. 38(3), 753–769 (2000)
15. Ryden, T., Wiktorsson, M.: On the simulation of iterated Itô integrals. Stochastic Processes and their Applications 91(1), 116–151 (2001)
16. Burrage, K., Burrage, P., Mitsui, T.: Numerical solutions of stochastic differential equations: implementation and stability issues. Journal of Computational and Applied Mathematics 125, 171–182 (2000)
17. Slominski, L.: Euler's approximations of solutions of SDEs with reflecting boundary. Stochastic Processes and their Applications 94(2), 317–337 (2001)
18. Liu, M.X., Zhou, Y.C.: An age-structured dynamic model of HIV. Journal of North China Institute of Technology 25(2), 25–30 (2004)
Simulation of Artificial Life of Bee's Behaviors

Bin Wu, Hongying Zhang, and Xia Ni

School of Information Engineering, Southwest University of Science and Technology, Sichuan, Mianyang 621002, China
{wbxn,zhanghongying,nixia}@swust.edu.cn
Abstract. Based on the 'bottom-up' programming approach of Artificial Life, a finite-state machine is adopted to describe the behavior of an individual bee, and the behaviors of the different life periods are realized in this paper. As a result, the interactions of the individual bees with each other and with the virtual environment produce the emergence of the swarm's collective behaviors. Finally, we use graphical interfaces to visualize the simulation.
1 Introduction

Artificial life is a fascinating topic in the study of complexity, and bionic systems of life phenomena are an important research topic within it. All kinds of complicated life phenomena can be reproduced by building such bionic systems, which helps us to understand the essence of life. Biological behaviors in nature are rich and colorful; in particular, biologists are more and more interested in the complicated colony behaviors of insects. Recently, many biologists have come to think that the coordination of these insects is based on the theory of cooperation without communication. The basic idea of this theory is that each insect adjusts its behavior according to changes in its surroundings, without any leaders or explicit communication between individuals; on this basis, the colony as a whole can complete very complicated tasks. In general, individual behavior is simple, blind and random, while colony behavior is coherent, fluent and consistent [1].

Following this theory, the behaviors of a bee swarm are simulated by artificial life in this paper. We will see that although the behavior of an individual bee is simple, complicated swarm structures emerge from the interactions of the bees with each other and with their surroundings.
2 Behavior of Bees

Although the behavior of an individual bee is simple and disorderly, the behavior of the swarm is complicated and orderly. A swarm is made up of one queen bee, many worker bees and a few drones. Their conformations and functions are different; they divide the work and cooperate with each other.

The queen bee is the only bee in a swarm that can lay eggs. Generally speaking, she can live from three to five years after successful mating. In her life, the queen bee
goes through three phases. In the first phase, she develops from an egg into an eclosed imago; during this period, her main tasks are eating and flying around the honeycomb. After twenty-one days, the queen bee enters her second phase: she begins to fly out and mate with drones. When the queen bee is fully impregnated, she enters her third phase, in which she does nothing but produce offspring.

The drone is a male individual in the swarm, developed from an unfertilized egg in a male cell of the comb. His only task is to mate with the queen bee to produce offspring. Drones make up about two percent of the total number of the swarm, and a drone lives about two months. The drone goes through two phases in his life. In the first phase, it takes about twenty-four days for him to develop from an egg into an eclosed imago, and after a further twelve days he is mature; during this period, his main tasks are eating and flying around the honeycomb. He then enters his second phase and mates with the queen bee. Once mating succeeds, the drone dies at once.

The worker bee is a female individual developed from a fertilized germ cell. As her age increases, the worker bee takes on all the work in the comb. The number of worker bees is very large: they make up ninety-eight percent of the total number of the swarm. A worker bee lives about four to six weeks in the summer and three to six months in the winter. It takes about twenty-one days for a worker bee to develop from an egg into an eclosed imago. After that, her main tasks are keeping the comb clean, feeding the baby bees, secreting royal jelly, building the comb, and guarding. After twenty-one days, the worker bee flies out of the comb and collects farina, which is the last work of her life.

Although bees live a colony life, different swarms do not communicate with each other. Bees have the ability to guard their comb against outside swarms and insects: if outside bees thrust themselves into the comb, the bees guarding it will fight them until the intruders run away or die. On the contrary, when the bees are outside their combs, for example among the flowers or at a watering place, different swarms are neither hostile to nor interfering with each other. If a queen bee flies into an outside swarm, she will be killed by the worker bees; a drone, on the contrary, will not be hurt. This may show that swarms want to avoid inbreeding so as to make their offspring better [2].
3 The Basic Ideas

3.1 Adaptability

Ashby, one of the forerunners of control theory, already understood in the 1950s that the controlled object and its surroundings must be regarded as a whole: when studying a control system, we must establish not only a model of the controlled object but also a model of the object together with its surroundings. Ashby's viewpoint is in fact that we should study the behaviors of an organism from the point of view of adaptability. Research on adaptability is more and more important in artificial intelligence [1].
The individuals that make up an artificial life system are autonomous agents with the ability of self-learning; they have the characteristics of self-adaptability, self-organization and evolution. All these characteristics are expressed in the actions and reactions between the autonomous agents and their environment. Through the interactions of the agents with each other and with the environment, the behaviors suited to different environments are selected by training on and learning from local information. All of this embodies, macroscopically, the intelligent performance of emergence [3].

3.2 Emergence

Unlike designing a car or a robot, artificial life is not designed in advance. The most interesting examples of artificial life exhibit the behavior of 'emergence': the interactions of many simple units produce remarkable overall performance in a complicated environment. In artificial life, the exhibition of the system is not deduced from the genotype; the genotype represents the simple rules by which the system runs, while the exhibition represents the total emergence of the system. Computationally speaking, the 'bottom-up' programming approach allows new, unpredictable phenomena to emerge at the upper level, and this phenomenon is the key to a life system [4]. In this paper, we apply the 'bottom-up' programming approach to simulate the behaviors of individual bees in order to make the overall behaviors of the swarm emerge.

3.3 Finite-State Machine

A finite-state machine (FSM) is a mechanism made up of a finite number of states, one of which is called the current state. The FSM can receive an input, which results in a state transition: the state transforms from the current state to an output state according to a state-transition function. After the transition, the output state becomes the current state. In the state-transition diagram of an FSM, a vertex represents a state and an arrow represents a transition, since it describes the FSM transforming from one state to another. The label of a transition is divided into two parts by a slash: the left part is the name of the event that triggers the transition, and the right part is the name of the behavior performed after the transition is triggered. In this paper, we apply finite-state machines to describe the behavior of individual bees in the simulation; a minimal sketch of such a machine is given below.
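The following sketch drives a single worker bee through illustrative states and events; the state names, events and transition table are simplifications of the behaviors in Section 2, not the exact table of our program.

CLEANING, NURSING, BUILDING, GUARDING, FORAGING = range(5)

# (current state, event) -> (next state, action name)
TRANSITIONS = {
    (CLEANING, "age>3d"):   (NURSING,  "feed larvae"),
    (NURSING,  "age>10d"):  (BUILDING, "build comb"),
    (BUILDING, "age>18d"):  (GUARDING, "guard entrance"),
    (GUARDING, "age>21d"):  (FORAGING, "collect farina"),
    (GUARDING, "intruder"): (GUARDING, "fight intruder"),
}

class WorkerBee:
    def __init__(self):
        self.state = CLEANING

    def handle(self, event):
        # unknown events leave the state unchanged
        next_state, action = TRANSITIONS.get((self.state, event),
                                             (self.state, "keep working"))
        self.state = next_state
        return action

bee = WorkerBee()
for event in ("age>3d", "age>10d", "age>18d", "intruder", "age>21d"):
    print(event, "->", bee.handle(event))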
4 Simulations

In this section, we use graphical interfaces to realize the simulation, which visually shows the behaviors of the virtual swarm in the virtual environment at different phases.

4.1 Finite-State Machine

The object of our simulation is to simulate the behavior of bees with the methods of artificial life. So we not only establish a model of the bees, but also place these bee individuals into a dynamic virtual environment. When the environment changes, the
individuals produce different behaviors in the different environments. This phenomenon shows that the individuals adapt to the environment. The environment and the bees can be regarded as two objects that oppose and contact each other. Based on this, we build the simulation system as a layered structure, shown in Fig. 1: a main program with module management on top, the resource, comb and bee modules below it, and the flower, comb and bee objects at the bottom.

Fig. 1. Layered structure of the program
4.2 Realization of the Program

The best method of implementing the synthesis of artificial life is the 'bottom-up' programming approach. The basic idea of this method is to define many small units at the bottom, together with a few simple rules governing their internal and local interactions; coherent collective behavior is then produced by those interactions. This behavior is not organized beforehand by special rules. The aim of the simulation in this paper is not to implement the behavior of the swarm directly, but to implement the behavior of the individual bee. Although the behavior of an individual artificial bee is simple, complicated swarm behaviors are produced through the interactions of many individual bees.

(1) Establishing the Virtual Environment

As we know, the behaviors of bees are related to the changes of the seasons and the temperature. So the first task of our simulation is to realize the changes of the seasons in the virtual environment. In general, the life of a swarm over one year can be divided into several periods [2]. Accordingly, we divide the seasons as follows:

Early spring: February. The queen bee begins to lay eggs in this period; for the queen to lay eggs, the temperature must be above 10 ℃. Based on this, we set the temperature of February to 10–15 ℃.
Spring: from March to May.
Summer: from June to August. We set the temperature from March to July to 15–35 ℃, and in August to 35–25 ℃.
Autumn: from September to October. We set the temperature of this period to 25–15 ℃.
Late autumn: November. We set the temperature of this period to 15–5 ℃.
Winter: from December to January of the following year.
Fig. 2. The relation between season and temperature
Fig. 2 describes the temperature distribution used in our simulation; based on this figure, we can establish a simple function of the temperature change. In the virtual environment we also simulate the food of the virtual bees, that is, we establish the flowers. In our simulation we use several parameters to describe each flower, such as its location (X, Y), florescence (BloomingP), amount of pollen (UnitFPN), color and size.

(2) The Behavior of Bees

In our simulation, we describe the behaviors of the queen bee, the worker bees and the drones through finite-state machines.

4.3 The Interface of the Simulation

The main interface of the simulation program is shown in Fig. 3. The simulation program is divided into five parts: the control of initial parameters, data statistics and
Fig. 3. The main interface of the simulation program
analysis, establishing a new comb, moving the comb, and motion monitoring. The user can set initial parameters through the parameter control panel, such as the season, the month, the state of the queen bee, the number of swarms in the system, and the total number of bees. We set different initial parameters in order to observe their effects. While the program is running, pressing the button "establish the new swarm" adds a swarm at an arbitrary position; the aim of this button is to let the user observe the effects when a new swarm is added to the virtual environment.

4.4 The Results of the Simulation

In this section we give an experiment on moving a bee colony. First, we set the initial values as follows:
Season: spring; month: April; the queen bee is impregnated;
Number of combs: one;
Number of bees: eighty;
Amount of pollen in the comb: ten thousand;
Number of flowers: one hundred;
Pollen content of one flower: one hundred.
The position of the comb is (5496, 4577), chosen randomly by the system. Fig. 4(a) shows the state of the simulator after eleven days. From this picture we can see that the number of flowers around the comb is very small, which affects the amount of pollen in the comb. We then move the comb to a new position, (1000, 5000); the state after moving the comb is shown in Fig. 4(b). At this point, swarming happens; Fig. 4(c) shows the state at the start of swarming. After a new queen bee is produced, the old queen bee and some worker bees leave the comb to build a new comb at position (6310, 4769). Fig. 5 shows the increase of pollen in the combs over twenty days. In this picture,
(a)
Fig. 4. The progress of the simulator run
(b)
(c) Fig. 4. (continued)
Fig. 5. The increase of pollen in the hive over a 20-day simulator run
there are two curves, C1 and C2. C1 represents the increase of pollen in the old comb, while C2 represents the increase of pollen in the new comb after swarming.
5 Conclusions

In this paper we use graphical interfaces to realize the behaviors of a swarm in a virtual environment. The proposed program lets the collective behaviors of the whole swarm emerge through the implementation of the behaviors of individual bees. This is not a simple simulation but a virtual swarm possessing artificial life. In the visual interface we can see new swarms being produced and old swarms dying; all of these are the results of the interactions of individual bees with each other. If the program were to run indefinitely, it could be regarded as an ecosystem of bees.
References
1. Zhou, D.Y., Dai, R.W.: Adaptability Behavior and Simulation. Journal of System Simulation 6, 578–583 (2000)
2. Sun, Y.X.: The Bees. Chinese Curatorial Press, Beijing (2001)
3. Wu, J.B.: Emergent Behaviors of Autonomous Agents. Micromachine Development 6, 6–8 (1997)
4. Li, J.H.: Artificial Life: Explore New Form of Life. Research on Nature Dialectic 7, 1–5 (2001)
Hybrid Processing and Time-Frequency Analysis of ECG Signal

Ping Zhang, Chengyuan Tu, Xiaoyang Li, and Yanjun Zeng*

Beijing University of Technology, P.R. China 100022
[email protected]
Abstract. A new simple approach based on the histogram and a genetic algorithm (GA) to efficiently detect QRS-T complexes on the ECG curve is described, so as to easily obtain the P-wave (when AF does not happen) or the f-wave (when AF happens). By means of signal processing techniques such as the power spectrum function, the auto-correlation function and the cross-correlation function, the two kinds of ECG signal, with and without AF, were analysed, showing the evident differences between them.
1 Introduction

The ECG (electrocardiogram) is a standard clinical examination method and as such a valuable tool in the diagnosis of cardiac disease. A key element in the automatic analysis of the ECG signal is the detection of QRS complexes [1]. Once the QRS complexes are detected, one can analyse the patient's HRV (heart rate variability) along with a number of other parameters of diagnostic significance. ECG processing strategies that have been applied to detect QRS complexes are based on various methods such as band-pass filtering, adaptive filtering, nonlinear adaptive filtering, etc. [1-2]. Owing to the complexity and considerable computational expense associated with these filtering methods, their applicability (particularly in clinical practice) is restricted. Recently, along with the development of wavelet-based analysis techniques, a number of methods based on the wavelet transform for detecting QRS complexes have been put forward [1,3-9]. These methods demand, however, the availability of a suitable mother wavelet reflecting the properties of the signal to be analysed, so that the important local features of this signal in the time domain as well as in the frequency domain are all preserved after the wavelet transform. Moreover, these methods rely on the definition of a suitable scale so that the signal abruptness may be preserved. If these conditions are not fulfilled, it is quite possible that an omission (a QRS complex is not detected) or a mistake (something detected is not a QRS complex) occurs. In order to find a better solution to these problems, a simple and effective method based on the histogram and an improved GA to search for and detect QRS complexes has been developed by the authors. As a novel application of the method, the P-wave can be extracted easily and efficiently from the ECG curve whenever AF is absent, or the f-wave can be analogously extracted when AF occurs. *
Corresponding author.
2 Detection of QRS Complexes from the ECG Curve

2.1 Fast Search for QRS Complexes

The ECG plot appearing on a computer screen may be considered as a two-dimensional picture consisting of elements, organised in rows and columns, with value 0 (corresponding to white) or 1 (corresponding to black). The parameter QOP (quantity of pixels) is defined as the total number of black picture elements per column, from which a histogram over the entire plot is obtained. The information embodied in the histogram can be used for a rapid detection of QRS complexes. The ECG signal, which appears as a curve in the t- (time-) domain, is shown in Fig. 1. For the further analysis, we define several technical terms:
X   the ECG picture matrix (binary: 0 corresponds to white, 1 corresponds to black)
Fig. 1. Typical ECG waveform
x(i, j)   the element of the matrix X in row i and column j
d(j)   the QOP (the total number of black picture elements) of column j
i, j   the row coordinate and column coordinate of the element x(i, j); thereby
    d(j) = Σ_i x(i, j)        (1)

    D = {d(j)}

where the set D = {d(j)} appears as a vector. The range of i and that of j are determined by the size and resolution of the ECG plot considered. For a typical ECG frame on a computer screen, we generally let i = 1, 2, ..., 600 and j = 1, 2, ..., 1200. Figure 2 shows the picture corresponding to the set D, namely the histogram, in which the abscissa is time and the ordinate is QOP. Figure 3 shows the histogram after thresholding, i.e., the picture corresponding to a new set Dth obtained by selecting a suitable threshold value d0 (see Eq. (2)).
Fig. 2. The histogram corresponding to Fig. 1
Fig. 3. The threshold-form of Fig. 2
    D_th = {d_th(j)}, where
    d_th(j) = d(j)  when d(j) ≥ d0;    d_th(j) = 0  when d(j) < d0        (2)
Comparing Fig. 1, Fig. 2 and Fig. 3, we easily see that the relative maxima in Fig. 3 correspond to the peaks of the underlying R-waves. Let n(k) denote the column coordinate of the peak of the k-th R-wave; then a new set N, constructed from these column coordinates, is obtained as follows:

    N = {n(k)} = {n(1), n(2), ...}        (3)
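A minimal NumPy sketch of this column-histogram pipeline (Eqs. (1)-(3)) might look as follows; the threshold value and the local-maximum test are illustrative assumptions, since the text does not fix them.

    import numpy as np

    def qop_r_peaks(X, d0):
        """X: binary ECG picture matrix (1 = black pixel); d0: threshold.
        Returns the thresholded histogram Dth and R-peak column coordinates N."""
        d = X.sum(axis=0)              # Eq. (1): QOP of each column
        dth = np.where(d >= d0, d, 0)  # Eq. (2): thresholding
        # Eq. (3): local maxima of the thresholded histogram as R-peak columns
        n = [j for j in range(1, len(dth) - 1)
             if dth[j] > 0 and dth[j] >= dth[j - 1] and dth[j] > dth[j + 1]]
        return dth, np.array(n)

    # Toy usage with a 600 x 1200 frame, as in the text (random content here).
    X = (np.random.rand(600, 1200) < 0.02).astype(np.uint8)
    dth, N = qop_r_peaks(X, d0=20)
    print(len(N), "candidate R-peak columns")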
2.2 Criterion to Detect QRS Complexes

Let the size of the picture matrix X be K × L (rows × columns), let the size of the figure matrix Q_qrs of the QRS mould (Fig. 3) be K1 × L1 (with elements q_ij), and let the characteristic vector of the mould be D_qrs:

    D_qrs = [y_j], j = 1, 2, ..., L1, with y_j = Σ_{i=1}^{K1} q_ij        (4)

Then, by means of this mould, we can easily find the positions of the related QRS complexes through a few sequential searches at the corresponding coordinates in Fig. 1, as given by Eq. (3).
Then we can easily find out by means of this mould the positions of related QRS complexes through a few sequential searching at the corresponding coordinates in Fig. 1, as expressed by Eq. (3). The criterion utilized in this search process is related to the quantity of unmatched points, ek, and a new characteristic vector, Dk. An unmatched point is defined as such a point in the searched region where a difference exists between the values of the
corresponding elements of the two matrices Q_qrs and Q_k. Here Q_k is a figure matrix with the same size as the mould matrix Q_qrs. The quantity of unmatched points e_k can be calculated as follows:

    e_k = Σ (Q_k xor Q_qrs)        (5)

where xor denotes an exclusive-or operation. This operation is carried out between the corresponding elements of Q_k and those of Q_qrs and results in a new matrix; we then perform a Σ operation (summation) over all the elements of this new matrix. As for the characteristic vector D_k, it is the set whose elements correspond to the QOP of the columns in the k-th sub-region to be searched. The matching principle is as follows: a pattern satisfying the following two requirements is the QRS complex we want:

    (1 − c_p) · D_qrs ≤ D_k ≤ (1 + c_p) · D_qrs,   c_p ∈ (0, 1)
    e = min{ e_k }, k = 1, 2, ...,  with e ≤ ε        (6)
where ε is the given tolerance of e.

2.3 Using an Improved GA to Determine the Value of c_p and Which Mould to Adopt

In this paper we developed an improved GA, which is distinguished by three new operations (restoration, reconstruction and recording-the-better) and by its ability to auto-adjust the parameters p_c and p_m (as shown in step 4 of the algorithm below), so as to converge to a globally optimal solution without impairing the stochastic ability of the subsequent search process. This GA can be expressed as:

    GA = { Se, Cr, Mu, Rs, Rc, Rb, Ed, fv }        (7)
where the elements of this set are:
Se ... selection; Cr ... crossover; Mu ... mutation; Ed ... encoding, decoding; fv ... evaluation of fitness; Rs ... restoration; Rc ... reconstruction; Rb ... recording-the-better        (8)
Now we introduce several expressions for the fitness functions:

    f(i, t) ≡ f(x_i(t)) ... fitness of the i-th individual x_i(t) at moment t        (9)

    f_m(t) ≡ min_i { f(i, t) } ... the better fitness at moment t

    f*_m ≡ the recorded better fitness stored in the computer, with
    f*_m(t) ← f_m(t) whenever f_m(t) < f*_m;   ε = the given tolerance of f_m        (10)

The meanings of the three new operations are as follows:
restoration Rs: whenever a crossover or a mutation fails (i.e., f_m(t) > f*_m), we reject this unfavourable operation and let the population return to its state previous to that operation.
reconstruction Rc: when a crossover or a mutation has failed many times, the population is reconstructed.
recording-the-better Rb: whenever a crossover or a mutation succeeds (i.e., f_m(t) < f*_m), the improved result is recorded (f*_m(t) ← f_m(t)).
        ELSE GOTO step 3;
      END
      ELSE GOTO step 6;
    END
    step 5 (recording-the-better): f*_m(t) ← f_m(t); f_m ← f*_m(t); t ← t + 1; k ← 0; GOTO step 3;
    step 6: restoration; GOTO step 4;
    step 7: reconstruction; GOTO step 3;
    step 8: decoding; output the solution;
    step 9: stop.

In the algorithm described above, t denotes the number of iterations, whereas T is the upper limit of t (as any real-time optimization process adopts a finite t); n denotes how many times the crossover or mutation has successively failed, and N is the upper limit of n; k denotes how many times the restoration has failed, and K is the upper limit of k. For the algorithm of a crossover or a mutation, refer to reference [10]. It is worth pointing out that, generally, N = 0.001–0.01 T for a large T value, whereas N = 0.01–0.2 T for a small T value. A suitable value must also be chosen for K; it may be K = 0.01–0.1 T.

Applying this improved GA to optimize the selection of the QRS-T mould and the value of c_p, there are two parameters to deal with: the serial number of the QRS-T mould and the value of c_p. With the data from the MIT/BIH database and from the Beijing Chao-Yang Hospital, we took 100 QRS-T moulds, and from the domain (0, 1) we took 50 numerical values. All these 100 × 50 = 5000 data together constitute the original population under consideration. Two further parameters of the above GA may be chosen as:
    population size Popsize = 100
    length of the chromosome string L = 13
As regards the fitness function, the following is preferable:

    f(t) = e        (11)

where e is defined as in Eq. (6).
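As an illustration only, a compact version of such a GA loop with the three extra operations might be sketched as follows; the individual encoding (a mould index and a c_p value), the toy fitness and all thresholds are our own assumptions, not the authors' implementation.

    import random

    def improved_ga(fitness, pop_size=100, T=2000, N=50):
        """Minimize `fitness` over (mould_index, cp) pairs, with restoration,
        reconstruction and recording-the-better. Parameters are illustrative."""
        rand_ind = lambda: (random.randrange(100), random.uniform(0.01, 0.99))
        pop = [rand_ind() for _ in range(pop_size)]
        best = min(pop, key=fitness)
        f_star = fitness(best)                    # recorded better fitness
        n = 0                                     # successive failures
        for t in range(T):
            old_pop = list(pop)                   # snapshot for restoration
            i = random.randrange(pop_size)
            mould, cp = pop[i]
            if random.random() < 0.5:             # crossover with the best
                pop[i] = (best[0], (cp + best[1]) / 2)
            else:                                 # mutation
                cp2 = min(max(cp + random.gauss(0, 0.05), 0.01), 0.99)
                pop[i] = (random.randrange(100), cp2)
            cand = min(pop, key=fitness)
            fm = fitness(cand)
            if fm < f_star:                       # recording-the-better
                best, f_star, n = cand, fm, 0
            else:                                 # restoration: undo the step
                pop, n = old_pop, n + 1
                if n > N:                         # reconstruction
                    pop, n = [rand_ind() for _ in range(pop_size)], 0
        return best, f_star

    # Toy fitness standing in for e of Eq. (6): smaller is better.
    best, e = improved_ga(lambda ind: abs(ind[1] - 0.3) + ind[0] * 0.001)
    print(best, round(e, 4))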
3 Detection Rate

With the data from the MIT/BIH database and those from the Beijing Chao-Yang Hospital, we randomly sampled 20,070 cardiac cycles from 32 ECG plots, and detected the QRS complex of each cycle one by one by means of the method described above. The results are given in Table 1.

Table 1. The results of QRS complex detection

    cardiac cycles    mistakes    omissions    detection rate
    20070             37          42           99.61 %

* detection rate = [cardiac cycles − (mistakes + omissions)] / cardiac cycles.
4 Extraction of the P-Wave and f-Wave from the ECG Curve

The authors developed a novel application of the above method as follows: once a QRS-T complex (instead of a QRS complex) has been identified, it is removed from the ECG curve. As the U-wave is negligible, the remainder is nothing but the P-wave when AF (atrial fibrillation) does not happen, or the f-wave when AF happens.
(a) P-wave
(b) f-wave Fig. 4. P-wave and f-wave
As shown in Fig. 4, the P-wave appears as a series of small wave-packets with relatively concentrated energy, whereas the f-wave takes the form of a series of irregular oscillations with quite dispersed energy.

4.1 The Signal Processing and Analysis of the ECG Signal

The data used in our work are taken from the MIT/BIH database as well as from the Beijing Chao-Yang Hospital; among them there are 50 examples in which AF happened, and another 50 in which AF did not happen. Our methods of signal processing and analysis involve several techniques: the power spectrum function, the auto-correlation function and the cross-correlation function. The curves corresponding to these functions have been normalized so that we can deal with them easily.

4.2 The Power Spectrum

After an FFT operation on the ECG data, we can obtain the related amplitude-frequency characteristics and draw the ECG power spectrum curve, as shown in Fig. 5. It is easily seen that the spectrum lines are uniformly distributed and the spectrum valleys stay at or near the zero-value points when AF does not happen, whereas the lines are distributed quite non-uniformly and the valleys stay somewhat far away from the zero-value points when AF does happen.
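A minimal sketch of such a power-spectrum computation, for a uniformly sampled signal x with sampling rate fs (the names and the normalization are illustrative assumptions):

    import numpy as np

    def power_spectrum(x, fs):
        """One-sided, normalized power spectrum of a sampled signal."""
        x = x - x.mean()                     # remove the DC offset
        p = np.abs(np.fft.rfft(x)) ** 2 / len(x)
        f = np.fft.rfftfreq(len(x), d=1.0 / fs)
        return f, p / p.max()                # normalized, as in the text

    # Toy usage: 10 s of a 1.2 Hz dominant rhythm sampled at 250 Hz.
    fs = 250.0
    t = np.arange(0, 10, 1 / fs)
    f, p = power_spectrum(np.sin(2 * np.pi * 1.2 * t), fs)
    print(round(f[np.argmax(p)], 1))         # ~1.2 Hz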
(a) when AF does not happen
(b) when AF happens Fig. 5. The power spectrum
Now we discuss the dynamic frequency spectrum. For each sampling, the ECG signal of eight sequential time intervals (10 seconds per interval) is prepared as one group of data. After the QRS-T complexes are removed, the remainder is the P-wave (when AF does not happen) or the f-wave (when AF does happen); we then perform an FFT operation on each sample and draw the corresponding dynamic frequency spectrum, as shown in Fig. 6. Comparing Fig. 6(a) with Fig. 6(b), the difference between the two kinds of dynamic spectrum is very evident. The energy of the f-wave is concentrated in quite a narrow interval (such as 0–5 Hz) when AF happens, whereas that of the P-wave is dispersed over a relatively wide interval when AF does not happen. Moreover, the f-wave's dynamic frequency spectrum curves in different time intervals differ from one another, showing the irregularity of the f-wave.

4.3 The Auto-correlation Function

The auto-correlation function is utilized to describe the degree of correlation between values of a signal sampled at different times, being a kind of measurement of the degree of interrelation between these values. Let the sampled series of values of the ECG signal be x(n); its auto-correlation function is then:
    R_xx(m) = lim_{N→∞} (1/2N) · Σ_{n=−N}^{N} x(n) x(n−m)        (12)
This function, shown as a corresponding curve, is given in Fig.7 (a) when AF does not happen, or in Fig.7 (b) when AF happens.
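For a finite record, the limit in Eq. (12) is replaced by the record length; a minimal estimate, normalized to R(0) as the curves in Fig. 7 are, might read:

    import numpy as np

    def autocorrelation(x, max_lag):
        """Biased estimate of Eq. (12) for lags m = 0..max_lag, scaled to R(0)."""
        x = x - x.mean()
        r = np.array([np.dot(x[m:], x[:len(x) - m]) for m in range(max_lag + 1)])
        return r / r[0]

    # A regular, P-wave-like rhythm keeps high correlation over many lags.
    t = np.arange(0, 10, 0.004)
    print(np.round(autocorrelation(np.sin(2 * np.pi * 1.2 * t), 4), 3))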
(a) when AF does not happen
(b) when AF happens Fig. 6. The dynamic frequency spectrum
(a) when AF does not happen
(b) when AF happens Fig. 7. The auto-correlation function
Comparing Fig. 7(a) with Fig. 7(b), we see that the amplitudes on the auto-correlation curve descend rapidly, with an amplitude ratio (R_max1/R_max2) > 4.5, when AF happens, whereas they descend relatively slowly, with (R_max1/R_max2) < 3, when AF does not happen.
5 The Cross-Correlation Function

The cross-correlation function denotes the correlation between the values of two different signals sampled at different times. Let the sampled series of the signal values of the QRS-T mould be y(n) and that of the ECG signal be x(n); the cross-correlation function is then:

    R_xy(m) = lim_{N→∞} (1/2N) · Σ_{n=−N}^{N} x(n) y(n−m)        (13)

This function, shown as a corresponding curve, is given in Fig. 8(a) when AF does not happen, and in Fig. 8(b) when AF happens. It is easily seen from Fig. 8 that the degree of correlation between the ECG signal and the QRS-T mould signal reaches relative maxima at the points where QRS-T complexes occur, and that on the cross-correlation curve there appear corresponding waves quite similar to the QRS-T complexes. It is also seen that, when AF does not happen, the QRS-T complexes are separated from each other relatively uniformly, and
Fig. 8. The cross-correlation function: (a) when AF does not happen; (b) when AF happens
within the spaces between these complexes there are some relatively regular oscillations; whereas, whenever AF happens, the QRS-T complexes are separated non-uniformly and there is a series of very irregular oscillations within the spaces between two successive complexes.
6 Summary

The authors developed a simple and effective new approach to remove QRS-T complexes from the ECG curve, so as to easily obtain the P-wave when AF does not happen, or the f-wave when AF happens. By means of the signal processing methods used in this paper, such as the power spectrum function, the auto-correlation function and the cross-correlation function, the distinctions between these two kinds of ECG curve, even when too minor to be observed by eye, appear clearly in the frequency-domain patterns, which enables us to easily diagnose whether AF happens or not.
References
1. Kohler, B.U., Henning, C., Orglmeister, R.: The Principles of Software QRS Detection. IEEE Engineering in Medicine and Biology, pp. 42–57 (January/February 2002)
2. Keselbrener, L., Keselbrener, M., Akselrod, S.: Nonlinear high pass filter for R-wave detection in ECG signal. Med. Eng. Phys. 19(5), 481–484 (1997)
3. Gamo, C., Gaydecki, P., Zaidi, A., Fitzpatrick, A.: An implementation of the wavelet transform for ECG analysis. In: Advances in Medical Signal and Information Processing, First International Conference on (IEE Conf. Publ. No. 476), 4-6 September, pp. 32–40 (2000)
4. Li, X.Y., Wang, T., Zhou, P., Feng, H.Q.: ST-T complex automatic analysis of the electrocardiogram signals based on wavelet transform. In: IEEE 29th Annual Proceedings of Bioengineering Conference, 22-23 March, pp. 144–145 (2003)
5. Inoue, H., Iwasaki, S., Shimazu, M., Katsura, T.: Detection of QRS complex in ECG using wavelet transform (in Japanese). In: IEICE Gen. Conf., vol. 67(A-4), pp. 198–207 (March 1997)
6. Kadambe, S., Murray, R., Boudreaux-Bartels, G.F.: Wavelet transform-based QRS complex detector. IEEE Trans. Biomed. Eng. 46, 838–848 (1999)
7. Bahoura, M., Hassami, M., Hubin, M.: DSP implementation of wavelet transform for real time ECG waveforms detection and heart rate analysis. Computer Methods and Programs in Biomedicine 55(1), 35–44 (1997)
8. Sahambi, J.S., Tandon, S.N., Bhatt, R.K.P.: A new approach for in-line ECG characterization. In: Proceedings of Fifteenth Southern Biomedical Engineering Conference, Dayton, USA, pp. 409–411 (1996)
9. Sahambi, J.S., Tandon, S.N., Bhatt, R.K.P.: Using wavelet transform for ECG characterization. IEEE Eng. Med. Biol. 16(1), 77–83 (1997)
10. Mitsuo, G., et al.: Genetic Algorithms and Engineering Design. John Wiley & Sons Inc., New York, USA (1997)
Robust Stability of Human Balance Keeping

Minrui Fei 1, Lisheng Wei 1, and Taicheng Yang 2

1 Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronics and Automation, Shanghai University, Shanghai 200072, China
[email protected],
[email protected]
2 Department of Engineering and Design, University of Sussex, Brighton, BN1 9QT, UK
[email protected]
Abstract. Despite its apparent simplicity, the nature of the control mechanisms that allow humans to stand up is still an object of great research interest [1-9]. In a recent paper [1], a PID controller with a sensory delay was proposed to model the central nervous system for the control of balance keeping; based on experimental data, it was suggested that the poor balance-keeping ability of ageing people can be explained by a reduced derivative gain in the PID controller. Using the models presented in [1], this paper studies a further topic: the robust stability of human balance keeping. Computational methods are presented to find the ranges of the controller parameters for which the closed-loop system is stable, and has a given stability phase or gain margin. An example is used in the paper to demonstrate the application of these methods.
1 Introduction

Human bipedal stance is inherently unstable, since it requires that a large body consisting of multiple flexible segments is kept in an erect posture with its center of mass (COM) located high above a relatively small base of support. The complexity of this system and its ability to maintain stable stance, despite various perturbations, have attracted the attention of many researchers in the field and have inspired various theories that try to explain the control mechanism of bipedal quiet stance; see, for example, [2] and the references therein. The ankle-joint torque needed to control the body during quiet stance can be evoked actively and passively. Passive torque components are the result of the intrinsic mechanical properties, i.e. stiffness and/or viscosity, produced by muscle and surrounding tissue such as ligaments and tendons; we refer to the additional torque generated by active muscle contraction as active torque. Since the COM is located in front of the ankle joint, plantar-flexing torque is continuously required to prevent the body from falling forward [2]. However, the passive torque by itself is not sufficient to ensure this required plantar-flexing torque [3-5]. Therefore, an additional active torque, regulated by the central nervous system (CNS) and produced by the plantar flexors, is needed [3-9]. In a recent work [1], based on experimental data, a model of "PID plus a delay" for the central nervous system is
proposed, and it is found that: (1) balance-keeping ability can be analyzed on the basis of body sway, and body sway is significantly increased in a balance-deficient group compared to a normal group; the elderly, who are more prone to falls, exhibit more extensive body sway, suggesting poorer balance-keeping ability; (2) to quantify the influence of aging, a model of a PID plus a delay is used, and it is observed that aging causes the value of kd in the PID controller to decrease; and (3) the clinical importance of kp and ki is still unclear and further research is necessary. This paper extends this work to study the robust stability of a closed-loop system model of human balance keeping. The model consists of an "inverted pendulum", a simplified and commonly used "plant model" for the study of human upright standing [2-9], and a PID controller plus a delay for the central nervous system [1]. The details of the closed-loop system model and the robust stability analysis are presented in Section 2 and Section 3 respectively.
2 Model of Human Balance Keeping

Figure 1 shows a simplified structural model of static upright standing, where m1 is the mass of the two legs, m2 is the mass of the pelvis, and m3 is the mass of the upper trunk. Experimental results show that the lumbosacral joint moves opposite to the direction of the ankle joint, suggesting that the trunk is kept perpendicular to the horizontal. The inverted pendulum provides a simplified model of upright standing in humans since, like an inverted pendulum, humans need feedback to maintain balance. As in engineering control systems, balance-keeping control consists of sensors, actuators and a controller. If the central nervous system is modeled as a PID controller, then the closed-loop system can be represented by the block diagram of Figure 2.
Fig. 1. A simplified structural model of static upright standing [1]
Fig. 2. A block diagram of a closed-loop system for human balancing keeping [1]
It can be shown that the characteristic equation of the closed-loop system of Figure 2, the roots of which determine the system stability, is given by:

    1 + G_c(s) G_p(s) = 0        (1)

where G_c(s) is the transfer function of the PID controller and G_p(s) is the transfer function of the plant together with the sensing delay:

    G_c(s) = k_p + k_i/s + k_d s        (2)

    G_p(s) = G_p1(s) G_p2(s) = [1/(α s² − β)] e^(−τs)        (3)
where α = 2 m1 lg² + (m2 + m3) l1² and β = [2 m1 lg + (m2 + m3) l1] g (using sin θ ≈ θ). Using the data in [1] gives:

    k_p = 519; k_i = 3.0; k_d = 72.3; τ = 0.11 sec; α = 66.0; β = 107.0

The system step response is shown in Figure 3, and as expected it is stable.
Fig. 3. The closed-loop system step response (kd =72.3)
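This nominal response can be checked numerically; the sketch below approximates the delay with a first-order Padé term (our own modelling choice, not part of the paper) and uses SciPy to compute the closed-loop poles and step response:

    import numpy as np
    from scipy import signal

    kp, ki, kd = 519.0, 3.0, 72.3
    tau, alpha, beta = 0.11, 66.0, 107.0

    # First-order Pade approximation: e^{-tau s} ~ (1 - tau s/2)/(1 + tau s/2)
    pade_num, pade_den = [-tau / 2, 1.0], [tau / 2, 1.0]

    # Open loop Gc(s) Gp(s), with Gc = (kd s^2 + kp s + ki)/s, per Eqs. (2)-(3)
    num = np.polymul([kd, kp, ki], pade_num)
    den = np.polymul(np.polymul([1.0, 0.0], [alpha, 0.0, -beta]), pade_den)

    # Closed loop T = G_ol / (1 + G_ol); poles are the roots of den + num
    cl_num, cl_den = num, np.polyadd(den, num)
    print(np.roots(cl_den))  # all poles in the left half-plane -> stable

    t, y = signal.step(signal.TransferFunction(cl_num, cl_den),
                       T=np.linspace(0, 10, 2000))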
3 Robust Stability of Human Balance Keeping

3.1 Stability Range of kd

For the model presented in Section 2, a general question people are likely to ask is: for people with the same body parameters α and β, what are the ranges of the central nervous system parameters kp, ki, kd and τ which can keep the body stably balanced? This is a robust stability problem in feedback control theory. A very useful result for the application here is the "boundary crossing" theorem [10]. It says that if the system characteristic equation (1), corresponding to a set of nominal system parameters, is stable, and varying a parameter or a group of parameters continuously from their nominal values leads to an unstable system, then in such a process ±jω must be roots of the characteristic equation for some particular set of changed parameters. Intuitively, when a system changes from stable, when all roots of the characteristic equation are in the left half of the s-plane, to unstable, when some or all roots are in the right half of the s-plane, the root locus must cross the imaginary axis of the s-plane. In practice, one can put s = jω in Equation (1) and solve for values of the system parameters. These are the critical values of the system parameters; beyond them, the system becomes unstable. To illustrate this, we first assume that the values of kp, ki and τ are fixed and try to find a critical value of kd. Putting s = jω in (2) and (3) gives:

    G_c(jω) = k_p + k_i/(jω) + j k_d ω = k_p − j k_i/ω + j k_d ω = k_p + j (k_d ω − k_i/ω)        (4)
and

    G_p(jω) = γ(ω) e^(−jτω) = γ(ω) cos(τω) − j γ(ω) sin(τω)        (5)
where

    γ(ω) = G_p1(jω) = 1/(−α ω² − β)        (6)
From (4) and (5):

    G_OL(jω) = G_c(jω) G_p(jω)
             = [k_p γ(ω) cos(τω) + γ(ω) sin(τω)(k_d ω − k_i/ω)]
               + j [γ(ω) cos(τω)(k_d ω − k_i/ω) − k_p γ(ω) sin(τω)]        (7)
For 1 + G_OL(jω) = 0, the real part and the imaginary part of (7) must satisfy:

    A(ω) = k_p γ(ω) cos(τω) + γ(ω) sin(τω)(k_d ω − k_i/ω) = −1        (8)

and

    B(ω) = γ(ω) cos(τω)(k_d ω − k_i/ω) − k_p γ(ω) sin(τω) = 0        (9)
From (9):

    cos(τω)(k_d ω − k_i/ω) = k_p sin(τω)        (10)
Now, for a value ω*, one can find a value k_d* from (10) (kp, ki and τ are fixed, known values). Substituting this k_d* into (8): if the equation holds, then this k_d* is the critical value kd_critical; if equation (8) fails, one tries another ω and repeats the process. A computer program can search for kd_critical over a range of ω.
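A direct sketch of this boundary-crossing search, assuming a simple grid over ω and a small tolerance on Eq. (8):

    import numpy as np

    kp, ki, tau, alpha, beta = 519.0, 3.0, 0.11, 66.0, 107.0
    gamma = lambda w: 1.0 / (-alpha * w ** 2 - beta)   # Eq. (6)

    def kd_critical(w_grid, tol=1e-3):
        """Scan omega: Eq. (10) yields a candidate kd; Eq. (8) must equal -1."""
        best = None
        for w in w_grid:
            kd = (kp * np.tan(tau * w) + ki / w) / w   # from Eq. (10)
            A = gamma(w) * (kp * np.cos(tau * w)
                            + np.sin(tau * w) * (kd * w - ki / w))  # Eq. (8)
            if abs(A + 1.0) < tol and (best is None or abs(A + 1.0) < best[2]):
                best = (kd, w, abs(A + 1.0))
        return best

    print(kd_critical(np.linspace(0.1, 10.0, 100000)))  # kd ~ 59.1 at w ~ 2.56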
Fig. 4. The closed-loop system step response (kd = kd_critical = 59.1)
Fig. 5. The closed-loop system step response (kd= 50)
In this example, we obtain kd_critical = 59.1, with ω = 2.56 rad/s at this critical value. The system step response when kd = kd_critical = 59.1 is shown in Figure 4; it also shows that the oscillation frequency is equal to 2.56/2π = 0.41 Hz. Any kd < kd_critical will lead to an unstable system; Figure 5 shows an unstable step response when kd = 50.

3.2 Stability with a Phase Margin or a Gain Margin
In practice, one often wants to find a critical value kd = kd_cri_phase at which the system is stable with a specified stability margin. A phase margin φ is commonly used. In this case, we replace G_p(s) in (3) with G*_p(s) = G_p(s) e^(−jφ), which leads to:

    G*_p(jω) = (cos φ − j sin φ)[γ(ω) cos(τω) − j γ(ω) sin(τω)]
             = [cos φ · γ(ω) cos(τω) − sin φ · γ(ω) sin(τω)]
               + j [−sin φ · γ(ω) cos(τω) − cos φ · γ(ω) sin(τω)]
             = C(ω) + j D(ω)        (11)
Following the same procedure as before, we have:

    G*_OL(jω) = [k_p C(ω) − D(ω)(k_d ω − k_i/ω)] + j [C(ω)(k_d ω − k_i/ω) + k_p D(ω)]

    A*(ω) = k_p C(ω) − D(ω)(k_d ω − k_i/ω) = −1        (12)

    B*(ω) = C(ω)(k_d ω − k_i/ω) + k_p D(ω) = 0
Using the above equations and the same method as before, for a phase margin of φ = 20 degrees we find that kd_cri_phase = 132.26, with ω = 5.35 rad/s at this value. Similarly, if a gain stability margin of M is required, one only needs to change (6) to
    γ(ω) = G_p1(jω) = M/(−α ω² − β)        (13)
and then follow the procedures given in Section 3.1. For example, for a gain margin of M = 1.2, we find that kd_cri_gain = 59.43, with ω = 2.87 rad/s at this value.

3.3 Critical Values of kd for Different Sensing Delays
Since the characteristics of the central nervous system differ between people, the value of τ is not the same for different people. Even for the same person, the value of τ is not fixed under different physical and mental conditions. Following the same
Fig. 6. Critical values of kd for different sensing delay of τ
procedures as before, kd_critical is calculated for different sensing delays τ, and the result is plotted in Figure 6.

3.4 Critical Values of kp and kd for Different Sensing Delays
From equation (10), we have:

    k_d = (k_p tan(τω) + k_i/ω) / ω        (14)

Substituting this into equation (8) gives:

    k_p γ(ω) cos(τω) + γ(ω) sin(τω) k_p tan(τω) = −1        (15)

that is,

    k_p γ(ω) / cos(τω) = −1

so kp can be expressed as:

    k_p = − cos(τω) / γ(ω)        (16)

Substituting (16) into (14) gives:

    k_d = (− sin(τω)/γ(ω) + k_i/ω) / ω,    k_p = − cos(τω)/γ(ω)        (17)
Using the same data, k_i = 3.0, α = 66.0, β = 107.0, and assuming the sensing delay τ ranges from 0.04 sec to 0.20 sec, we obtain the results shown in Fig. 7.
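The computation behind Fig. 7 can be sketched by evaluating the critical pair of Eqs. (16)-(17) over a grid of frequencies for each delay (the grids are illustrative assumptions):

    import numpy as np

    ki, alpha, beta = 3.0, 66.0, 107.0

    def critical_pair(tau, w):
        """Critical (kp, kd) of Eqs. (16)-(17) for delay tau at frequency w."""
        g = 1.0 / (-alpha * w ** 2 - beta)        # gamma(w), Eq. (6)
        kp = -np.cos(tau * w) / g                 # Eq. (16)
        kd = (-np.sin(tau * w) / g + ki / w) / w  # Eq. (17)
        return kp, kd

    w = np.linspace(0.5, 20.0, 400)
    for tau in np.linspace(0.04, 0.20, 5):        # delays from 0.04 s to 0.20 s
        kp, kd = critical_pair(tau, w)
        print(f"tau={tau:.2f}s  kp max ~ {kp.max():.0f}, kd max ~ {kd.max():.0f}")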
Fig. 7. Critical values of kp and kd for different sensing delays. (A) 3D visualization; (B) kp–τ projection; (C) kd–τ projection; (D) kp–kd projection
4 Conclusions

In this paper we study the robust stability of a balance-keeping control model that can be used for analyzing the balance-keeping ability of humans. Experimental results indicate that the central nervous system parameter kd decreases with aging; it is a balance- and stability-related parameter. Newly developed methods are presented which successfully find the ranges of kd for which the closed-loop system is stable, and has a given stability phase or gain margin, when the body parameters and the other central nervous system parameters are fixed. These findings on the robust stability of human balance keeping have theoretical as well as practical value.

Acknowledgments. This work was supported by the Program for New Century Excellent Talents in University under grant NCET-04-0433, the Research Project for Doctoral Disciplines in University under grant 20040280017, the Key Project of the Science &
Technology Commission of Shanghai Municipality under grants 061107031 and 061111008, the Sunlight Plan Following Project of the Shanghai Municipal Education Commission, and Shanghai Leading Academic Disciplines under grant T0103.
References
1. Hidenori, K., Jiang, Y.: A PID Model of Human Balance Keeping. IEEE Control Systems Magazine, 18–23 (December 2006)
2. Masani, K., Vette, A.H., Popovic, M.R.: Controlling balance during quiet standing: Proportional and derivative controller generates preceding motor command to body sway position observed in experiments. Gait & Posture 23, 164–172 (2006)
3. Morasso, P.G., Schieppati, M.: Can muscle stiffness alone stabilize upright standing? J. Neurophysiol. 83, 1622–1626 (1999)
4. Morasso, P.G., Sanguineti, V.: Ankle muscle stiffness alone cannot stabilize balance during quiet standing. J. Neurophysiol. 88, 2157–2162 (2002)
5. Loram, I.D., Lakie, M.: Direct measurement of human ankle stiffness during quiet standing: the intrinsic mechanical stiffness is insufficient for stability. J. Physiol. 545, 1041–1053 (2002)
6. Gatev, P., Thomas, S., Kepple, T., Halett, M.: Feedforward ankle strategy of balance during quiet stance in adults. J. Physiol. 514, 915–928 (1999)
7. Loram, I.D., Lakie, M.: Human balancing of an inverted pendulum: position control by small, ballistic-like, throw and catch movements. J. Physiol. 540, 1111–1124 (2002)
8. Lakie, M., Caplan, N., Loram, I.D.: Human balancing of an inverted pendulum with a compliant linkage: neural control by anticipatory intermittent bias. J. Physiol. 551, 357–370 (2003)
9. Masani, K., Popovic, M.R., Nakazawa, K., Kouzaki, M., Nozaki, D.: Importance of body sway velocity information in controlling ankle extensor activities during quiet stance. J. Neurophysiol. 90, 3774–3782 (2003)
10. Ackermann, J.: Robust Control: Systems with Uncertain Physical Parameters. Springer, Heidelberg (1993)
Modelling Pervasive Environments Using Bespoke and Commercial Game-Based Simulators

Marc Davies 1, Vic Callaghan 1, and Liping Shen 2

1 Digital Lifestyles Centre, Essex University, UK
2 eLearning Lab, Shanghai Jiao Tong University, China
[email protected], [email protected], [email protected]
Abstract. This paper details an ongoing investigation, linking intelligent buildings and computer game technology. Intelligent buildings are containers for life-sized organisms (people, pets etc), sustaining ecologies that evolve and interact in a symbiotic way with the technological infrastructure, which includes intelligent agents. This work explores how computer games software can be used to create a simulation tool for the development of new ubiquitous agent programs and environments. We report on our experiences of adapting a retail package, (Electronic Arts’ Sims) and building our own bespoke simulator. We use a table to compare and summarise the strengths and weaknesses of each approach; the general conclusion being that there are significant benefits to be gained from adapting commercial games packages for use as professional simulation tools. Finally, we conclude the paper by describing our plans to apply this work to the development of a mixed-reality eLearning application.
1 Introduction

1.1 Why Use a Simulator over a Real-World Test-Bed?

Technology is assuming an ever increasing role within the environments in which we live and work [4]. Developing intelligent embedded agents and pervasive computing environments can be a costly process. When testing agents in a real-world environment, researchers can only perform experiments in real-time (e.g. with each iteration taking days) and often have little control over natural elements (e.g. sunlight). The reason for this is that agents model and learn behaviours by monitoring people as they go about their everyday activities [1]. This makes it practically impossible to repeat tests under identical conditions. When developing a new pervasive agent, current researchers inevitably need a physical device and/or environment to test their program. Building a simulator from scratch is a formidable task, requiring a considerable programming effort to create realistic graphical environments and user behaviours (especially interactive behaviours). We hypothesized that a Pervasive Environment Simulator (PES) based on computer game technology could take advantage of the existing high-detail graphics, physics and artificial-intelligence techniques used in modern games to provide a high-quality model of a real-world test-bed.
Additionally, we hypothesized that creating a PES by modifying a computer game would be easier, and produce a higher-quality simulation, than building a bespoke system.

1.2 Existing Projects

Computer games and related technologies are already being used for non-entertainment research by various private and public organizations around the world. The Entertainment Technology Center at Carnegie Mellon University has run a project using their 'Alice' graphics engine to teach artists how to program virtual worlds [11]. In another example, the U.S. 'Darwars' training program uses a simulated environment based on the Unreal Tournament game engine to produce foreign landscapes where soldiers can interact with virtual inhabitants, allowing them to acquire language skills [18]. Computer science skills are taught by allowing participants to design their own computer games, matching the expectations of younger generations, who are unimpressed by simple visualizations [13]. Bespoke simulators are already being used to model household environments: the 'MavHome' project [5] consisted of an intelligent living environment with certain components (e.g. window blinds) controllable by a simulator created for the project [5]. Simulators are also being used to augment telemetry from sensors in real-world locations, improving the reliability of readings and allowing operations that would otherwise be impossible, for example a security system that augments a camera feed to let an observer 'see through' solid objects by rendering the hidden space in the simulator [14]. Additionally, simulators can provide training tools for complex machines and vehicles, e.g. flying aircraft or manoeuvring underwater robots [3]. Aircraft simulators have been converted into several popular computer game titles, most notably the 'Microsoft Flight Simulator' series, useful to prospective pilots learning to fly [12] and of high entertainment value to gamers around the world. This provides encouragement for this project: since a simulator can be converted into a commercial game, the process should be reversible, with games modified into simulators for research and development applications.

1.3 Bespoke vs. Game-Based Simulators

As part of our research, two simulators were developed: a) a completely bespoke system, written in the Java programming language; and b) a simulator created by modifying an off-the-shelf copy of a popular computer game. Both programs modelled the iDorm2 at the University of Essex, a full-sized two-bedroom apartment constructed as a pervasive computing test-bed, featuring hollow walls and ceilings fitted with a myriad of embedded-computer-based technology [2] [8].
Figs. 1 and 4. Views of the University of Essex iDorm test-bed
2 Simulator Design

2.1 Bespoke PES

Design Rationale. A two-dimensional PES (2D-PES) aimed to provide a benchmark for comparison with the game-based PES. Additionally, by creating a bespoke system we were able to further our understanding of the program architecture required; this knowledge was used when implementing the game-based 3D-PES, letting us identify the components of the original program requiring modification. It was important to identify how pervasive devices and features could be simulated, and later to assess which could be incorporated into a game-based PES.

Simulator Architecture. In more technical terms, the 2D-PES consisted of four Java programs linked by socket communication: a) two programs operating the actual simulator; b) a central server program; and c) a third-party agent program, also written in Java. Threads were used to handle time delays and loops in the program code. The central server regulated the exchange of all data between the simulator and the third-party agent program. Readings from sensors in the environment were transmitted from the simulator to the agent program on each thread cycle.
Fig. 5. Bespoke 2D-PES Architecture
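The agent side of this server-mediated exchange might be sketched as follows; the line-delimited JSON message format and the port are illustrative assumptions, not the actual protocol.

    import json
    import socket

    def agent_loop(host="localhost", port=5000):
        """Each thread cycle: receive sensor readings, reply with settings."""
        with socket.create_connection((host, port)) as sock:
            f = sock.makefile("rw")
            for line in f:                  # one JSON message per cycle
                sensors = json.loads(line)  # e.g. {"fridge": 0, "tv": 1}
                # Toy agent rule: toggle the television on every cycle.
                settings = {"tv": 1 - sensors.get("tv", 0)}
                f.write(json.dumps(settings) + "\n")
                f.flush()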
Fig. 6. The Bespoke 2D-PES
Fig. 7. Avatar A.I. architecture FSM
Table 1. 2D-PES avatar AI states

State A: All needs outside thresholds. No action required from the AI; the actor continues using the current device or following user instructions.
State B: Hunger need > 70%. The actor moves to a randomly selected device which can be used to reduce hunger (refrigerator, oven).
State C: Tiredness need > 70%. The actor moves to a randomly selected device which can be used to reduce tiredness (sofa, chair, bed).
State D: Boredom need > 70%. The actor moves to a randomly selected device which can be used to reduce boredom (phone, television).
State E: Hygiene need < 30%. The actor moves to a randomly selected device which can be used to raise its hygiene level (shower, washing machine).
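A compact sketch of this needs-driven state selection (thresholds and device lists as in Table 1; the structure itself is our own illustration):

    import random

    DEVICES = {"hunger": ["refrigerator", "oven"],
               "tiredness": ["sofa", "chair", "bed"],
               "boredom": ["phone", "television"],
               "hygiene": ["shower", "washing machine"]}

    def choose_action(needs):
        """States B-E trigger on thresholds; otherwise state A (carry on)."""
        for need in ("hunger", "tiredness", "boredom"):
            if needs[need] > 70:               # states B, C, D
                return random.choice(DEVICES[need])
        if needs["hygiene"] < 30:              # state E
            return random.choice(DEVICES["hygiene"])
        return None                            # state A

    print(choose_action({"hunger": 80, "tiredness": 10,
                         "boredom": 20, "hygiene": 90}))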
A list of settings was also received by the simulator from the agent program, specifying the states that pervasive objects should be set to by the simulator. The agent determined these settings by analysing the sensor data sent on the previous thread cycle.

Simulation Environment. The simulated iDorm environment (Fig. 6) contained an avatar inhabitant (represented by a magenta circle) and numerous static objects (grey rectangles) and pervasive objects (orange/green rectangles). As the avatar interacted with an object, the device changed colour to show the state of an attached sensor (interaction amounted to switching appliances on/off): if an object was green, the attached sensor equalled 1 (high); if orange, the sensor equalled 0 (low). The avatar interacted with objects in response to the levels of four 'needs' attributes (hunger, tiredness, boredom, hygiene), which influenced the activities performed (Fig. 7, Table 1).

2.2 Commercial Game-Based PES

Design Rationale. A three-dimensional PES (3D-PES) was created by modifying an off-the-shelf copy of the Sims computer game (Maxis/EA Games, 2000). Apart from the 3D graphics and supporting tools, a particularly attractive feature of the Sims was the fairly realistic behaviour of its inhabitants. Simulating the behaviour of people is a considerable challenge; the game's avatar artificial intelligence (AI) uses a "competing weighted behaviour" approach. One possible variation of this technique, used for the avatar AI in the 2D-PES, is shown in Fig. 7 above. Using a series of weights to create an artificial bias in decision making, an artificial personality representative of a person of any gender or age can be created for an avatar and, with additional research, characteristics such as psychological/mental illness could also be modelled. A PES could create a safety barrier, allowing agents/environments to be tested on a target audience without placing real people at risk. Possibilities also include social research applications (e.g. observing people with conflicting personalities living together).
Simulator Architecture. In more technical terms, this system consisted of a five-room environment, again modelled on the iDorm2. Each object and person contained in the environment was controlled by at least one thread, placed on a stack and run in sequence by the game. Object threads were used to regulate the animation displayed by the game virtual machine [6]. Most objects could only access their own threads, so, for example, a television couldn't access information contained in a thread for a lamp. To create a Sims-based PES, the original program code had to be modified so that objects could access the threads of other devices and the information contained within them. For this stage of the project, the most efficient way to achieve this was to program a single Sims object to act as a 'remote control' for the other pervasive devices. The 'Dumbold Voting Machine' [7], an add-on device available online, was modified to act as a remote interface and, once re-programmed, stored the current state of each pervasive object in the environment to memory. Agent code ran from the voting-machine thread, prompting state changes in other objects as required. Agents determined when to make changes using sensor values updated on each thread cycle. Menus from the voting machine were re-programmed to provide a manual interface for forcing actions performed by a Sims avatar (Fig. 9).
Figs. 8 and 9. The Sims-Based 3D-PES & A Modified Sims Object Interface
Simulation Environment. The Sims game is a simulated domestic environment, designed to model one or several people living their daily lives. The original program allows a player to design, build and furnish a house to their own specifications, using numerous pre-programmed materials and objects available in the game libraries. Using a game to create a PES introduced several advanced features that add greater realism to the environment but were too minor for us to allocate resources to program into a bespoke system, including avatars who randomly visit the virtual home.
3 PES Comparison

Creating two different simulators allowed a comparative study. The results were intended to expose the issues involved in modifying games, and the relative advantages/disadvantages compared to a bespoke system. We discuss the performance of the simulators under four criteria we regarded as important for pervasive computing applications:
• Environment Representation. This criterion covers how realistic the simulator graphics make the environment and its objects appear.
• Avatar Representation. How life-like are the movements and behaviours of the avatar inhabiting the virtual environment? How advanced is the AI used? Are actions purely reactive, or can the avatar make decisions?
• Agent Representation. Perhaps the most important feature of a PES, allowing observation of environment changes to evaluate how test agents perform. How is the influence of agents operating in the environment shown?
• Programmability. How easy was each simulator to program? Did the platform used contain restrictions which caused problems during simulator development?
3.1 Comparison Results Table

Table 2. The strengths and weaknesses of both simulators under the chosen criteria

Environment Representation
Bespoke PES:
• The speed-of-time could be changed.
• The simulator could be paused.
• Dawn and dusk are represented with natural-style changes to light levels.
• Two-dimensional Java graphics made some features difficult to identify.
• Adding/changing objects in the environment required modifying code.
• Lights, radiators and windows affected the environment evenly rather than having a weaker influence in proportion to the distance from the source object.
Game-Based PES:
• Realistic 3D Sims graphics and animations.
• The speed-of-time could be changed.
• The simulator could be paused.
• Objects were added/changed in the environment quickly using the Sims game features.
• Lights and windows both illuminated rooms naturally, with stronger illumination in areas closer to the light source.
• While dawn and dusk were represented with changes to light levels in the environment, the transitions occurred too quickly to appear natural and sometimes caused sensors to miss events.

Avatar Representation
Bespoke PES:
• Four AI 'needs' variables were used.
• Avatars interacted with objects by standing next to or sitting on them.
• No visitors were available.
• Avatars didn't show any emotion.
• Inhabitants couldn't leave the environment.
• Avatars didn't perform event chains for tasks, instead only using a single object.
Game-Based PES:
• Eight AI 'needs' variables plus social values.
• Thought bubbles showed an avatar's desires.
• Animations show avatar interaction with objects.
• Avatars can visit and/or leave the environment.
• Avatars performed event chains (e.g. to eat they walked to the fridge, cooked a meal at the stove, then ate at the table).
• Avatars were messy, leaving rubbish around in unnatural locations and ignoring rotting food.
Table 2. (continued)

Agent Representation
Bespoke PES:
• Colour changes to objects made state changes easily visible.
• Agent activities were recorded to file.
• Agents were loaded from a 3rd-party Java program file, with the server blocking any direct simulator contact.
• Only on/off states were possible.
• Sensors were invisible in the simulator.
• The time required for messages from the simulator and the agent program to be passed across the server created a short lag before the environment reacted to any required changes.
Game-Based PES:
• Animations make state changes easily visible.
• Multiple states were available for some objects.
• Agents could make modifications to the environment faster than for the bespoke system, since the code was part of the simulator program.
• Objects could randomly break down.
• A 'control' object in the Sims-PES allowed agents to be tested by forced avatar interaction.
• SimAntics agent code had to be programmed directly into the simulator; special sections were created during re-programming to facilitate this.
• It was not possible to create output files.
• Sensors couldn't be seen in the simulator.

Programmability
Bespoke PES:
• No restricted code.
• Java is a multi-purpose OO language with many libraries and facilities.
• A popular programming language with programming guides and IDE tools.
• Since everything had to be written from scratch, programming took a long time.
Game-Based PES:
• Changes to object code could be made quickly by re-structuring behaviour trees.
• Edits occur in real-time with the game running.
• Little code other than Sims objects was editable.
• SimAntics is a visual programming language designed for the Sims game, so is very specific in the operations that could be performed.
3.2 Comparison

Environment Representation. The game-based simulator was created by modifying an existing infrastructure, adding and removing code fragments where necessary. By using the Sims as a template, the game-based PES objects looked realistic and could randomly break down where appropriate. Avatars had the appearance of real people, interacting with the environment using realistic motions. The bespoke Java simulator visualized the same objects, but the devices were only shown as rectangles. This could cause confusion in complex demonstrations if an observer had no knowledge of the layout of the iDorm environment. The 3D simulation of the Sims-PES also included features lost in the bespoke 2D-PES, for example objects mounted on walls.

Avatar Representation. Both simulators used 'needs' attributes to create an artificial intelligence for the avatars inhabiting the environment. The bespoke system was simpler, using just four variables to produce a reactive AI, whilst the Sims game used eight 'needs' attributes plus additional variables allowing some deliberation by avatars in
social situations. Small thought-bubbles appear above the head of a Sims avatar, showing that character's mood and/or desires. In a PES this feature can be used to observe how an agent/environment design affects the emotional state of avatars, possibly revealing areas requiring improvement.

Agent Representation. The Sims uses pre-programmed animations for many of its objects to visually display any state changes (e.g. switching a television on/off). Most of these animations were still present in the 3D-PES, although some were modified or removed for operational purposes. As explained earlier, the bespoke simulator also used animation on a much simpler scale, limited to two animations representing the on/off states of pervasive objects. By re-programming some of the original SimAntics code, some 3D-PES objects could visualise several new states and give responses different from normal in certain scenarios, allowing agents to remotely control devices. The 2D-PES system created output files recording the activities of the avatar and the agents interacting with objects in the environment during testing. It was not possible to include this feature in the 3D-PES due to the restrictions and limitations of the Sims code: the simulator could use a feature of the Sims game which saved/re-loaded the current environment, but no data was recorded to file.

Programmability. Sims object code was programmed using the visual language SimAntics, running on the Edith virtual machine [6] [16]. A new piece of code was added to the original game where SimAntics agent programs could be attached directly to the simulator. Components were added to the bespoke 2D-PES ensuring agent programs could be loaded into the simulator from an external Java file written by a developer. The Sims game required re-programming before objects could be controlled by an agent; this wasn't a problem for the bespoke PES.

3.3 Comparison Summary

Criteria such as 'computational performance' were intentionally omitted from this evaluation. The aim was to accurately simulate a real-world environment; therefore at this stage of the research we were not concerned with simulator efficiency. Features such as ambient light changes representing dawn and dusk, and the needs attributes for the avatar AI, were programmed into the bespoke simulator, while the Sims-based simulator had many of these minor features included as part of the original game. The Sims uses a more advanced AI system for avatars than the bespoke PES. A single inhabitant was used to test both PES environments, partly to reduce the programming requirements for the bespoke system, but mostly to allow each program to be tested using the same set of rules, allowing a more accurate comparison. Neither PES actually displayed sensors in its simulation; this was a decision made during the development stage to reduce the complexity of the simulated environments for this stage of the project. A next-generation PES will include sensors in the simulation, allowing several to be attached to a single object.
4 The Next Step Our immediate aim is to extend the work to simulate the smart classrooms (see Figs. 10 and 11) used in the Open eLearning Platform in Shanghai, which supports more than 15,000 learners. The platform delivers fully interactive lectures to PCs, laptops, PDAs, IPTV and mobile phones from high-tech teaching rooms known as smart classrooms. The Essex iDorm includes a study room, which is typical of the domestic space to which remote learning from Shanghai can be delivered. Using the modified simulator, we aim to investigate how a virtual classroom might best be created to give teachers and distributed learners a sense of sharing the same space.
Figs. 10 and 11. The SJTU Smart Classroom & the Remote Classroom
With the simulator, and the emotion sensing developed in another project [9], we will also investigate how learners' emotions evolve and affect the activities in the classroom [15]. This is part of a broader research strategy seeking to link a next-generation game-based PES with a real-world test-bed, creating a mixed-reality environment. Among the advantages of this set-up, virtual avatars could interact with real-world objects and vice versa. Additionally, using virtual sensors to augment real-world counterparts should allow: a) more complex environments to be simulated; b) innovative devices and functions to be evaluated ahead of realisation (particularly speculative devices that may not physically exist); c) the experience of playing games to be made more exciting by connecting it to the physical environment. We are also looking at how a PES might be situated in globally distributed simulations such as Second Life [10] and Project Darkstar [17].
5 Conclusions In this paper we described the first stage of an ongoing investigation into modelling pervasive environments with computer game technology. Primarily, this stage focused on whether it was possible to take advantage of the high-quality graphics and pseudo-realistic avatar behaviours provided by computer games to create a pervasive environment simulator (PES). We discovered that a PES had numerous advantages over a real-world test-bed, most notably: a) the speed of time was variable, allowing testing to be performed faster; b) event and sensor readings within the environment could be recorded, allowing experiments to be replayed in full or in part; c) environmental
parameters in the virtual home could be specified, allowing agent programs to be tested under identical conditions; d) it proved less expensive, in terms of cost, floor space and maintenance requirements, than using a real environment. We also noted that: a) several researchers could simultaneously use their own copy of a PES, customized to their experimental requirements; b) a PES is easily portable and presentable, unlike a full-scale real-world environment; and c) a PES eliminates a problem for computer scientists who develop pervasive agents but lack the skills required to build real-world devices on which to test their code. To test our hypotheses we assessed whether modifying a computer game had greater benefits than creating a bespoke PES or using a real-world environment to test pervasive agent programs. On comparison (Table 2), for all but one of four criteria a game-based PES would be more beneficial for pervasive computing research than a bespoke counterpart. Our hypotheses were supported after modifying a retail copy of the Sims computer game: a PES boasting high-level graphics and artificial intelligence was created in the same timeframe as a bespoke Java-based simulator using simple two-dimensional visualisations and more basic AI. Since the original game code was written by a team of developers with their own ideas, a single person could modify it into a PES while still maintaining the ideals of the original programmers, preventing any personal bias from being introduced into the new simulator. In more scientific terms, the more natural behaviours of the Sims avatar and devices are a significant advantage for environments incorporating learning agents. Game AI can also be programmed with weights to let an avatar mimic a person of any age or gender. Such features are reflective of the real world, and so must be included in a simulated test-bed intended for developing pervasive agents and environments. Finally, our immediate aims are to extend this simulation work to the creation of mixed-reality distributed shared spaces, involving a richer combination of simulated people and appliance-based agents. Initially, we plan to create a virtual classroom based on our well-proven Open eLearning and iDorm platforms. Our longer-term aim is to extend this simulation so that it can be used not only to develop intelligent environments but also to study the symbiotic relationships that evolve in such spaces. We hope that this paper has shown that game technology offers a means to provide an environment with the potential to support such exciting research. Acknowledgments. We are pleased to acknowledge the support of Electronic Arts, who supplied the Edith editor for the Sims. Thanks are also due to Bernard Horan (Sun Microsystems), Michael Gardner (Chimera Socio-Technical Research Institute) and Ruimin Shen (Shanghai Jiaotong University), who have greatly motivated this work by describing longer-term visions, which we hope to address in future phases.
References
1. Callaghan, V., Clark, G., Colley, M., Hagras, H., Chin, J.S.Y., Doctor, F.: Intelligent Inhabited Environments. BT Technology Journal 22(3) (2004)
2. Callaghan, V., Woods, J., Fitz, S., Dennis, T., Hagras, H., Colley, M., Henning, I.: The Essex iDorm: A Testbed for Exploring Intelligent Energy Usage Technologies in the Home. In: Proceedings of the 3rd International Conference on Intelligent Green and Energy Efficient Building & New Technologies, International Convention Centre, Beijing, China, pp. 26–28 (2007)
3. Chernett, P., Callaghan, V., Colley, M.J., Duffy, N.D., Edwards, I., Herd, J.T., Hunter, J., Lane, D.M., Penrose, J., Randall, G.W., Smith, D., Smith, J., Standeven, J., Whittaker, G.A., Wood, A.: Mixing Simulated and Real Subsystems for Subsea Robot Development. In: IEEE International Conference Oceans 98, Nice, France (1998)
4. Clarke, G., Callaghan, V.: Ubiquitous Computing Informatization, Urban Structures and Density. Built Environment Journal 33(2) (2007)
5. Cook, D.J., Youngblood, M., Heierman III, E.O., Gopalratnam, K., Rao, S., Litvin, A., Khawaja, F.: MavHome: An Agent-Based Smart Home. In: Mattern, F., Naghshineh, M. (eds.) Pervasive Computing: First International Conference, Pervasive 2002, Zurich, Switzerland, August 26-28, 2002, pp. 521–524. Springer, Heidelberg (2002)
6. Forbus, K.D., Wright, W.: Some notes on programming objects in The Sims™. Northwestern University (2001)
7. Hopkins, D.: Dumbold Voting Machine. Retrieved (2006), http://www.donhopkins.com/drupal
8. IIEG: iDorm2. Retrieved (March 18th, 2007), http://iieg.essex.ac.uk/idorm2/index.htm
9. Leon, E., Clarke, G., Callaghan, V., Sepulveda, F.: A user-independent real-time emotion recognition system for software agents in domestic environments. Engineering Applications of Artificial Intelligence 20(3), 337–345 (2007)
10. Linden Lab: Second Life. Retrieved (April 5th, 2007), http://www.secondlife.com
11. Marinelli, D., Pausch, R., LaForce, J.: Entertainment Technology Center. In: Catarci, T., Little Thomas, D.C. (eds.) IEEE Multimedia, Multimedia At Work, vol. 7(4), pp. 78–81 (2000)
12. Microsoft Game Studios: Microsoft Flight Simulator 2004: A Century of Flight, CD (2004)
13. Overmars, M.: Teaching Computer Science through Game Design. IEEE Computer 37(4), 81–83 (2004)
14. Ou, S., Karuppiah, D.R., Fagg, A.H., Riseman, E.: An Augmented Virtual Reality Interface for Monitoring of Smart Spaces. In: Second Annual Conference on Pervasive Computing and Communications, Orlando, Florida (March 14-17th, 2004)
15. Shen, L., Leon, E., Callaghan, V., Shen, R.: Exploratory Research on an Affective eLearning Model. In: International Workshop on Blended Learning 2007 (WBL 07), University of Edinburgh, Scotland (August 15-17, 2007)
16. Barrett III, P.J.: Sims Zone, The Interview. Retrieved (March 15th, 2007), http://www.thesimszone.co.uk/interviews/index.php?ID=1
17. Sun Microsystems: Project Darkstar. Retrieved (April 20th, 2007), https://games-darkstar.dev.java.net
18. Voth, D.: Gaming technology helps troops learn language. IEEE Intelligent Systems 19(5), 4–6 (2004)
The Research of Artificial Animal's Behavior Memory Based on Cognition
Xiaojuan Ban1, Shurong Ning1, Jing Shi1, and Dongmei Ai2
1 University of Science and Technology Beijing, Information Engineering School, Xueyuan Road No. 30, Haidian District, Beijing 100083
[email protected]
2 University of Science and Technology Beijing, Application Science School
Abstract. In some artificial systems, realistic perception for actors, which must both simulate the sensing organs and perform identification, consumes most of the computational time; worse, this can compromise decisions based on perception. In order to reduce the computation cost from a systemic point of view and to optimize system performance, a brand-new perceptual focuser is proposed. The perceptual focuser is the core of the artificial animal's sensation system, providing external environment and internal condition information to the behavior decision system. The formation of artificial animal behavior memory is also a result of the focuser's analysis and focusing. This paper proposes and analyzes two memory-formation algorithms. The quadratic variance method computes the variance of the recorded food spots twice and rejects the noise spots that lie far from the other, normal spots, so that the expected food distribution position is obtained. In view of the quadratic method's insufficiency, an improved mean-cluster algorithm, drawing on data-mining theory, rejects noise spots more accurately. Selecting the algorithm according to the situation allows the focuser to realize artificial animal food memory effectively.
1 Introduction Cognition, an important part of the memory process and also called recognition, refers to knowing again a previously perceived thing when it reappears. Modern cognitive psychology regards the cognition process as one of accepting, coding, storing, extracting and using information. The artificial animal model based on cognition straddles the boundary between the fields of computer graphics and artificial life, and the animal is simulated in a virtual environment. The important manifestation of cognition in the model is behavior memory, which is mainly formed by the focuser. This article takes the forage-task memory
as an example and shows the realization process of the artificial fish's behavior memory. The focuser is the core of the artificial animal's sensation system: it depends on a group of online virtual sensors to provide the related information and includes a sensation attention mechanism, which enables it to work under the demands of the current task and to filter out sensation information that is useless for the current behavior. Regarding the artificial fish, the first thing we must do is abstract the states it can possibly be in and obtain a DFA model describing the changes of state. The focuser then abstracts the sensed information into two state primitives, which, together with the current state, form the automaton's input; the corresponding output is the next expected state of the artificial fish. Here Hp is the hunger primitive, which describes the artificial fish's internal state, and Hd is the risk-factor primitive, which describes the artificial fish's external environment. At the next time step, the focuser feeds back the newly produced state primitives and revises the previous step's primitives, which realizes an intelligent focusing process. The DFA model describing artificial fish behavior is shown in Fig. 1.
Fig. 1. DFA Model of Artificial Fish Behavior
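To make the automaton concrete, here is a minimal Python sketch of a focuser-driven DFA; the state names and the transition table are illustrative assumptions, since the paper specifies the automaton only through Fig. 1.

```python
# Illustrative DFA sketch. Hp: hunger primitive (internal state),
# Hd: risk-factor primitive (external environment). The state names and
# transitions below are assumptions, not the automaton of Fig. 1.
TRANSITIONS = {
    # (current state, Hp, Hd) -> next expected state
    ("wander", "hungry", "safe"):   "forage",
    ("wander", "hungry", "danger"): "flee",
    ("forage", "sated",  "safe"):   "wander",
    ("forage", "hungry", "danger"): "flee",
    ("flee",   "hungry", "safe"):   "forage",
    ("flee",   "sated",  "safe"):   "wander",
}

def next_state(state: str, hp: str, hd: str) -> str:
    """Feed the current state plus the two primitives to the automaton."""
    return TRANSITIONS.get((state, hp, hd), state)  # default: keep state

state = "wander"
for hp, hd in [("hungry", "safe"), ("hungry", "danger"), ("hungry", "safe")]:
    state = next_state(state, hp, hd)
    print(state)  # forage -> flee -> forage
```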
2 A Realization of Behavior Memory An important characteristic of intelligent animal behavior is that the animal can weigh many kinds of behaviors against the environment and then choose a compromise motion. Moreover, an animal's (including a fish's) behavior process is continuous: no "behavior blank" may appear between two behaviors. In other words, during the current behavior the animal has already decided what should be carried out next, or at least has the tendency and desire to carry out that behavior. Behavior memory solves this continuity problem for the artificial animal's behavior process. After the behavior memory has formed, the artificial animal may act according to the current condition and its own memory to complete behaviors such as searching for food, mating and so on. 2.1 Description of the Data Structure The behavior memory is saved in a chain structure, also called the memory chain. Each time the artificial animal successfully takes food, it
stores the food's spatial position in the memory chain as a node; after a period of time tm, a food-position memory chain of a certain length is formed. The data structure is shown in Fig. 2.
Fig. 2. Structure of Memory Chain
Among them, head is the head pointer of the memory chain, rear is the tail pointer, and mark is an auxiliary pointer. The memory chain capacity is not infinite; we set Maxlength = 30, reflecting a limited memory capacity. The goal is not to record every food position: what we need is to compute the expected position of the food distribution and store it in the head node, which constitutes the expectation memory of the forage task. When the artificial fish returns to the feeding state, its memory becomes effective, and the fish swims towards the more crowded food positions in its memory, so that it can easily obtain food again. 2.2 Description of the Algorithms Because of the irregularity of the food's spatial positions, it is quite difficult to find the central point of the distribution precisely. If we simply compute the mean or the center of gravity, the answer is influenced by points that lie far away from most others, called noise points, and the expected food-distribution position differs considerably from the actual one. The solution is to use the focuser to kick the noise points out of the memory chain and then calculate the central point from the point set that the focuser has processed. Meanwhile, the memory can also be renewed: when the memory chain is full, a FIFO policy deletes the first several points in the chain while new points are saved to its rear. As the points are updated, noise points are kicked out by recalculation, so the forage-task memory is kept in its most expected condition. Each position at which the artificial fish takes food is saved as a node in the memory chain; a sketch of this update follows, and the two noise-rejection algorithms are then described.
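The memory-chain update just described is easy to make concrete. The following Python sketch is a minimal interpretation (the class and field names are ours, not the authors'), with the noise-rejection step left pluggable since two alternatives are given below.

```python
from collections import deque

MAXLENGTH = 30  # chain capacity used in the paper

class MemoryChain:
    """Bounded FIFO chain of food positions; the head node stores the
    expected food-distribution position, recomputed after each update."""

    def __init__(self, reject_noise):
        # deque(maxlen=...) drops the oldest nodes automatically (FIFO)
        self.nodes = deque(maxlen=MAXLENGTH)
        self.reject_noise = reject_noise   # pluggable noise-rejection step
        self.expected = None               # contents of the head node

    def add(self, position):
        self.nodes.append(position)        # new node goes to the rear
        kept = self.reject_noise(list(self.nodes)) or list(self.nodes)
        dim = len(position)
        # centre of gravity of the points surviving noise rejection
        self.expected = tuple(sum(p[i] for p in kept) / len(kept)
                              for i in range(dim))

# e.g. chain = MemoryChain(reject_noise=lambda pts: pts)  # identity filter
```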
• Quadratic Variance Algorithm
Informally, the variance describes the degree of deviation from the center, i.e., how much a batch of data fluctuates around its central position. The bigger the variance, the more unstable the data. The quadratic variance algorithm computes the variance of the data twice.
$$S_1^2 = \frac{1}{n}\left[(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \dots + (x_n-\bar{x})^2\right] \qquad (1)$$

$$S_{x_n}^2 = (x_n - \bar{x})^2 \qquad (2)$$

$$S_2^2 = \frac{1}{n}\left[(S_{x_1}^2 - S_1^2)^2 + (S_{x_2}^2 - S_1^2)^2 + \dots + (S_{x_n}^2 - S_1^2)^2\right] \qquad (3)$$

Among them, $S_1^2$ is the first variance, computed over the sample points; $S_{x_n}^2$ is the square of the difference between the vector modulus of point n's position (or its angle in coordinates) and the mean value; and $S_2^2$ expresses the second-pass variance. Finally, a point $x_n$ for which $(S_{x_n}^2 - S_1^2)^2$ is bigger than $S_2^2$ is considered a noise point and is deleted first. Computing the variance only once is often unsatisfactory, because the noise points influence the result and some non-noise points may also be rejected as noise. Computing the variance a second time enlarges the deviation scale of the noise points, which reduces their influence. What needs explaining is that this noise-reduction method finds the most irregular points; when the points are distributed in several pieces, with the points within each piece very close to each other, the algorithm almost loses its effectiveness.
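The rejection rule of Eqs. (1)-(3) is compact enough to state directly in code; the following minimal Python sketch applies it to the vector moduli stored in the memory chain.

```python
def quadratic_variance_filter(moduli):
    """Reject noise points by computing the variance twice, Eqs. (1)-(3):
    a point is noise when (S_xi^2 - S_1^2)^2 exceeds S_2^2."""
    n = len(moduli)
    mean = sum(moduli) / n
    sx = [(x - mean) ** 2 for x in moduli]      # per-point term, Eq. (2)
    s1 = sum(sx) / n                            # first variance, Eq. (1)
    s2 = sum((v - s1) ** 2 for v in sx) / n     # second variance, Eq. (3)
    return [x for x, v in zip(moduli, sx) if (v - s1) ** 2 <= s2]

sample1 = [2.23, 6.85, 5.56, 3.92, 8.10, 10.58, 2.91, 9.65, 7.56, 5.73,
           4.89, 6.32, 7.67, 5.61, 7.58, 6.54, 3.13, 2.49, 5.75]
kept = quadratic_variance_filter(sample1)
print(round(sum(kept) / len(kept), 3))  # 5.875
```

Applied to the moduli of Sample set 1 (Table 1), this filter rejects exactly points 1, 6, 8 and 18, and the mean of the remaining points is 5.875, matching the result reported in Section 3.1.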
• The Improved Mean-Cluster Algorithm
In view of the flaw of the quadratic variance algorithm, we improve the mean-cluster algorithm to carry out the noise-reduction processing. The algorithm is described as follows:
1. Take each initial sample point as a cluster.
2. Put a project into the class closest to it, then recalculate the center coordinates of the two classes that gained or lost projects.
3. Repeat step 2 until no project can be redistributed.
4. Give each kind of fish its own identification scope, and kick out all noise points that fall outside that scope.
5. Calculate the center of gravity of the points left; this is the memory of the food position.
Although the cluster algorithm is rather complex, it can accurately calculate the expected center of the food distribution, which overcomes the flaw of the quadratic variance algorithm. A sketch of this procedure follows.
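The following Python sketch is one reading of steps 1-5, patterned on the worked example in Section 3.2 (the nearest pair is repeatedly replaced by its centre point until the merge distance exceeds the identification scope d). The treatment of step 4, discarding points that never merged as out-of-scope noise, is our interpretation rather than the authors' exact code.

```python
import math

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def mean_cluster_center(points, d):
    """Repeatedly replace the nearest pair by its centre point, stop once
    the nearest distance exceeds the identification scope d, discard points
    that never took part in a merge, and return the centre of gravity of
    the rest (interpretation of steps 1-5)."""
    pts = [(tuple(p), False) for p in points]          # (coords, has_merged)
    while len(pts) > 1:
        pairs = [(i, j) for i in range(len(pts)) for j in range(i + 1, len(pts))]
        i, j = min(pairs, key=lambda ij: dist(pts[ij[0]][0], pts[ij[1]][0]))
        if dist(pts[i][0], pts[j][0]) > d:
            break
        centre = tuple((a + b) / 2 for a, b in zip(pts[i][0], pts[j][0]))
        pts = [p for k, p in enumerate(pts) if k not in (i, j)] + [(centre, True)]
    kept = [c for c, merged in pts if merged] or [c for c, _ in pts]
    n = len(kept)
    return tuple(sum(c[k] for c in kept) / n for k in range(len(kept[0])))
```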
3 Experiment Results 3.1 Quadratic Variance Algorithm Take the vectors in the memory chain as a sample and test the effect of the algorithm. The sample set is shown in Table 1. By calculation, points 1, 6, 8 and 18 are identified as noise points, and the average vector modulus after kicking them out of the set is 5.875. The distribution of the noise points is shown in Fig. 3.

Fig. 3. Distribution of noise points

Table 1. Sample set 1

ID  Modulus   ID  Modulus
1   2.23      11  4.89
2   6.85      12  6.32
3   5.56      13  7.67
4   3.92      14  5.61
5   8.10      15  7.58
6   10.58     16  6.54
7   2.91      17  3.13
8   9.65      18  2.49
9   7.56      19  5.75
10  5.73
Table 2. Sample set 2

ID  Modulus   ID  Modulus
1   1.1       9   7.8
2   1.2       10  8.7
3   1.3       11  8.0
4   1.4       12  8.5
5   1.25      13  7.98
6   1.34      14  8.21
7   1.61      15  8.35
8   1.54      16  8.13
Fig. 4. Sample Set 2
When the sample vector moduli are distributed as several crowded point clouds, the quadratic variance algorithm cannot pick out one of those clouds; instead it rejects some points of each piece from the clouds, so the result is incorrect, with a large error. Sample set 2 shows this problem: points 1, 7, 9, 10, 12 and 13 are rejected as noise points and the result is 4.702, which is obviously incorrect, as shown in Fig. 4.
3.2 The Improved Mean-Cluster Algorithm The sample set of points is shown in Table 3. The minimum distance between two points of the set, 0.977, is between points 6 and 15. Their central point (ID = 18) is calculated and the two points are temporarily deleted. The distances are then calculated again and the same operation repeated until only two points are left. Points whose ID numbers are larger than 17 are central points. The cluster relationship is shown in Table 4 and Fig. 5; in the figure, the count n attached to the notation '-' expresses that the point can be chosen after n rounds of calculation.
Table 3. Sample set 3

ID  X      Y      Z
1   10.23  2.15   1.68
2   2.85   2.36   3.2
3   5.12   2.15   4.25
4   -1.23  -2.15  -1.68
5   -7.5   2.1    -3.45
6   1.03   5.36   1.36
7   -3.6   -4.2   -5.2
8   2.8    6.39   1.2
9   2.3    -3.2   1.1
10  1.47   2.15   -3.21
11  1.29   -1.23  0.2
12  -1.6   -1.03  1.64
13  0.01   -2.41  -1.38
14  0.39   1.27   -1.3
15  1.23   5.25   2.31
16  5.2    -5.23  -2.09
17  -3.9   1.28   1.08
Table 4. Cluster relationship

ID  ID  C    D      Center coordinate
6   15  18   0.977  (1.13, 5.305, 1.835)
4   13  19   1.302  (-0.61, -2.28, -1.53)
8   18  20   2.090  (1.965, 5.848, 1.518)
10  14  21   2.364  (0.93, 1.71, -2.255)
9   11  22   2.390  (1.795, -2.215, 0.65)
2   3   23   2.510  (3.985, 2.255, 3.725)
19  22  24   3.247  (0.593, -2.248, -0.44)
12  24  25   3.258  (-0.50, -1.639, 0.6)
17  25  26   4.508  (-2.20, -0.179, 0.84)
20  23  27   4.675  (2.975, 4.051, 2.621)
21  26  28   4.791  (-0.64, 0.765, -0.708)
27  28  29   5.909  (1.169, 2.408, 0.957)
5   7   30   7.613  (-5.55, -1.05, -4.325)
1   29  31   9.093  (5.700, 2.279, 1.318)
16  31  32   8.262  (5.450, -1.475, -0.386)
30  32  End  11.69  (-0.05, -1.263, -2.355)
Each cluster result corresponds to a distance value. Given the artificial fish's sensation scope d, merge distances larger than d fall outside the sensation scope.
Fig. 5. Cluster relationship
Fig. 6. The route choice of artificial fish without behavior memory
In this example, when d = 2.42 we obtain the center point G = (1.032, 1.27, -0.156), which is what we need. Using this method, the artificial fish can find food in three-dimensional space, as shown in Fig. 6 and Fig. 7. The artificial fish with behavior memory (Fig. 7) swims towards the computed center point of the food, whereas the artificial fish without behavior memory (Fig. 6) swam away from the food.
Fig. 7. The route choice of artificial fish with behavior memory
4 Conclusions As shown in the experiments above, the quadratic variance algorithm has lower computational complexity, but its applicability is limited; the improved mean-cluster algorithm can discover each cluster precisely, but it is more complex than the former. An animal's sensation result is a reflection of its own impressions, and the memory formed through cognition is rather fuzzy. Because the artificial fish is a lower animal, its sensation result only needs to form a general impression; therefore, in the virtual environment the artificial fish may sense the environment only approximately, which makes both the quadratic variance algorithm and the improved mean-cluster algorithm applicable to the artificial fish. The artificial animal is a virtual realization of animals surviving in nature, so its kinds are varied and each kind has a different sensation ability. For a kind of fish with a shorter life cycle, weaker sensation ability and smaller size, the quadratic variance algorithm is a good choice; on the contrary, for one with a longer life cycle and well-developed sense organs, the improved mean-cluster algorithm is better. This article mainly discussed the focuser-based behavior-memory formation algorithms, their strengths and weaknesses, and their scope of application, which enable the artificial animal to show certain life characteristics, and it confirmed the validity of the theory. The next step is sensation fusion, which will make the sensation focuser more functional; the artificial animal will then be more lifelike and its life characteristics richer. Acknowledgments. This work is supported by the National Natural Science Foundation of P.R. China (No. 60503024).
How to Ensure Safety Factors in the Development of Artificial Heart: Verified by the Usage of “Modeling and Simulation” Technology Mitsuo Umezu Integrative Bioscience and Biomedical Engineering, Graduate School of Waseda University, #58-322 3-4-1 Ohkubo, Shinjuku, Tokyo, 169-8555, Japan
[email protected]
Abstract. The author has been involved in two types of clinically available artificial hearts in Japan over the past 30 years. Firstly, the Toyobo auxiliary pulsatile ventricular assist pump was commercialized in 1991, and over 700 clinical cases have been reported in the Japanese market. Secondly, an implantable non-pulsatile ventricular assist pump (EVAHEART, Sun Medical Technology Research Co.) has been implanted into 11 patients in Japan. The author has been establishing a methodology to eliminate considerable risk factors through the development of hydrodynamic performance tests, durability tests and biocompatibility tests. These in vitro tests are all effective in ensuring the safety of the system, and they also contribute to reducing the number of animal experiments.
1 Introduction The author has 30 years of experience in the development of artificial hearts and has been involved in the two types of Japanese-made artificial hearts that have been applied clinically in Japan, as shown in Fig. 1. Firstly, an auxiliary pneumatically-driven pulsatile assist pump was commercialized by the Toyobo Company in 1991, based on the fundamental studies by the artificial heart group of the National Cardiovascular Center Research Institute, Osaka [1-2]. This device has been implanted in over 700 patients in Japan to date. On the other hand, an implantable centrifugal-type ventricular assist device project was organized by Dr. Kenji Yamazaki of Tokyo Women's Medical University in 1990, and Sun Medical Technology Research Co., Waseda University and the University of Pittsburgh Medical Center have contributed to the development of the clinical-quality ventricular assist pump called EVAHEART [3-4]. It has been implanted into 11 human cases in Japan. The first three cases were all discharged and have remained alive for over 2 years. The first patient received EVAHEART in May 2005; he has a full-time job and drives a car for work every day. To achieve this level of success, a huge number of in vitro experiments and animal experiments were performed. Throughout the project, the role of biomedical engineers has become more important. Bioengineers have been establishing a methodology of various types of in vitro tests to eliminate considerable risk factors, through the development of hydrodynamic performance tests, fatigue tests and
Fig. 1. Two types of clinical ventricular assist device (VAD): pulsatile (Toyobo) and non-pulsatile (EVAHEART)
biocompatibility tests. As these in vitro tests have proved effective, some typical tests are briefly introduced in this paper.
2 Present Status of the Toyobo VAD 2.1 Description of the Toyobo VAD System Fig. 2 shows the Toyobo VAD system, consisting of a pneumatically-driven pulsatile pump (Fig. 2, left) and its drive console (Fig. 2, right). The VAD has two mechanical disc valves and a moving diaphragm to eject pulsatile flow. As the VAD is located outside the body, the inlet and outlet cannulae are placed through the chest wall. The inlet cannula is generally inserted into the apex of the left ventricle (LV), whereas the outlet cannula is anastomosed to the ascending aorta.
Fig. 2. Implantation of Toyobo LVAD (LV apex ~ Ascending Aorta Bypass)
2.2 Clinical Application of the Toyobo VAD The Toyobo VAD has been implanted into more than 700 patients in Japan. When clinical application of the Toyobo VAD started in the early 1980s, the major VAD candidates were patients with profound heart failure following acute myocardial
infarction or after corrective surgery, because mortality in such cases was very high even when medical therapy or circulatory assistance by intra-aortic balloon pumping (IABP) was applied. Implantation in the initial 16 clinical cases was performed at the National Cardiovascular Center, Osaka, where the original developmental study for the Toyobo VAD was carried out. Nine cases could be weaned from the VAD after 6~15 days of pumping, but only two patients could be discharged from the hospital. According to a recent clinical data summary, patients with dilated cardiomyopathy (DCM) have had more opportunity to receive a VAD since the early 90s [5]. The number of DCM patients with the Toyobo VAD was reported as 161 cases between 1992 and 2004, with an average pumping duration of 226 days (4~1245 days). This is much longer than for the initial VAD patients in the early 80s, for whom the main expectation was cardiac recovery of the natural failing heart. The major causes of termination in VAD-assisted patients were cerebral bleeding (41 cases), multiple organ failure (35 cases) and infection (16 cases). Although Toyobo guaranteed one-month pumping in clinical application, longer periods of usage and exchange of the device are recognized as common procedures in postoperative care. However, reliable data to justify this procedure must be obtained through biomedical engineering analysis, because there was no consideration of such long-term use at the initial developmental stage.
3 Present Status of EVAHEART 3.1 Introduction of the EVAHEART System EVAHEART is a Japanese-made centrifugal blood pump serving as an implantable LVAD. The EVAHEART project was organized by a cardiac surgeon, Dr. Kenji Yamazaki of Tokyo Women's Medical University, and has been supported by Sun Medical Technology Research Corporation and Waseda University since 1991. Through the development process, more than 50 industrial companies with unique technologies have contributed to the improvement of EVAHEART, and the University of Pittsburgh Medical Center has taken part in animal experiments since 1993. The weight and volume of the clinically available model are 422 g and 132 ml, respectively. Sufficient pump performance has already been confirmed through in vitro durability tests as well as long-term animal experiments. Three clinical applications were performed at Tokyo Women's Medical University and the National Cardiovascular Center in 2005 as a pilot clinical evaluation, and all patients were discharged in favorable condition; the first patient has a full-time job. There have been 8 more clinical applications to date, and only one patient died, from brain bleeding. 3.2 Hydrodynamic Performance of the EVAHEART The EVAHEART design was modified six times over 8 years, based on both in vitro and in vivo studies. The in vitro data were obtained from the Waseda-original mock circulatory system shown in Fig. 3. Special features of this loop are as follows:
1) An anatomically shaped silicone left ventricle and atrium. 2) Pressure-volume curves over one cardiac cycle similar to physiological ones, driven by a linear actuator. The EVAHEART was installed in the mock loop between the LV apex and the ascending aorta; this placement replicates the animal model.
Fig. 3. Schematic drawing of mock circulatory system with EVAHEART
Fig. 4 shows one of the fundamental flow characteristics, represented as a "Head-Flow" relationship. A blood pump flow of 9 L/min was achieved against 100 mmHg at a pump speed of 2400 rpm, which represents a satisfactory bypass flow for a patient with profound heart failure. On the other hand, it is essential to know the bypass flow reliably at any time for safe postoperative care. Therefore, software was designed to estimate the bypass flow. The flow estimated from motor current and rotational speed was continuously compared with that measured by an electromagnetic flow meter using the previously described mock circulatory system. Fig. 5 is a display of the flow monitor used in clinical application. When a normal circulating condition (awake) was simulated in the mock circuit, the estimated flow was 5.84 L/min, whereas the flow measured by the electromagnetic flow meter was 5.68 L/min. Even when the pump flow was doubled to simulate an exercise state, the difference between the two flow values remained small and permissible. It was therefore confirmed that this non-invasive flow measurement system can be used clinically.
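The paper does not give the estimation formula relating motor current and rotational speed to flow. One plausible realisation, shown below, is a calibration table measured on the mock loop and interpolated at run time; all grid values are placeholders, not EVAHEART data.

```python
import numpy as np

# Placeholder calibration grid: flow (L/min) as a function of pump speed
# (rpm) and motor current (A). The real controller's map is not published.
RPM  = np.array([1600.0, 2000.0, 2400.0])
AMPS = np.array([0.5, 1.0, 1.5])
FLOW = np.array([[2.0, 3.5, 4.5],    # rows: rpm, cols: current
                 [3.0, 5.0, 6.5],
                 [4.0, 6.5, 9.0]])

def estimate_flow(rpm: float, amps: float) -> float:
    """Bilinear interpolation of the calibration table."""
    # interpolate along the current axis at each rpm row, then along rpm
    per_row = np.array([np.interp(amps, AMPS, row) for row in FLOW])
    return float(np.interp(rpm, RPM, per_row))

print(estimate_flow(1900.0, 1.0))  # ~4.6 L/min with these placeholder values
```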
Fig. 4. Fundamental flow characteristics of EVAHEART: head (mmHg) versus flow rate (L/min) at pump speeds from 1,600 to 3,000 rpm
Fig. 5. Example of monitor display of EVAHEART flow estimated by motor current and rotational speed
3.3 Durability Test for EVAHEART The durability test protocol, including the design of the machine, was discussed with the US Food and Drug Administration (FDA), because the EVAHEART contains new designs and functions, such as the cool-seal system. A whole view and schematic drawing are shown in Fig. 6 [6]. The major circulatory loop of the durability test machine is composed of a motor-driven left ventricle with two valves, elastic tubes for afterload compliance, a screw clamp for peripheral resistance and an overflow-type preload chamber. Pulsatile flow is circulated in the mock circuit, with EVAHEART driven between the LV and the aorta in the same way as in clinical implantation. To ensure practical durability for long-term usage, a general cycle of three activity levels ("sleep", "awake" and "exercise") is shifted sequentially, one after the other, every day. The operating conditions are summarized in Table 1. For example, normal cardiac function (awake) was set at 70 BPM with a total flow (sum of the cardiac output and the bypass flow) of 6.6 L/min, and the bypass flow from EVAHEART was automatically delivered at 6.3 L/min when the speed was fixed at 1900 rpm.
18 identical durability mock loops were set up, and 18 EVAHEART pumps were evaluated for over one year. No injured surface was observed, and no test was terminated due to pump failure. It was statistically demonstrated that the EVAHEART system passed a reliability requirement of 80% at a confidence level of 90%, even allowing for one system having a problem.
Fig. 6. Durability test circuit for EVAHEART
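The quoted reliability statement can be checked with a standard binomial demonstration-test argument; the sketch below assumes this is the intended statistic, since the paper does not state the method.

```python
from math import comb

def confidence(n: int, failures: int, reliability: float) -> float:
    """Confidence of demonstrating `reliability` with n units on test and
    at most `failures` failures (binomial demonstration test)."""
    p_pass = sum(comb(n, k) * reliability ** (n - k) * (1 - reliability) ** k
                 for k in range(failures + 1))
    return 1.0 - p_pass

# 18 loops, one tolerated failure, target reliability 80%:
print(round(confidence(18, 1, 0.80), 3))  # ~0.901, i.e. ~90% confidence
```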
3.4 Biocompatibility Test for EVAHEART Biocompatibility is one of the most important items to consider for long-term usage of a blood pump. Fig. 7 shows our in vitro biocompatibility test circuit for hemolysis and/or thrombus formation tests [7]. The biocompatibility test protocol is as follows: 1) Heparinized fresh porcine blood harvested from the same animal should be used for the comparative study. Table 1. Operating conditions of the durability test for EVAHEART
Active level (pulse rate)    Sleep (50 BPM)  Awake (70 BPM)  Exercise (120 BPM)
TF (L/min)                   5.7             6.6             9
BF (L/min)                   5               6.3             6.4
AoP mean (mmHg)              83.3            84.7            87.3
AoP systolic (mmHg)          100             105             110
AoP diastolic (mmHg)         75              75              75
LVP systolic (mmHg)          100             105             130
LVP diastolic (mmHg)         15              15              10
Duration (hour/day)          8               15              1
EVAHEART: 1900 rpm (const.)
2) Two identical circuits should therefore always be prepared for the comparative study. 3) Pulsatile circulation simulating hemodynamic pressure/flow conditions is preferable for obtaining practical data. Based on this protocol, a completely closed circuit was developed. In each circuit in Fig. 7, three spiral-vortex diaphragm-type pulsatile pumps are installed [8], each with a different function. Circulation is provided by the LV pump, shown at the bottom left of the circuit in Fig. 7; only this pump has two valves, and it ejects pulsatile flow by pneumatic pressure. A polyurethane compliance tube is installed at the outlet of the LV to reproduce aortic pressure waveforms. The pump located at the top is used for compliance adjustment, and the pump at the bottom right serves as the venous reservoir. The resistance is placed between the compliance and venous pumps. Fig. 8 shows one of the test results after one hour of circulating fresh bovine blood. An anticoagulant agent (heparin) was used to maintain the ACT at around 300 sec. The same amount of blood was divided and injected into the two identical circuits. As can be seen in the upper photographs of Fig. 8, a fibrin network was observed and some blood cells were captured on the Miractran surface of the LV pump; by contrast, the MPC surface was clean [9]. Although both coating materials are polyurethane, their biocompatibility was clearly differentiated.
Fig. 7. In vitro biocompatibility test circuits for hemolysis and for thrombus formation test
Fig. 9 shows one of the mechanical hemolysis test results. Instead of the SV pump at the LV position, non-pulsatile pumps (Biopump and EVAHEART) were installed in each circuit. The hemoglobin level is represented as the N.I.H. (hemolytic index), derived from the equation in Fig. 9. It was found that the hemolysis data were reproducible and that hemolysis by EVAHEART was much lower than that by the Biopump.
Fig. 8. Example of the biocompatibility test results obtained from our circuit (micrographs of the diaphragm (D), housing (H) and diaphragm-housing junction (D-H junction) for the Miractran and MPC surfaces; scale bars 100 μm and 500 μm)

N.I.H. = ⊿PFH × V × (100 − Ht) / (Q × T), where ⊿PFH is the increase in plasma free hemoglobin, V the circulating blood volume, Ht the hematocrit, Q the pump flow and T the circulation time.

Fig. 9. Comparison of mechanical hemolysis between Biopump and EVAHEART (y-axis: N.I.H., g/100 L)
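The N.I.H. defined above is straightforward to compute; the input values below are placeholders, since the paper reports the measured levels only graphically in Fig. 9.

```python
def nih(delta_pfh, volume, hematocrit, flow, time):
    """N.I.H. exactly as printed with Fig. 9:
    N.I.H. = dPFH * V * (100 - Ht) / (Q * T).
    Units follow the inputs (e.g. dPFH in g/L, V in L, Ht in %,
    Q in L/min, T in min)."""
    return delta_pfh * volume * (100.0 - hematocrit) / (flow * time)

# placeholder values only; not measurements from this paper
print(nih(delta_pfh=0.002, volume=0.45, hematocrit=35.0, flow=5.0, time=60.0))
```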
4 Conclusions The author has been establishing a methodology for eliminating considerable risk factors through the development of practical in vitro test circuits. As described above, it was confirmed that these in vitro tests are all effective in ensuring safety on the way to clinical application. The author would like to propose this kind of approach as another EBM: Engineering-Based Medicine. Acknowledgment. This EVAHEART research is performed as a collaboration among Tokyo Women's Medical University, Sun Medical Technology Research Corporation and Waseda University. The research on in vitro testing has been achieved by
research associates and graduate students of Umezu Laboratory, Waseda Graduate School, organized by Advanced Research Institute for Science and Engineering (05P29).
References
1. Takano, H., Hayashi, K., Umezu, M., Tomino, T., Koh, Y., Kito, Y., Koyanagi, H., Naito, Y., Fujita, T., Manabe, H.: Fundamental studies of clinically applicable heart assist device: Development of the system for heart assist device. Jpn. J. Artif. Organs 9, 601–603 (1980)
2. Umezu, M., Takano, H., Hayashi, K., Taenaka, Y., Nakamura, T., Matsuda, T., Akutsu, T.: Development of a control-drive unit for left ventricular assist device. In: Proceedings of Multi-international Instrumental Conference (MICONEX ’83), pp. 759–763 (1983)
3. Yamazaki, K., Mori, T., et al.: The cool-seal concept: A low temperature mechanical seal with recirculating purge system for blood pumps. In: Kobayashi, H., Akutsu, T. (eds.) Artificial Heart, Tokyo, Japan, vol. 6, pp. 339–344. Springer, Heidelberg (1994)
4. Umezu, M., Yamazaki, K., Yamazaki, S., Iwasaki, K., Miyakoshi, T., Kitano, T., Tokuno, T.: Japanese-made implantable centrifugal type ventricular assist system (LVAS): EVAHEART. Biocybernetics and Biomedical Engineering 27(1/2), 111–119 (2007)
5. Registry of artificial organs-2000: The Japanese J. Artif. Organs (in Japanese) 30, 46–53 (2000)
6. Kitano, T., Tokuno, T., Kaneko, K., Yamazaki, K., Kihara, S., Umezu, M., Iwasaki, K., Conti, J.C.: A new and more practical approach for in vitro durability test for rotary LVAS pump. ASAIO J. 50(2), 133 (2004)
7. Iwasaki, K., Umezu, M., Tsujimoto, T., Saeki, W., Inoue, A., Nakazawa, T., Arita, M., Ye, C.X., Imachi, K., Ishihara, K., Tanaka, T.: A challenge to develop an inexpensive ’Spiral vortex pump’. In: 2nd Waseda-Renji Hospital BioMedical Engineering Symposium, Shanghai, China, pp. 8–10 (2002)
8. Umezu, M., Ye, C.X., Nugent, A., Nakamura, T., Iwasaki, K., Arita, M., Shiraishi, Y., Tanaka, T., Imachi, K., Ishihara, K., Chang, V.P.: Spiral vortex ventricular assist device: Its history and present status. In: 2nd Waseda-Renji Hospital BioMedical Engineering Symposium, Shanghai, China, pp. 6–7 (2002)
9. Ishihara, K., Tsuji, T., Kurosaki, T., Nakabayashi, N.: Hemocompatibility on graft copolymers of poly (2-methacryloyloxyethyl phosphorylcholine) side chain and poly (n-butyl methacrylate) backbone. Journal of Biomedical Materials Research 28, 225–232 (1994)
Parametric-Expression-Based Construction of Interior Features for Tissue Engineering Scaffold with Defect Bone Chunxiang Dai, Qingxi Hu, and Minglun Fang Rapid Manufacturing Engineering Center, Shanghai University, 200444 Shanghai, China
[email protected]
Abstract. Constructive features of the tissue engineering scaffold with defect bone are defined. Based on these definitions, 3D reconstruction from medical images, and CAD technology, a constructing method for the macrostructure of the scaffold's interior features is presented, based on parametric expressions. Demonstrations of two constructions are given. Finally, calculations of the porosity show that the constructed interior structures meet the porosity demand of the tissue engineering scaffold with defect bone.
1 Introduction With the development of advanced manufacturing technology, material science and biomedicine, research on and applications of bone tissue engineering have progressed greatly in recent years, and constructed tissue-engineered bone is on its way to market. Repairing bone defects with tissue engineering is a brand-new therapy mode, involving bone defect reparation, scaffold construction and 3D reconstruction [1, 2]. These technologies have become research hotspots in the fields of tissue engineering and bio-manufacturing in China and abroad [3, 4, 7]. This paper, based on definitions for the tissue engineering scaffold with defect bone, presents and analyzes a constructing method for the macrostructure of its interior features.
2 Feature Definitions of Tissue Engineering Scaffold with Defect Bone The bone tissue engineering scaffold plays an important role in repairing defect bone: it acts not only as a support to retain the shape of the original tissue but also as a template providing cells with space for lodging, growing, differentiating and proliferating, with which the regeneration of damaged tissue is guided and the structure of the regenerating tissue is controlled.
Fig. 1. Feature definitions of TESD (the features divide into an exterior feature and an interior feature; the interior feature comprises macrostructure and microstructure)
The tissue engineering scaffold with defect bone, abbreviated as TESD below, is the carrier adapted to tissue cell growth. According to the constructing demands for bone scaffolds in tissue engineering, biomaterials and rapid manufacturing, the features of TESD are divided into an exterior feature and an interior feature; the definition levels are shown in Fig. 1. The exterior feature is the external shape of the TESD, i.e., the shape of the defect bone, and the interior feature is the inner porous structure of the scaffold, whose function is to provide the environment for bone cell growth; it forms the channels that supply cells with nourishment and oxygen. The macrostructure is the inner porous structure constructed inside the scaffold through computer modeling technology, and the microstructure is that formed within the scaffold by the biomaterial through the shaping technology. The exterior and interior features constitute the whole character of the TESD, while the macro- and microstructures compose its inner porous character, which determines the porosity and connectivity of the TESD.
3 Construction of Macrostructure for TESD Based on the definitions above, the interior macrostructure of the TESD is constructed inside the scaffold through computer modeling technology, following tissue engineering and process requirements [5-6]. 3.1 Parametric-Expression-Based Construction of Macrostructure for TESD When constructing the macrostructure for the TESD, the pore diameters and their intervals can be controlled and adjusted through the parametric expressions of the CAD technology, as shown in Fig. 2, where dx, dy and dz are the pore diameters in the X, Y and Z directions, k is the coefficient of the pore interval (here taken as k = 1.4), and tx, ty and tz are the pore intervals in the X, Y and Z directions. The intervals differ because the pore diameters in the
Fig. 2. Building the related parameter expressions
Fig. 3. Defect bone model
three directions are distinct. Also, xx, yy and zz are the pore numbers in the X, Y and Z directions, and trnc( ) is the truncation (rounding) function in the CAD technology. In the expressions above, x, y and z are constants giving the maximum lengths of the bounding box of the TESD, determined by the practical TESD. As an example in this paper, the defect bone model is shown in Fig. 3; from the maximum size of its outline we take x = 50, y = 16 and z = 52, as shown in Fig. 4.
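The parameter relations just described are easy to reproduce outside the CAD package. The sketch below assumes the interval is t = k·d per axis and that the pore count is trnc(edge length / interval), which is how we read Fig. 2; it reproduces the intervals quoted in Demonstration 1.

```python
from math import trunc

def pore_lattice(x, y, z, dx, dy, dz, k=1.4):
    """Pore intervals and counts per axis from the parametric expressions
    (t = k * d, count = trnc(edge_length / t)); our reading of Fig. 2."""
    tx, ty, tz = k * dx, k * dy, k * dz
    xx, yy, zz = trunc(x / tx), trunc(y / ty), trunc(z / tz)
    return (tx, ty, tz), (xx, yy, zz)

# bounding box of the example defect bone model, Demonstration 1 diameters:
print(pore_lattice(50, 16, 52, dx=2, dy=1.5, dz=1))
# -> ((2.8, 2.1, 1.4), (17, 7, 37))
```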
Fig. 4. The maximum bounding box of the defect bone model: (a) isometric view, (b) front view, (c) top view, (d) side view
Fig. 5. The geometric meaning of dx, dy, dz and tx, ty, tz (front, side and top views)
The geometric meanings of the above-mentioned parameters dx, dy, dz and tx, ty, tz are shown in Fig. 5. 3.2 Demonstrations of the Constructing Method To illustrate the parametric-expression-based construction of the macrostructure, scaffold models are constructed below through setup
Fig. 6. The generated TESD model when dx=2, dy=1.5 and dz=1: (a) the original scaffold without pores, volume V1 = 9840 mm³; (b) the negative model with pores; (c) the result, volume V2 = 4475 mm³
demonstrations with two groups of pore diameters, which are non-homogeneous; the porosities are also calculated.

【Demonstration 1】 According to the parameters set up in Fig. 2, i.e., dx = 2, dy = 1.5 and dz = 1, the interval distances between pores are 2.8, 2.1 and 1.4 in the X, Y and Z directions, respectively. After the Boolean operation, the TESD model with non-homogeneous macrostructure is generated, as shown in Fig. 6. From the formula for calculating porosity

δ = Vp ÷ Vs    (1)

where δ is the porosity of the scaffold, Vp is the volume of the scaffold pores and Vs is the volume of the scaffold without pores, and using the mass-properties function of the CAD technology, the values can be calculated: Vs = V1 = 9840 mm³ and Vp = V1 − V2 = 5365 mm³. Thus the porosity of the scaffold is

δ = Vp ÷ Vs = (V1 − V2) ÷ V1 = 54.5%
If the 20%~30% porosity additionally generated by the biomaterial and the manufacturing technology is taken into account, the model obtained above can theoretically meet the porosity demand of the TESD model.
Fig. 7. Setup the related parameter expressions, i.e., dx=1.3, dy=1 and dz=0.8
【Demonstration 2】 According to the parameters set up in Fig. 7, i.e., dx = 1.3, dy = 1 and dz = 0.8, the interval distances between pores are 1.82, 1.4 and 1.12 in the X, Y and Z directions, respectively. After the Boolean operation, the TESD model with non-homogeneous macrostructure is generated, as shown in Fig. 8. The porosity of the scaffold is now

δ = Vp ÷ Vs = (V1 − V3) ÷ V1 = 57.0%
Similarly, if the 20%~30% porosity generated by the biomaterial and the manufacturing technology is taken into account, this model can also theoretically meet the porosity demand of the TESD model.
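Both porosity figures follow directly from Eq. (1) and the volumes reported by the CAD mass-properties function; a quick check:

```python
def porosity(v_solid: float, v_result: float) -> float:
    """Eq. (1): delta = Vp / Vs, with Vp = Vs - V_result."""
    return (v_solid - v_result) / v_solid

print(f"{porosity(9840, 4475):.1%}")  # Demonstration 1 -> 54.5%
print(f"{porosity(9840, 4227):.1%}")  # Demonstration 2 -> 57.0%
```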
Fig. 8. The generated TESD model when dx=1.3, dy=1.0 and dz=0.8: (a) the whole scaffold with pores, volume V3 = 4227 mm³; (b) a magnified view of the local area
From the two setups of pore diameters demonstrated above, it can be concluded that reducing the diameters, i.e., from dx=2, dy=1.5 and dz=1 to dx=1.3, dy=1.0 and dz=0.8, has little effect on the porosity, which increases only from δ = 54.5% to δ = 57.0%. Pursuing very small macro-pore diameters to increase the scaffold porosity is therefore not worthwhile; it can dramatically reduce the computation speed and even crash the computer. After the model generated in Demonstration 2 is converted into STL format, it can be imported into an RP machine and manufactured into an RP model, as shown in Fig. 9, where (a) is the SLS model and (b) is the SLA model.
Fig. 9. The RP model of the TESD with macrostructure: (a) the SLS model and (b) the SLA model
4 Conclusion Based on the feature definitions, 3D reconstruction from medical images and CAD technology, a parametric-expression-based constructing method for the macrostructure of the interior features of the TESD has been presented, and two demonstrations of the method have been given. Calculations of the porosity show that the constructed interior structures meet the porosity demand of the TESD. The final scaffold model can be converted into STL format and imported into an RP machine for manufacture. Acknowledgements. The authors thank the Shanghai Municipal Education Committee's Development Fund (No. 06AZ029) for its financial support of this research.
References
[1] Giltaij, L.R.: BMP-7 in orthopaedic applications: a review. Musculoskel. Res. 6, 55–62 (2002)
[2] Minamide, A., Boden, S.D., Viggeswarapu, M., Hair, G.A., Oliver, C., Titus, L.: Mechanism of bone formation with gene transfer of the DNA encoding for the intracellular protein LMP-1. Bone Joint Surg. 85, 1030–1039 (2003)
[3] Karageorgiou, V., Kaplan, D.: Porosity of 3D biomaterial scaffolds and osteogenesis. Biomaterials 26, 5474–5491 (2005)
[4] Hutmacher, D.W.: Scaffold design and fabrication technologies for engineering tissues – state of the art and future perspectives. J. Biomater. Sci. Polymer Edn. 12, 107–124 (2001)
[5] Borden, M., Attawia, M., Khan, Y., Laurencin, C.T.: Tissue engineered microsphere-based matrices for bone repair, design and evaluation. Biomaterials 23, 551–559 (2002)
[6] Sun, W., Starly, B., Nam, J., Darling, A.: Bio-CAD modeling and its applications in computer-aided tissue engineering. Computer-Aided Design 37, 1097–1114 (2005)
[7] Shao, X.X., Hutmacher, D.W., Ho, S.T., Goh, J.C.H., Lee, E.H.: Evaluation of a hybrid scaffold/cell construct in repair of high-load-bearing osteochondral defects in rabbits. Biomaterials 27, 1071–1080 (2006)
Computation of Uniaxial Modulus of the Normal and Degenerated Articular Cartilage Using Inhomogeneous Triphasic Model Haijun Niu1, Qing Wang2, Yongping Zheng2, Fang Pu1, Yubo Fan1, and Deyu Li1 1
Department of Bioengineering, Beihang University, Beijing, China, 100083 2 HTI, The Hong Kong Polytechnic University, Hong Kong, China
[email protected]
Abstract. Articular cartilage is a biological weight-bearing tissue covering the bony ends of articulating joints. Subtle changes in tissue composition can lead to degeneration of articular cartilage. This study develops an improved inhomogeneous triphasic model with four parameters, extracts the uniaxial modulus (Ha) for both normal and degenerated articular cartilage, and then predicts the swelling pattern of the cartilage. The results indicate that the new inhomogeneous triphasic model can extract the uniaxial modulus more accurately, and the predicted results also appear to match the experimental strain data well. This inhomogeneous triphasic model can describe the depth-dependent material properties of normal and degenerated articular cartilage more exactly.
1 Introduction Articular cartilage is the layer of low-friction, load-bearing soft tissue that covers the articulating bony ends in diarthrodial joints. The functions of this tissue are to distribute stresses and to provide a low-friction bearing surface for joint motion. Under normal physiological conditions, these functional properties can persist for many decades [1]. However, causes such as overloading can lead to degeneration of this tissue and even result in osteoarthritis (OA), the most common of all joint diseases [2]. Articular cartilage is a multiphasic mixture. The special mechanical properties of this tissue are controlled by its complex biochemical composition, the molecular and ultrastructural architecture of the macromolecules, and their heterogeneous distribution throughout the depth of the tissue [3-4]. Under biological conditions, articular cartilage is in a swollen state, which plays a significant role in the biomechanical behavior of articular cartilage when the tissue is loaded [4-5]. Factors such as proteoglycan concentration, fixed charge density, water volume fraction, and the intrinsic material properties of the cartilage solid matrix govern the inhomogeneous swelling strain distribution. It has been suggested that elasticity modulus
extraction based on the triphasic model and quantification of the swelling effects can be used to detect material inhomogeneity and anisotropy, as well as to characterize the degenerative changes associated with osteoarthritis [5-8]. Mixture models have been used successfully to predict the response of articular cartilage to various loading conditions. Mow et al. developed a biphasic mixture model of articular cartilage in which the collagen-proteoglycan matrix is modeled as an intrinsically incompressible porous-permeable solid matrix and the interstitial fluid as an incompressible fluid [10]. The biphasic model was able to predict the mechanical responses of articular cartilage in confined compression creep, stress-relaxation and dynamic loading [6,10]. The biphasic model was later extended by many researchers. Mak combined the linear biphasic theory with the quasi-linear viscoelasticity theory of Fung and developed a biphasic poroviscoelastic theory [11]; this model describes the response of cartilage in unconfined compression well. Lai et al. [6] proposed a triphasic model of articular cartilage as an extension of the biphasic theory, in which the proteoglycans are modeled as a negative charge density fixed to the solid matrix and the monovalent ions in the interstitial fluid as an additional fluid phase. This model provides a more accurate description of the tissue composition and the mechano-electrochemical response. However, in this model the cartilage solid matrix was assumed to have homogeneous material properties, with a uniform aggregate modulus, Ha, and Poisson's ratio, νs. Narmoneva et al. extended this model to include layered material properties to describe the depth-dependent strains of cartilage; in their bi-layer model, Ha was assumed to change linearly with depth in the upper layer and to be constant in the lower layer [9]. In this study, an improved layered inhomogeneous triphasic model with four parameters is developed based on the one proposed by Narmoneva et al. [9]. The model is used to extract the uniaxial modulus (Ha) of both normal and degenerated articular cartilage, and then to predict the swelling pattern of normal and degenerated articular cartilage.
2 Model 2.1 Triphasic Theory The central concept of the theoretical model of articular cartilage is the principle of balance of forces, i.e., that at equilibrium the sum of all forces inside the cartilage is zero, including the osmotic pressure, the applied pressure and the tensile stresses in the collagen network. Lai et al. proposed a triphasic theory on the basis of mixture theory to model the free-swelling behavior of cartilage [6]. In the model, the cartilage is treated as a mixture of a linear, isotropic, incompressible collagen-proteoglycan solid matrix and an incompressible fluid consisting of water and ions. Narmoneva et al. extended the model to include layered material properties [9]. According to the triphasic theory [6] and the extended layered model [9], the total stress σ in the isotropic porous-permeable mixture of the fluid, solid and ion phases in cartilage can be written as follows:
$$\sigma = \sigma^s + \sigma^w + \sigma^+ + \sigma^- = -p\,\mathbf{I} + \lambda_s\,\mathrm{tr}(\mathbf{E})\,\mathbf{I} + 2\mu_s\,\mathbf{E} \qquad (1)$$

where σ represents the total stress of the mixture; the indices w, s, +, − denote quantities associated with water, solid, cation and anion, respectively; E is the infinitesimal strain tensor measured from the physiological salt bath corresponding to the hypertonic reference configuration; and p is the fluid pressure. Here $p = p_{Donnan} + p_0 - T_1$, where $p_{Donnan}$ and $p_0$ are the Donnan and the entropic components of the swelling pressure in the fluid phase, respectively, and $T_1 = c_0^F k T N_{Avo} / (1.5\,[1-\phi_0^w])$, with $c_0^F$ the fixed charge density and $\phi_0^w$ the water volume fraction. $\lambda_s$ and $\mu_s$ are the Lamé coefficients of the solid matrix. According to Eq. (1), the total stress in cartilage at equilibrium under free-swelling conditions consists of two components: the interstitial fluid pressure (p) and the elastic stress dependent on the material properties of the solid matrix. The relationship between the aggregate modulus ($H_A$) and the material parameters is $H_A = \lambda_s + 2\mu_s$. It was assumed in the model that the contribution of entropic effects to the cartilage swelling pressure is zero and that no external hydrostatic pressure is applied [9]. Therefore, the fluid pressure (p) approximately equals the Donnan osmotic pressure $p_{Donnan}$, i.e., all swelling effects arise from the electrostatic interaction between the negatively charged PGs and the ions. A linear constitutive expression for
p_Donnan can be obtained from the boundary conditions for equivalent chemical potentials of water and NaCl ions across the free surface:

p_Donnan ≈ RT ( [(c_0^F)² + (2c*)²]^(1/2) − 2c* − tr(E)·(c_0^F)² / (φ_0^w [(c_0^F)² + (2c*)²]^(1/2)) )    (2)

where R is the gas constant, T is the absolute temperature and c* is the NaCl concentration.
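To make Eq. (2) concrete, the following minimal sketch (not part of the original study) evaluates the Donnan pressure at the stress-free reference state, tr(E) = 0; the parameter values are illustrative assumptions only:

#include <cmath>
#include <cstdio>

int main() {
    // Illustrative values (assumptions, not measured data):
    const double R  = 8.314;   // gas constant, J/(mol K)
    const double T  = 310.0;   // absolute temperature, K
    const double cF = 200.0;   // fixed charge density c0^F, mol/m^3 (= 0.2 mEq/ml)
    const double cs = 150.0;   // external NaCl concentration c*, mol/m^3 (= 0.15 M)
    // Eq. (2) with tr(E) = 0 (stress-free reference configuration):
    double root = std::sqrt(cF * cF + 4.0 * cs * cs); // [(c0F)^2 + (2c*)^2]^(1/2)
    double p = R * T * (root - 2.0 * cs);             // Donnan osmotic pressure, Pa
    std::printf("p_Donnan = %.3f MPa\n", p / 1.0e6);  // about 0.156 MPa here
    return 0;
}

The resulting pressure of roughly 0.1–0.2 MPa is of the order commonly reported for cartilage swelling pressure.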
The fixed charge density (c_0^F) and water volume fraction (φ_0^w) of cartilage can be measured using biochemical methods.

2.2 Inhomogeneous Bi-layer Model

Previous studies have shown that the distributions of the fixed negative charges and of the water volume fraction in cartilage are inhomogeneous, and that the swelling-induced strains differ with depth [8-9]. Therefore, the homogeneous model cannot predict the alterations in the swelling-induced strain and the aggregate modulus well, especially for degraded cartilage [9]. The single-layer homogeneous model is shown in Fig. 1(a). Narmoneva et al. [9] modeled cartilage as a two-layer structure in terms of Ha, where Ha in the upper layer (near the cartilage surface) increases linearly with depth and Ha in the lower layer is constant (Fig. 1(b)). Based on our earlier observation that the modulus in the deep layer also increases with depth [12], we propose an improved two-layer model with four parameters, referred to here as the four-parameter triphasic model (Fig. 1(c)). In this model, we use three values of
Ha to describe the bi-layer structure. In the deep region (Layer 1), which is attached to the subchondral bone, the aggregate modulus varies linearly from Ha1 at the cartilage–bone interface to Ha2 at the Layer1–Layer2 interface. In the upper layer (Layer 2), the aggregate modulus is likewise taken to vary linearly, from Ha2 at the Layer1–Layer2 interface to Ha3 at the articular surface. The thickness of Layer 1 is denoted h1. These four parameters Ha1, Ha2, Ha3 and h1, together with the other parameters c_0^F, φ_0^w and ν_s, determine the magnitude and distribution of the axial swelling-induced strain E.
Fig. 1. Schematic diagram of the cartilage biomechanics models: (a) homogeneous model; (b) Narmoneva's triphasic model; (c) four-parameter triphasic model. The thickness of the cartilage specimen is normalized.
3 Results

To verify the model, free-swelling experimental strain data are required in order to extract the aggregate modulus (HA). In this study, all the data, including the swelling-induced strains E and the values of c_0^F and φ_0^w for one normal canine articular cartilage specimen (approximately 1 mm thick) and one degenerated human articular cartilage specimen (approximately 3.5 mm thick), were taken from Narmoneva's dissertation [9]. The strain distributions of the two specimens were induced by changing the NaCl concentration from 2 M to 0.15 M. The values of c_0^F and φ_0^w of the two specimens are shown in Fig. 2; both vary with depth within the cartilage layer. In our computation, a Poisson's ratio ν_s = 0.37 was used for the normal canine cartilage and ν_s = 0.05 for the degenerated human cartilage.
Fig. 2. The depth-dependent distributions of fixed charge density c_0^F and water volume fraction φ_0^w of the normal canine cartilage specimen and the degenerated human cartilage specimen (from Narmoneva's thesis [9]). The thickness of the cartilage specimen is normalized.

Table 1. Parameter values for the two cartilage specimens calculated using the homogeneous model, Narmoneva's triphasic model and the four-parameter triphasic model

Normal canine cartilage specimen
  Homogeneous model:               Ha = 39.0 MPa (err = 0.50)
  Narmoneva's triphasic model:     Ha1 = 50.2 MPa, Ha2 = 8.1 MPa, h1* = 0.26 (err = 0.46)
  Four-parameter triphasic model:  Ha1 = 60.8 MPa, Ha2 = 27.5 MPa, Ha3 = 7.5 MPa, h1* = 0.79 (err = 0.43)

Degenerated human cartilage specimen
  Homogeneous model:               Ha = 34.0 MPa (err = 0.43)
  Narmoneva's triphasic model:     Ha1 = 34.2 MPa, Ha2 = 0.01 MPa, h1* = 0.6 (err = 0.39)
  Four-parameter triphasic model:  Ha1 = 50.3 MPa, Ha2 = 16.6 MPa, Ha3 = 0.01 MPa, h1* = 0.55 (err = 0.35)

* The thickness of the cartilage specimen is normalized.
Using the distribution of the axial component of the swelling-induced strain E and the assigned c_0^F, φ_0^w and ν_s, the four parameters Ha1, Ha2, Ha3 and h1 were predicted. The least-squares error (LSE) between the predicted strain values and the measured swelling-induced strains E was used as the curve-fitting criterion. For comparison, we also calculated the parameters using Narmoneva's model and the homogeneous model. The computation program was written in Mathematica 5.0. Table 1 compares the uniaxial modulus results predicted by the three models, together with the least-squares error (err) between the predicted and experimental strains. Combining these with the uniaxial moduli Ha in Table 1, the nonuniform distributions of the swelling-induced strains of the normal and degenerated cartilage specimens caused by changing the NaCl concentration from 2 M to 0.15 M were predicted using the homogeneous model, Narmoneva's triphasic model and the four-parameter triphasic model. For clear comparison, the strain distributions predicted by the three models are plotted together in Fig. 3(a) and (b). Whether for normal or degenerated cartilage, and whether for animal or human cartilage, the four-parameter triphasic model clearly predicts the strain distribution best.
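As a sketch of the curve-fitting criterion only (the strain prediction itself requires solving the triphasic equations, which is beyond this note), the least-squares error over the sampled depths could be computed as follows; predictStrain is a hypothetical placeholder, not the paper's implementation:

#include <vector>

// Hypothetical placeholder: in the real program this solves the triphasic
// equations for the layered Ha profile; a stub is used here so the sketch
// compiles.
double predictStrain(double /*z*/, double /*Ha1*/, double /*Ha2*/,
                     double /*Ha3*/, double /*h1*/) { return 0.0; }

// Least-squares error between predicted and measured swelling-induced strains.
double lse(const std::vector<double>& depth, const std::vector<double>& strain,
           double Ha1, double Ha2, double Ha3, double h1) {
    double err = 0.0;
    for (std::size_t i = 0; i < depth.size(); ++i) {
        double d = predictStrain(depth[i], Ha1, Ha2, Ha3, h1) - strain[i];
        err += d * d;
    }
    return err;
}
// The four parameters are then chosen to minimize lse(), e.g. by a grid
// search or a standard nonlinear least-squares routine.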
[Figure: swelling-induced strain versus normalized depth. (a) Normal cartilage specimen; (b) degenerated cartilage specimen. Curves: homogeneous model; three-parameter (Narmoneva) triphasic model; four-parameter triphasic model; ♦ experimental swelling-induced strains.]
Fig. 3. The nonuniform distributions of the experimental swelling-induced strains of the normal and degenerated cartilage specimens caused by changing the NaCl concentration from 2 M to 0.15 M, together with the strains predicted by the homogeneous model, Narmoneva's triphasic model and the four-parameter triphasic model. The thickness of the cartilage specimen is normalized.
4 Conclusions

Swelling is one of the early indicators of the pathological changes of cartilage in osteoarthritis, so modulus extraction based on the triphasic model and the swelling-induced strains is important for studying damage to the cartilage matrix. This study built a triphasic model with four parameters; the model can effectively predict a
highly nonuniform distribution of swelling strains in articular cartilage and estimate the material properties of articular cartilage tissue more accurately. The study is also relevant to understanding the pathogenesis of OA and to designing methods for the diagnosis and treatment of the disease.

Acknowledgments. The work has been supported by the Research Grants Council of Hong Kong (PolyU 5199/02E, PolyU 5245/03E) and the National Natural Science Foundation of China (10672014 and 10527001).
References

1. Mow, V.C., Ratcliffe, A., Poole, A.R.: Cartilage and diarthrodial joints as paradigms for hierarchical materials and structures. Biomaterials, 67–97 (1992)
2. Brandt, K., Doherty, M., Lohmander, S.: Osteoarthritis. Oxford University Press, Oxford (1998)
3. Kempson, G.E.: The Joints and Synovial Fluid, vol. II, pp. 177–238. Academic Press, New York (1980)
4. Mow, V.C., Zhu, W., Ratcliffe, A.: Structure and function of articular cartilage and meniscus. In: Mow, V.C., Hayes, W.C. (eds.) Basic Orthopaedic Biomechanics, pp. 143–198. Raven Press, New York (1991)
5. Maroudas, A.: Balance between swelling pressure and collagen tension in normal and degenerate cartilage. Nature, 808–809 (1976)
6. Lai, W.M., Hou, J.S., Mow, V.C.: A triphasic theory for the swelling and deformation behaviors of articular cartilage. J. Biomech. Eng., 245–258 (1991)
7. Maroudas, A.: Transport of solutes through cartilage – permeability to large molecules. J. Anat., 335–347 (1976)
8. Narmoneva, D.A., Wang, J.Y., Setton, L.A.: Nonuniform swelling-induced residual strains in articular cartilage. J. Biomech., 401–408 (1999)
9. Narmoneva, D.A.: Material property determination for normal and osteoarthritic articular cartilage using a triphasic mechano-chemical theoretical model of osmotic loading. PhD thesis, Duke University, Durham, North Carolina (2000)
10. Mow, V.C., Kuei, S.C., Lai, W.M., Armstrong, C.G.: Biphasic creep and stress relaxation of articular cartilage in compression: theory and experiments. J. Biomech. Eng., 73–84 (1980)
11. Mak, A.F.: The apparent viscoelastic behavior of articular cartilage – the contributions from the intrinsic matrix viscoelasticity and interstitial fluid flows. J. Biomech. Eng., 123–130 (1986)
12. Zheng, Y.P., Mak, A.F.T., Lau, K.P., Qin, L.: An ultrasonic measurement for in vitro depth-dependent equilibrium strains of articular cartilage in compression. Phys. Med. Biol., 3165–3180 (2002)
Effect of the Plantar Ligaments Injury on the Longitudinal Arch Height of the Human Foot

Yunfeng Yang1, Guangrong Yu1, Wenxin Niu2, Jiaqian Zhou1, Yanxi Chen1, Feng Yuan1, and Zuquan Ding2

1 Orthopaedic Department of Tongji Hospital affiliated to Tongji University, Shanghai, China
[email protected]
2 School of Life Science and Technology, Tongji University, Shanghai, China
Abstract. Most foot deformities are related to arch collapse or instability, especially of the medial longitudinal arch. Although the contribution of the plantar fascia to arch height has been investigated by several authors, the effects of the other plantar ligaments are still unclear. The purpose of this study was to explore the roles of the plantar soft tissues, including the plantar fascia, spring ligament complex, short plantar ligament and long plantar ligament, in foot arch biomechanics, using fresh-frozen specimens from normal adults under different injury conditions. In addition, a three-dimensional finite element model of a normal left foot was developed, which comprised most joints of the foot and consisted of bone segments, major ligaments and plantar soft tissue. The validity of the model was verified by comparing its results with the experimentally measured displacements and von Mises stresses of each bone segment. The intrinsic ligaments of the foot arch were sectioned in different sequences in the cadaveric experiment, simulating different pathologic situations of plantar ligament injury, and the bone segment displacements and stress distributions were described.
1 Introduction

The main factors that contribute to an acquired flat foot deformity are excessive tension in the triceps surae, obesity, PTT dysfunction, or ligamentous laxity in the spring ligament, plantar fascia, or other supporting plantar ligaments. Too little support of the arch, or too strong an arch-flattening effect, will lead to collapse of the arch. The plantar fascia, or plantar aponeurosis, is the investing fascial layer of the plantar aspect of the foot. It is part of the retinacular system, which consists of a network of connective and adipose tissues whose main functions are to support and protect underlying vital structures of the body. The anatomy and functions of the plantar fascia have been well described by Sarrafian [1] and others [2],[3],[4]. The spring ligament has been reported to be composed of the inferior calcaneonavicular and superomedial calcaneonavicular ligaments and to be an important stabilizer of the longitudinal arch of the human foot [5].
Cadaveric studies have investigated the biomechanical consequences of plantar ligament release. Huang [6] reported average vertical displacements between the talar neck and the supporting platform of 7.3 and 8.4 mm, respectively, in 12 cadaveric feet with intact plantar fascia and after fasciotomy under a load of 690 N. Kitaoka et al. [7],[8] noted the high tensile loads required for failure of the plantar fascia in a biomechanical cadaver study. They also found that the majority of failures, or ruptures, during testing occurred at the plantar fascia origin on the os calcis, the most common site of clinical plantar fascia rupture and of the symptoms of plantar fasciitis. Daly et al. [9] found evidence of flattening of the arch in 16 feet of 13 patients who underwent plantar fasciotomy for intractable plantar fasciitis. However, there is no report on the biomechanics of the foot bones in a cadaveric experiment combined with finite element analysis under conditions of plantar ligament injury. The objective of this study was to establish a detailed FEM of a normal human adult foot and to analyze the effect of the plantar ligaments, including the plantar fascia, spring ligament complex, short plantar ligament and long plantar ligament, on tarsal bone displacements and foot arch deformation.
2 Methods

The geometry of the left foot was acquired by computed tomography scan from a 24-year-old male without any foot pathology. The contours of the bone and soft tissue were determined by an automatic contouring program and used to generate solid models in a CAD program (AutoCAD R14.0). The foot bones were created with 4-node tetrahedral elements and analyzed using a CAE program (ANSYS 9.0). The articulations and ligament structures of the foot were created with LINK10 and LINK12 elements, respectively, and SHELL93 elements were used to construct the plantar soft tissues. The material properties were assumed to be linear elastic. A detailed left foot model was built, consisting of 170,426 nodes and including articular cartilages, ligaments, and plantar soft tissue. A static load of 700 N was applied axially through the distal tibia to simulate one-foot standing, with the heel of the model fixed and the other plantar soft tissue elements restrained in the vertical axis but free in the transverse plane. All nodes on the upper cross-sectional area of the distal tibia were restrained in the transverse plane but free along the coronal axis. A rigid plane was then established under the plantar soft tissues to simulate the ground. The reaction of each bone segment of the foot arch was recorded and analyzed. Seven fresh adult cadaveric feet (with the distal third of the shank attached) were tested in the cadaveric experiments. The skin and muscles above the ankle joint were removed while the ligaments of the ankle were kept intact (Fig. 1). The four major bone segments and the stabilizers of the foot arch (plantar fascia, spring ligament, long and short plantar ligaments) were marked and identified before the experiments. An axial load of 700 N from the proximal tibia was applied by an MTS machine in increments of 100 N. Ligament injury was simulated by releasing the ligaments in different combinations and sequences. The displacements of the major bones were collected in
gray-level images by two digital cameras and recorded on a personal computer. The displacements of the bone segments were calculated by the Digital Speckle Correlation Method and compared with the FEM results for verification. The angular displacements of the major tarsal bones in the FEM were also analyzed, based on lines passing through the middle points of the articulations, such as the subtalar joint, talonavicular joint, Chopart joint, Lisfranc joint, and metatarsophalangeal joints.
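The displacement extraction by the Digital Speckle Correlation Method amounts to searching for the best-matching position of a marker patch between two gray-level images. A minimal sketch of the underlying criterion, normalized cross-correlation with an integer-pixel search, is given below; subpixel refinement, rotation and bounds checking are omitted, and this is not the paper's own implementation:

#include <vector>
#include <cmath>

// A gray-level image stored row-major; bounds checking omitted for brevity.
struct Image {
    int w, h;
    std::vector<double> px;
    double at(int x, int y) const { return px[y * w + x]; }
};

// Normalized cross-correlation of a t x t template (top-left corner at
// (tx, ty) in the reference image) against the deformed image shifted
// by (ox, oy).
double ncc(const Image& ref, const Image& def,
           int tx, int ty, int t, int ox, int oy) {
    double sa = 0, sb = 0, sab = 0, saa = 0, sbb = 0;
    const int n = t * t;
    for (int y = 0; y < t; ++y)
        for (int x = 0; x < t; ++x) {
            double a = ref.at(tx + x, ty + y);
            double b = def.at(tx + ox + x, ty + oy + y);
            sa += a; sb += b; sab += a * b; saa += a * a; sbb += b * b;
        }
    double cov = sab - sa * sb / n;
    double var = (saa - sa * sa / n) * (sbb - sb * sb / n);
    return var > 0 ? cov / std::sqrt(var) : 0.0;
}

// Integer-pixel displacement of a marker: scan a +/- r search window and
// keep the offset with the highest correlation.
void bestShift(const Image& ref, const Image& def,
               int tx, int ty, int t, int r, int& dx, int& dy) {
    double best = -2.0;
    dx = dy = 0;
    for (int oy = -r; oy <= r; ++oy)
        for (int ox = -r; ox <= r; ++ox) {
            double c = ncc(ref, def, tx, ty, t, ox, oy);
            if (c > best) { best = c; dx = ox; dy = oy; }
        }
}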
Fig. 1. Cadaveric feet were loaded axially to 700 N with the MTS machine. The displacements of the marked bone segments were recorded by two digital cameras.
3 Results

All the marked bone segments moved downward in the sagittal plane under axial load in the intact condition. The calcaneus showed plantarflexion and the other tarsal and metatarsal bones showed dorsiflexion, which flattened the longitudinal and transverse arches of the model. The finite element model and the cadaveric feet showed the same tendency when the 700 N load was applied axially to the distal tibia in the intact condition, except for the calcaneus and the fifth metatarsal. When all four plantar ligaments were sectioned, all the bone segments displaced in all three global planes: dorsiflexion in the sagittal plane, abduction in the transverse plane and external rotation in the coronal plane. The FEM and the cadaveric experiment showed the same tendency (Fig. 2, Fig. 3). The rotation changes of the finite element model between the intact condition, plantar fascia released, and all four major ligaments released are shown in Figs. 6–8.
Fig. 2. The vertical displacement of the finite element model (FEM) under 700 N loading in the intact condition (1) and with all four major ligaments released (3)
Fig. 3. The vertical displacement of the cadaveric foot in the sagittal and coronal planes, captured by two digital cameras under 700 N loading before (upper two images) and after (lower images) release of the four major ligaments
[Bar chart: vertical displacement (mm) of the bone segments Ca, Na, Mc, M1–M5 under 700 N axial loading; series: CSE, FEM.]
Fig. 4. The vertical displacements of the bone segments under 700 N axial loading in the intact finite element model (FEM) and the cadaveric specimen experiments (CSE) showed the same tendency, except for the calcaneus and the fifth metatarsal
[Bar chart: vertical displacement (mm) of the bone segments Ca, Ta, Na, Cu, Mc, Ic, Lc, M1–M5 of the FEM under 700 N; series: intact, AR.]
Fig. 5. The displacements of the bone segments of the finite element foot model under 700 N in the intact condition and with the four plantar ligaments released (AR)
The bone segments rotated in all three planes and showed the same tendency, except for the calcaneus and the fifth metatarsal in the coronal plane. Following the flattening of the longitudinal arch, the foot bones showed dorsiflexion in the sagittal plane, abduction in
[Bar chart: dorsiflexion angle (degrees) in the sagittal plane of the bone segments Ca, Na, Mc, M1–M5 under 700 N axial loading; series: intact, AR.]
Fig. 6. Rotations of the bone segments in the sagittal plane in the intact condition and with the four major plantar ligaments released
[Bar chart: abduction angle (degrees) in the transverse plane of the bone segments Ca, Na, Mc, M1–M5 under 700 N axial loading; series: intact, AR.]
Fig. 7. Rotations of the bone segments in the transverse plane in the intact condition and with the four major plantar ligaments released
the transverse plane and external rotation in the coronal plane. The rotations increased as the plantar ligaments were released and peaked when all four major ligaments had been sectioned. Taking this last condition as 100%, the contribution of the plantar fascia to the stability of the foot arch was calculated as 34.53%, 22.46% and 12.63% in the sagittal, transverse and coronal planes, respectively.
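The contribution figures quoted above follow from a simple ratio, sketched below with illustrative (not measured) rotation values:

// Contribution of the plantar fascia in one plane: rotation after fascia
// release alone, divided by rotation after all four ligaments are released.
double contribution(double fasciaOnly, double allReleased) {
    return fasciaOnly / allReleased * 100.0; // percent of total destabilization
}
// e.g. contribution(6.9, 20.0) == 34.5 (illustrative values only)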
[Bar chart: external rotation angle (degrees) in the coronal plane of the bone segments Ca, Na, Mc, M1–M5 under 700 N axial loading; series: intact, AR.]
Fig. 8. Rotations of the bone segments in the coronal plane in the intact condition and with the four major plantar ligaments released
4 Discussion

Computer models have several features that make them more appealing than other types of models [10]. An unlimited number of computer models can be developed and tested under different conditions, and a model's primary characteristics are maintained after exhaustive testing. Computer models also provide information that cannot easily be obtained with other types of models, such as load distributions within soft tissues, internal stresses, joint reaction forces, and muscle force analyses [11]. Once an accurate computer model has been developed and appropriately validated, simulations can be performed quickly and at low cost in diverse situations, such as injury, surgery, dynamic motion simulation and graphical animation of the experiment, helping to enhance the understanding of the underlying mechanisms. The capability of the FE model to predict the internal stress within the bony and soft tissue structures makes it a valuable tool for enriching the knowledge of ankle–foot biomechanics. Computational analysis of foot biomechanics has the advantage of providing an overall stress distribution of the foot, and it is more economical than in vitro cadaver experiments. Among previously existing computational models, only a detailed representation of the foot geometry and joint characteristics, together with realistic loading conditions, can depict the internal stress and strain distributions of the foot complex. Many authors have used FEM to quantify the biomechanical role of the plantar fascia in load bearing and found that the vertical displacement of the foot increased with fasciotomy [12],[13]. Geometrically detailed 3D FE models have been developed [14],[15],[16],[17],[18],[19], but they were not employed to quantify the biomechanical role of the plantar fascia on the tarsal and metatarsal bones simultaneously, nor did they address the one-foot standing posture, which is important for supporting the body weight during walking. Based on the anatomy and computer software, a detailed finite element model of a normal adult left foot was established, including the bone segments, articulations,
foot intrinsic ligaments and plantar soft tissue. The elements used to construct the foot joints, ligaments and plantar tissue differ from those in the literature, which could not be used to analyze large displacements of the foot arch. Sectioning of the plantar fascia is used to treat some foot problems, such as heel pain, poliomyelitis and congenital equinovarus [5], but the later outcomes remain unclear: some patients suffer a flattened foot and lateral mid-foot pain. The current study estimated the contribution of the plantar fascia to the support of the longitudinal foot arch, which comprises about one third of the function of the plantar ligaments in the sagittal plane and helps to restrict abduction and external rotation of the bony structures. Plantar fasciotomy should therefore be the last option in clinical therapy, after conservative methods. To simplify the FE analysis, homogeneous and linearly elastic material properties were assigned to the bony and ligamentous structures, and the ligaments within the toes and other connective tissues such as the joint capsules were not considered. The current FE model did not account for the surface interactions between bony, ligamentous and muscular structures. These structural simplifications would reduce the predicted joint stability of the foot arch structures and increase the predicted joint and arch deformations. Because linear truss elements were used to approximate the nonlinear behavior of the plantar fascia, the material properties were assumed linear, and the structural interface between the plantar fascia and the surrounding tissue was neglected, the plantar fascia strain predicted in this study was likely underestimated.
5 Conclusions

All four plantar ligaments play an important role in stabilizing the normal foot arch, especially the plantar fascia. After release of the four plantar ligaments, and without the function of the tendons and extrinsic stabilizers, the medial longitudinal foot arch collapses and elongates significantly, following forefoot abduction and hindfoot valgus. The current study proposed a validated three-dimensional foot model that can be modified to simulate other foot conditions in the future. This model can be useful for observing stress distributions inside the foot, designing footwear, investigating the biomechanical behavior of the foot under different injuries, and planning operations.
References

1. Sarrafian, S.K.: Functional characteristics of the foot and plantar aponeurosis under tibiotalar loading. Foot Ankle 8, 4–18 (1987)
2. Murphy, G.A., Pneumaticos, S.G.: Biomechanical consequences of sequential plantar fascia release. Foot Ankle Int. 19, 149–152 (1998)
3. Chuter, V., Payne, C.: Limited joint mobility and plantar fascia function in Charcot's neuroarthropathy. Diabet. Med. 18, 558–561 (2001)
4. Snider, M.P., Clancy, W.G., McBeath, A.A.: Plantar fascia release for chronic plantar fasciitis in runners. Am. J. Sports Med. 11, 215–219 (1983)
5. Taniguchi, A., Tanaka, Y., Takakura, Y., Kadono, K., Maeda, M., Yamamoto, H.: Anatomy of the spring ligament. J. Bone Joint Surg. Am. 85, 2174–2178 (2003)
6. Huang, C.K., Kitaoka, H.B., An, K.N., et al.: Biomechanical evaluation of longitudinal arch stability. Foot Ankle 14, 353–357 (1993)
7. Kitaoka, H.B., Luo, Z.P., An, K.N.: Analysis of longitudinal arch supports in stabilizing the arch of the foot. Clin. Orthop. Relat. Res., 250–256 (1997)
8. Kitaoka, H.B., Luo, Z.P., An, K.N.: Reconstruction operations for acquired flatfoot: biomechanical evaluation. Foot Ankle Int. 19, 203–207 (1998)
9. Daly, P.J., Kitaoka, H.B., Chao, E.Y.: Plantar fasciotomy for intractable plantar fasciitis: clinical results and biomechanical evaluation. Foot Ankle 13, 188–195 (1992)
10. Brekelmans, W.A.M., Poort, H.W., Slooff, T.J.J.H.: A new method to analyse the mechanical behaviour of skeletal parts. Acta Orthop. Scand. 43, 301–317 (1972)
11. Salathe Jr., E.P., Arangio, G.A., Salathe, E.P.: A biomechanical model of the foot. J. Biomech. 19, 989–1001 (1986)
12. Arangio, G.A., Reinert, K.L., Salathe, E.P.: A biomechanical model of the effect of subtalar arthroereisis on the adult flexible flat foot. Clin. Biomech. (Bristol, Avon) 19, 847–852 (2004)
13. Gefen, A.: Stress analysis of the standing foot following surgical plantar fascia release. J. Biomech. 35, 629–637 (2002)
14. Camacho, D.L., Ledoux, W.R., Rohr, E.S., et al.: A three-dimensional, anatomically detailed foot model: a foundation for a finite element simulation and means of quantifying foot-bone position. J. Rehabil. Res. Dev. 39, 401–410 (2002)
15. Jacob, S., Patil, M.K.: Three-dimensional foot modeling and analysis of stresses in normal and early stage Hansen's disease with muscle paralysis. J. Rehabil. Res. Dev. 36, 252–263 (1999)
16. Gefen, A., Megido-Ravid, M., Itzchak, Y., et al.: Biomechanical analysis of the three-dimensional foot structure during gait: a basic tool for clinical applications. J. Biomech. Eng. 122, 630–639 (2000)
17. Chu, T.M., Reddy, N.P., Padovan, J.: Three-dimensional finite element stress analysis of the polypropylene ankle-foot orthosis: static analysis. Med. Eng. Phys. 17, 372–379 (1995)
18. Patil, K.M., Braak, L.H., Huson, A.: Analysis of stresses in two-dimensional models of normal and neuropathic feet. Med. Biol. Eng. Comput. 34, 280–284 (1996)
19. Cheung, J.T., Zhang, M., Leung, A.K., et al.: Three-dimensional finite element analysis of the foot during standing – a material sensitivity study. J. Biomech. 38, 1045–1054 (2005)
Internet Living Broadcast of Medical Video Stream

Shejiao Li1, Bo Li1, and Fan Zhang1,2

1 College of Computer & Information Engineering, Henan University, Kaifeng 475001, P.R. China
[email protected]
2 College of Electronic and Information Engineering, Tianjin University, Tianjin 300072, P.R. China
Abstract. A DirectShow-based network living broadcast scheme for MPEG-4 medical video streams is presented. In this scheme, the Digital Subtraction Angiography (DSA) video stream is captured by an advanced image capture board, and MPEG-4 compression and the DirectShow framework are used for data compression and transmission. The DSA video stream data can be transported easily from the server's sender filter to the client's receiver filter over the TCP and UDP protocols. Experimental results show that the image is clear on a 100 Mbps LAN and that the delay is less than one second; both high definition and real-time performance of the DSA video are achieved by this scheme.
1 Introduction
Telemedicine is medical care and health-support practice based on patient information derived from images transmitted from a remote site. It may be as simple as two health professionals discussing a case over the telephone, or as sophisticated as using satellite technology to broadcast a consultation between providers at facilities in two countries, using videoconference equipment or robotic technology. Telemedicine can be practiced in many ways, each with challenges and conditions that must be considered, and it varies depending on the facilities involved and the type of data transmitted. Within telemedicine, digital living broadcast and discussion of surgery is one of the most important applications [1]. Videoconferencing via the Internet is a viable method for transmitting information in real time, allowing surgeons worldwide to work together during surgical procedures. The broadcasting computer station is also able to receive real-time video and sound from the distant computers, allowing complete interaction between both parties over the duration of each transmission session [2],[3]. In this case, transmission of patient-related information is necessary. Video and other data from the output of medical equipment, such as DSA, should be stored and transmitted for the living broadcast via the network. In the main digital system of the DSA machine, the images are digitized and archived in the Digital Imaging and Communications in Medicine
(DICOM) format [4]. A single DICOM file contains both a header (which stores information about the patient's name, the type of scan, image dimensions, etc.) and all of the image data, which can contain information in three dimensions. This differs from the popular Analyze format, which stores the image data in one file and the header data in another. DICOM image data can be compressed to reduce the image size. Files can be compressed using lossy or lossless variants of the JPEG format, as well as a lossless run-length encoding format identical to the packed-bits compression found in some TIFF images. The size of a DSA image is usually 1024 × 1024, so DSA images must be compressed to reduce the bandwidth requirement and allow cost-effective delivery [5],[6]. In this scheme, MPEG-4 is adopted. The DSA video is obtained from the TV terminal of the medical equipment. To capture the video stream, an advanced capture board and DirectShow technology are used. The video stream is transported to the client using the TCP and UDP protocols; the client then decodes the compressed data stream and displays it. The video image can also be processed by an image processing filter, with operations such as edge detection, sharpening and noise reduction.
2 Digital Subtraction Angiography
Blood vessels often absorb as much radiation as the surrounding tissues and therefore cannot be discerned on X-ray images. By injecting contrast agents into blood vessels, these vessels can be visualized: the contrast medium contains iodine, which absorbs X-rays. The small contrast differences caused by the contrast medium in the blood vessels are difficult to distinguish in the presence of bone structures, since the eye cannot detect contrast differences of less than about 3%, so a procedure that enhances contrast is necessary. Over the past two decades, DSA has become a well-established modality for the visualization of blood vessels in the human body [7]. With this technique, a sequence of 2D digital X-ray projection images is acquired to show the passage of a bolus of injected contrast material through the vessels of interest. In the images that show opacified vessels (often referred to as contrast images or live images), background structures are largely removed by subtracting an image acquired before injection (usually called the mask image). Sample DSA images are shown in Fig. 1. A digital mask image is made prior to the injection of a contrast agent and stored in the computer. Then a contrast medium is injected into a vein or artery, and when the bolus arrives a sequence of images is obtained. The pre-contrast mask image is subtracted from these subsequent live images (with contrast). The resulting images contain only information that was not present in the mask, that is, information about the location of the contrast medium: they show only the vessels, while the other anatomical details that disturb the image have been removed by the subtraction procedure. In reality things are somewhat more difficult due to the introduction of artifacts, such as motion of the anatomy.
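The subtraction step itself is a pixel-wise operation. A minimal sketch follows (logarithmic scaling and motion correction are omitted); the mid-gray offset is an illustrative display choice, not part of the DSA standard:

#include <vector>
#include <algorithm>

// Subtract the pre-contrast mask from a live (contrast) frame; re-center
// around mid-gray so both positive and negative differences stay visible.
std::vector<unsigned char> subtract(const std::vector<unsigned char>& live,
                                    const std::vector<unsigned char>& mask) {
    std::vector<unsigned char> out(live.size());
    for (std::size_t i = 0; i < live.size(); ++i) {
        int d = int(live[i]) - int(mask[i]) + 128;      // offset to mid-gray
        out[i] = (unsigned char)std::clamp(d, 0, 255);  // clip to 8 bits
    }
    return out;
}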
Fig. 1. DSA images
3 Video Encoding and Decoding
The increasing demand to incorporate video data into telecommunications services, the corporate environment, the entertainment industry, and even the home has made digital video technology a necessity. A problem, however, is that still image and digital video data rates are very large. For example, for a 640 × 480 image with 3 bytes per pixel (R, G, B) at 30 frames/second, a 30-minute stream would need 46.3 GB of storage and a bandwidth of 26.4 MB/s, so compression of the video data is needed. For this reason, video compression standards have been developed to eliminate redundancy, allowing video information to be transmitted and stored in a compact and efficient manner. MPEG (Moving Picture Experts Group) video compression is used in many current and emerging products. It is at the heart of digital television set-top boxes, DSS, HDTV decoders, DVD players, video conferencing, Internet video, and other applications [8]. The MPEG-4 compression algorithms were developed to address the need for higher-quality pictures and the increased system flexibility required by multimedia systems. Since it was developed later, MPEG-4 was able to leverage the efforts behind the development of both the JPEG and H.261 algorithms. As with H.261, only YUV color component separation with the 4:2:0 sampling ratio is allowed by the MPEG-4 standard. Unlike H.261, the frame size is not fixed, although a 352 × 240 frame size is typically used. MPEG-4 adopted the macroblock of H.261 (4 Y blocks, 1 U block, and 1 V block) as the basic unit of compression. To compress each macroblock, the MPEG-4 standard allows the compressor to select from several compression options.
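These figures can be checked with a few lines of arithmetic; the sketch below reproduces the numbers quoted above:

#include <cstdio>

int main() {
    double bytesPerFrame = 640.0 * 480 * 3;       // RGB, 3 bytes per pixel
    double bytesPerSec   = bytesPerFrame * 30;    // 30 frames/second
    double bytes30min    = bytesPerSec * 30 * 60; // a 30-minute stream
    std::printf("bandwidth: %.1f MB/s\n", bytesPerSec / (1024.0 * 1024));
    std::printf("size:      %.1f GB\n", bytes30min / (1024.0 * 1024 * 1024));
    return 0; // prints about 26.4 MB/s and 46.3 GB, as quoted above
}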
4 DirectShow Framework
Microsoft DirectShow is an architecture for streaming media on the Microsoft Windows platform. DirectShow provides for high-quality capture and playback
of multimedia streams [9],[10]. It supports a wide variety of formats, including Advanced Streaming Format (ASF), MPEG, Audio-Video Interleaved (AVI), MPEG Audio Layer-3 (MP3), and WAV sound files. It supports capture from digital and analog devices based on the Windows Driver Model (WDM) or Video for Windows, and it is integrated with other DirectX technologies. DirectShow's main design goal is to simplify the task of creating multimedia applications on the Windows platform by isolating applications from the complexities of data transports, hardware differences, and synchronization issues. To achieve the throughput necessary for streaming video and audio, DirectShow uses Microsoft DirectDraw and Microsoft DirectSound to render data efficiently to the system's sound and graphics cards. Synchronization is achieved by encapsulating the multimedia data in time-stamped media samples. To handle the variety of sources, formats, and hardware devices, DirectShow uses a modular architecture in which operating system components called filters can be mixed and matched to support many different scenarios. DirectShow includes filters that support codecs written for the Audio Compression Manager (ACM) and Video Compression Manager (VCM) interfaces. DirectShow enables applications to play files and streams from various sources, including local files and remote files on a network. It has native compressors and decompressors for some file formats, and many third-party hardware and software decoders are compatible with it. In addition, DirectShow supports legacy VFW codecs based on the VCM and ACM interfaces.
5 DirectShow Based Video Stream Transmission
The basic principle of video stream transmission based on DirectShow is shown in Fig. 2.

[Diagram: server side, Capture → Encoder → NetSender; Internet; client side, NetReceiver → Decoder → VideoRender.]
Fig. 2. The basic principle of video stream transmission based on DirectShow
The video stream is captured by the Capture source filter on the server and delivered to the MPEG-4 Encoder filter, where it is compressed. The coded video stream is then passed to the NetSender filter and sent over the Internet by the UDP protocol, or sent to a multicast group. On the client, the Receiver source filter receives the data from the Internet and delivers it to the MPEG-4 Decoder filter, where the video stream is decompressed. The decoded data are then sent to the Video Render filter.
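For illustration, a hedged sketch of how the client side of such a graph might be assembled with standard DirectShow COM calls is given below; pReceiver and pDecoder stand for the custom filters described in the following subsections, and error handling and explicit pin connection are omitted:

#include <dshow.h>

HRESULT buildClientGraph(IBaseFilter *pReceiver, IBaseFilter *pDecoder) {
    IGraphBuilder *pGraph = NULL;
    IMediaControl *pMC = NULL;
    CoInitialize(NULL);
    // Create the filter graph manager.
    CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                     IID_IGraphBuilder, (void **)&pGraph);
    // Add the custom source and decoder filters to the graph; the graph
    // manager can then complete the chain down to a video renderer
    // (e.g. via IGraphBuilder::Render on the decoder's output pin).
    pGraph->AddFilter(pReceiver, L"NetReceiver");
    pGraph->AddFilter(pDecoder, L"MPEG-4 Decoder");
    pGraph->QueryInterface(IID_IMediaControl, (void **)&pMC);
    return pMC->Run(); // start streaming
}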
5.1 CNetSender Filter
The inheritance relationship of the CNetSender filter is: CBaseFilter → CNetSender. It is implemented as follows:

(1) Implement the pure virtual function CBaseFilter::GetPin, which retrieves a pointer to a filter pin object; here it returns a pointer to the CNetSenderPin object.
(2) Implement the pure virtual function CBaseFilter::GetPinCount, which retrieves the number of filter pins.

CNetSenderPin is the pin class of the CNetSender filter. Its inheritance relationship is: CRenderedInputPin → CNetSenderPin. It is implemented as follows:

(1) Implement the pure virtual function CRenderedInputPin::Receive(IMediaSample *pMediaSample), which receives one frame of data; pMediaSample is a pointer to that data.
(2) Implement the pure virtual function CRenderedInputPin::CheckMediaType(const CMediaType *inMediaType), which checks the media type of the connection; inMediaType is a pointer to the media type.
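As an illustration of the sender side, a minimal sketch of what the Receive method might look like is given below; the UDP socket m_sock and destination address m_addr are hypothetical members assumed to be configured during filter setup, and packetization of frames larger than one UDP datagram is omitted:

#include <winsock2.h>
#include <streams.h> // DirectShow base classes

HRESULT CNetSenderPin::Receive(IMediaSample *pMediaSample) {
    BYTE *pData = NULL;
    pMediaSample->GetPointer(&pData);               // start of the frame data
    long len = pMediaSample->GetActualDataLength(); // bytes in this frame
    // Send one compressed frame to the multicast group. m_sock and m_addr
    // are hypothetical members configured elsewhere; real code must split
    // frames larger than one UDP datagram.
    sendto(m_sock, (const char *)pData, (int)len, 0,
           (const sockaddr *)&m_addr, sizeof(m_addr));
    return S_OK;
}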
5.2 CNetReceiver Filter
The inheritance relationship of the CNetReceiver filter is: CBaseFilter → CNetReceiver. It is implemented as follows:

(1) Implement the pure virtual function CSource::GetPin, which retrieves a pointer to a filter pin object; here it returns a pointer to the CNetReceiverPin object.
(2) Implement the pure virtual function CSource::GetPinCount, which retrieves the number of filter pins.

CNetReceiverPin is the pin class of the CNetReceiver filter. Its inheritance relationship is: CBaseOutputPin → CSourceStream → CNetReceiverPin. It is implemented as follows:

(1) Implement the pure virtual function CBaseOutputPin::DecideBufferSize(IMemAllocator *pAlloc, ALLOCATOR_PROPERTIES *pprop), which sets the maximum length of one frame of data in pprop->cbBuffer.
(2) Implement the pure virtual function CSourceStream::CheckMediaType(const CMediaType *inMediaType), which checks the media type of the connection; inMediaType is a pointer to the media type.
6 Video Stream Transmission Via Internet

6.1 Internet Transmission
(1) Media type transmission via the Internet. In DirectShow, the filters connect successfully, and the data are received correctly, only when the media types are consistent, so the client must receive the server's media type. Because TCP provides a reliable connection, it is adopted to send the media type: a TCP socket thread is constructed on the server, and as soon as it detects a client trying to connect, the server sends the media type.

(2) Video stream transmission via UDP multicast. Because of the large volume of video stream data and the number of clients, UDP multicast is a good way to transmit the video stream: less bandwidth is used when many clients are connected. The video stream transmission is implemented in the Receive function of CNetSenderPin.

6.2 Receiving Data from Internet
The Internet data reception scheme is as follows:

(1) After the TCP connection is established, the client receives the media type immediately and creates a CNetReceiverPin instance according to it:

CMediaType pmt;
pmt.SetType(&MEDIATYPE_Video);         // set the major type
pmt.SetFormatType(&FORMAT_VideoInfo);  // set the media format type
// inFormat is the media type data received from the network
VIDEOINFOHEADER *pvi = (VIDEOINFOHEADER *)inFormat;
const GUID subtype = GetBitmapSubtype(&pvi->bmiHeader);
pmt.SetSubtype(&subtype);                  // set the subtype
pmt.SetFormat((BYTE *)inFormat, inLength); // set the video format type

(2) After joining the multicast group, a double-buffer queue is constructed. The queue consists of two singly linked lists, a writing-data queue and a reading-data queue:
[Diagram: two singly linked lists of PData blocks, each with Head and Tail pointers.]
When a frame of data in the writing chain has been filled, the data block is deleted from the head of the writing queue and appended to the tail of the reading queue. When a data block in the reading chain has been consumed, it is deleted from the reading queue and appended to the tail of the writing queue.
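A minimal sketch of such a double-buffer queue is given below; a std::mutex stands in for whatever synchronization the original implementation used, and PData is assumed to be the frame buffer type:

#include <queue>
#include <mutex>

struct PData { /* one frame of received data */ };

class DoubleBufferQueue {
    std::queue<PData*> writeQ, readQ; // the two chained lists of data blocks
    std::mutex m;
public:
    // A block that the network thread has filled moves from the head of the
    // writing queue to the tail of the reading queue.
    void commit() {
        std::lock_guard<std::mutex> lock(m);
        if (!writeQ.empty()) { readQ.push(writeQ.front()); writeQ.pop(); }
    }
    // A block that the decoder has consumed moves from the head of the
    // reading queue back to the tail of the writing queue.
    void release() {
        std::lock_guard<std::mutex> lock(m);
        if (!readQ.empty()) { writeQ.push(readQ.front()); readQ.pop(); }
    }
};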
6.3 Video Stream Render
When the filters are connected successfully, the filter graph displays the received video stream under the control of the Filter Graph Manager:

IMediaControl *pMC = NULL;
IMediaEventEx *pME = NULL;
// media control interface
pGraph->QueryInterface(IID_IMediaControl, (void **)&pMC);
// media event interface
pGraph->QueryInterface(IID_IMediaEventEx, (void **)&pME);
pMC->Run(); // run the filter chain
7 Experimental Results
The experiment was carried out in Huaihe Hospital of Henan University. The digital subtraction angiography equipment of Huaihe Hospital is a GE LCV+. The DSA video stream is captured by an advanced image capture board, an OK-M30A. MPEG-4 and the DirectShow framework are used for data compression and transmission, and the DSA video stream data are transported from the server's sender filter to the client's receiver filter over the TCP and UDP protocols. Table 1 shows the experimental environment and equipment of this scheme.
Experimental results show that the image is clear on a 100 Mbps LAN and that the delay is less than one second; both high definition and real-time performance of the DSA video are achieved by this scheme. Memory usage on the server is 10 MB; on the client it is 6 MB with the VMR render model and 21 MB with the Normal render model. The frame rate is a constant 29 frames/second. The experimental results under this environment are shown in Table 2.

Table 1. Experimental environment

Equipment               Specification
DSA equipment           GE LCV+
Image capture board     OK-M30A, 1024 × 1024, 10 bit
MPEG-4 codec/decoder    Xvid
Network                 100 Mbps LAN
Server                  CPU PIV 2.0 GHz, 512 MB memory
Client                  CPU XP 1700+, 512 MB memory, 32 MB video card

Table 2. Experimental results

                              Memory usage   Frames/second
Server                        10 MB          29
Client (render model VMR)     6 MB           29
Client (render model Normal)  21 MB          29
VMR (Video Mixing Renderer) is used preferentially for rendering on the client, according to the data type. In addition, if the CPU occupancy rate on the client is high, the packet drop rate of the data received from the network rises, which produces mosaic artifacts when the video is rendered; if some other programs are closed, the video quality is restored.
8 Conclusion
In this scheme, the Digital Subtraction Angiography (DSA) video stream is captured by an advanced image capture board, and MPEG-4 and the DirectShow framework are used for data compression and transmission. The DSA video stream data can be transported easily from the server's sender filter to the client's receiver filter over the TCP and UDP protocols. Experimental results show that the image is clear on a 100 Mbps LAN and that the delay is less than one second; both high definition and real-time performance of the DSA video are achieved. With the image processing function, images received from the network can also be processed. The experimental results show that living broadcast of interventional operations/surgery via the Internet is a feasible scheme.
References

1. Perednia, D.A., Allen, A.: Telemedicine technology and clinical applications. Journal of the American Medical Association 273(6), 483–488 (1995)
2. Gandsas, A., Altrudi, R., Pleatman, M., Silva, Y.: Live interactive broadcast of laparoscopic surgery via the Internet. Current Surgery 60(2), 126–129 (2003)
3. Tao, Y., Miao, J.: Workstation scheme and implementation for a medical imaging information system. Chinese Medical Journal 116(5), 654–657 (2003)
4. Huang, Z., Zhuang, T.: Evolution of DICOM standard and its latest changes. Chinese Journal of Medical Instrumentation 28(3), 203–207 (2004)
5. Jose, R., Pablo, G., Miguel, S.: A compression and transmission scheme of computer tomography images for telemedicine based on JPEG2000. Telemedicine Journal and e-Health 10, 40–44 (2004)
6. Ramakrishnan, B., Sriraam, N.: Internet transmission of DICOM images with effective low bandwidth utilization. Digital Signal Processing 16(6), 825–831 (2006)
7. Brody, W.: Digital subtraction angiography. IEEE Transactions on Nuclear Science 29(3), 1176–1180 (1982)
8. Prasad, R., Ramkishor, K.: Implementation of MPEG-4 video encoder on RISC core. IEEE Transactions on Consumer Electronics 49(2), 1–6 (2003)
9. Dasu, A., Panchanathan, S.: A survey of media processing approaches. IEEE Transactions on Circuits and Systems for Video Technology 12(8), 1–13 (2002)
10. Lu, Q.: DirectShow Development Guidebook. Tsinghua University Press, Beijing (2003)
Predicting Syndrome by NEI Specifications: A Comparison of Five Data Mining Algorithms in Coronary Heart Disease

Jianxin Chen1, Guangcheng Xi1, Yanwei Xing2, Jing Chen1, and Jie Wang2

1 Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, 100080 Beijing, China
{jianxin.chen,guangcheng.xi}@ia.ac.cn
2 GuangAnMen Hospital, Chinese Academy of Chinese Medical Science, 100053 Beijing, China
Abstract. Nowadays, most Chinese patients are treated for CHD with an integration of TCM and Western medicine; however, the relation between the two is rarely studied. In this paper, we carry out a clinical epidemiology study to collect 102 cases, each a CHD instance confirmed by coronary artery angiography. Each case is diagnosed by TCM experts as a particular syndrome, and the corresponding nine NEI specifications are measured. We want to explore whether a relation exists between syndrome and NEI specifications. We therefore employ five distinct kinds of data mining algorithms, a Bayesian model, a neural network, a support vector machine, decision trees and logistic regression, to perform the prediction task and compare their performances. The results indicate that SVM is the best identifier, with 90.5% accuracy on the holdout samples. Next is the neural network with 88.9% accuracy, higher than the Bayesian model at 82.2%. The decision tree performs worse, at 77.9%, and logistic regression performs the worst, at only 73.9%. We conclude that a relation does exist between syndrome and Western medicine measurements and that SVM is the best model for predicting syndrome from NEI specifications.
1 Introduction
Coronary heart disease (CHD) is a serious disease, causing the death of more than 1 million Chinese each year [1]. In China, most people are treated for CHD with an integration of TCM and Western medicine. The following is a brief introduction to TCM.

1.1 TCM
TCM has always been regarded as a key component of the five-thousand-year history of Chinese civilization. It has a history of more than 3000 years, of which roughly 1000 years have been spent on healing CHD, so it has accumulated extensive experience. TCM,
whose core is syndrome, or 'Zheng' in Chinese, is on her way to modernization, aiming to be accepted, like Western medicine, as a science [2],[3]. The kernel of TCM is syndrome: every herbal prescription is made in accordance with syndromes. However, until now, the relation between syndrome and the physical and chemical specifications of Western medicine has rarely been explored. Furthermore, during animal experiments in laboratories, animals, unlike humans, cannot have their pulse felt, so determining syndrome in animals is significantly difficult. The blood of humans and animals, however, is easily obtained, so we can explore the relation between syndrome and blood specifications. Here, we choose Neuro-Endocrine-Immune (NEI) specifications; the following is the background of NEI.

1.2 NEI System
In modern Western medicine (WM), the NEI system acts as a pivot in modulating host homeostasis and naturally optimizing health through complex communications among chemical messengers (CMs), including hormones, cytokines and neurotransmitters [4]. If we consider CMs the biochemical ingredients of the NEI system, then the genes that (directly or indirectly) encode these CMs can be considered its genic ingredients. Our goal here is to use the information in the NEI specifications to predict whether a patient has a specific syndrome. We employ five kinds of classical data mining methods to perform this task and compare them to find the best one; the problem is therefore a classification problem.

1.3 Data Mining Algorithms
In the setting of supervised classification, data mining algorithms comprise five broadly used kinds: Bayesian methods, neural networks, support vector machines, decision trees and logistic regression. Each kind is developing quickly and is often combined with the others to solve hard problems [5]. The Bayesian network (BN) is chosen from the Bayesian methods to perform classification here. The Radial Basis Function network is selected from the neural networks for its higher classification performance compared with other algorithms, such as recurrent neural networks. We used a well-known support vector machine algorithm, Platt's SMO algorithm [6], as the representative of SVM classification, since it can process both categorical and numerical variables. For the decision tree kind, Quinlan's C4.5 algorithm [7] is employed. Logistic regression is a classical method and is also used to perform the task in this paper.
2 Biological Variables and Clinical Data Collection
The 102 patients included in the survey were incoming patients of AnZhen Hospital in Beijing from June 2005 to April 2006. Each patient was diagnosed by Western
medicine experts as having CHD by coronary artery angiography; meanwhile, each patient was diagnosed by TCM experts as a particular syndrome. Blood samples of all patients were collected after a 12-hour overnight fast, before any cardiovascular procedures. In total, 9 NEI specifications were measured from the blood samples: Mpo1, Wbc1, Tni2, Tnf, Et, Il6, Il8, No and Hicrp. The basic information of each patient, such as name and age, was also recorded, but this part of the data was not included in the data mining process.
3 Prediction Models
New data always needs established algorithms to test its predictability. We employed five types of classification algorithms: a Bayesian model, neural networks, SVM, decision trees and logistic regression. These models were chosen for inclusion in this research because of their popularity in recently published work. The following is a brief introduction to the five classification algorithms and the parameter settings of each model.

3.1 Bayesian Network
A Bayesian network (BN) is a graphical model that encodes probabilistic relationships among attributes of interest. Several advances have been made to improve Bayesian networks to fit all kinds of realistic problems [8]. We select simulated annealing as the method for searching network structures. The estimator is the BayesNetEstimator, the base class for estimating the conditional probability tables of a Bayesian network once the structure has been learned.

3.2 Radial Basis Function Network
As shown in Fig. 1, an RBF network has two layers, not counting the input layer, and differs from a multilayer perceptron in the way the hidden units perform computations. Each hidden unit essentially represents a particular point in input space, and its output, or activation, for a given instance depends on the distance between its point and the instance, which is just another point. Intuitively, the closer these two points, the stronger the activation. This is achieved by using a nonlinear transformation function to convert the distance into a similarity measure. A bell-shaped Gaussian activation function, whose width may differ for each hidden unit, is commonly used for this purpose. The hidden units are called RBFs because the points in instance space for which a given hidden unit produces the same activation form a hypersphere or hyperellipsoid. (In a multilayer perceptron, this is a hyperplane.)
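For concreteness, the Gaussian activation just described can be written in a few lines; the center c and width sigma are the per-hidden-unit parameters learned during training:

#include <vector>
#include <cmath>

// Gaussian RBF activation: strongest when input x is closest to center c.
double rbf(const std::vector<double>& x, const std::vector<double>& c,
           double sigma) {
    double d2 = 0.0; // squared Euclidean distance between x and c
    for (std::size_t i = 0; i < x.size(); ++i)
        d2 += (x[i] - c[i]) * (x[i] - c[i]);
    return std::exp(-d2 / (2.0 * sigma * sigma));
}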
3.3 Support Vector Machine
The SVM is a state-of-the-art maximum-margin classification algorithm rooted in statistical learning theory [11],[12]. SVM performs classification by maximizing the margin separating the two classes while minimizing the classification errors. We used the sequential minimal optimization algorithm to train the SVM here, as illustrated in Fig. 2.
Fig. 1. The topology of the RBF network
Fig. 2. An illustration of SVM
3.4 Decision Trees: C4.5
As the name implies, this algorithm recursively separates observations into branches to construct a tree, with the purpose of improving prediction accuracy. In doing so, it uses the information gain criterion to identify a variable, and a corresponding threshold for it, that splits the input observations into two or more subgroups. This step is repeated at each leaf node until the complete tree is constructed. The confidence factor is set to 0.01, and the minimum number of instances per leaf is 2.

3.5 Logistic Regression
Logistic regression is a generalization of linear regression [13]. It is used primarily for predicting binary or multi-class dependent variables. Because the response variable is discrete, it cannot be modeled directly by linear regression; therefore, rather than predicting a point estimate of the event itself, the model predicts the odds of its occurrence. In a two-class problem, odds greater than 50% mean that the case is assigned to the class designated as 1, and to 0 otherwise. While logistic regression is a very powerful modeling tool, it assumes that the response variable (the log odds, not the event itself) is linear in the coefficients of the predictor variables.
4 Performance Evaluation and Results

4.1 Performance Measures
We employed three commonly used performance measures: accuracy, sensitivity and specificity. A confusion matrix is computed to calculate the three measures. The confusion matrix is a matrix representation of the classification results: the upper-left cell holds the number of samples classified as true that were actually true (TP), and the lower-right cell the number classified as false that were actually false (TN). The other two cells hold the numbers of misclassified samples: the lower-left cell the samples classified as false that were actually true (FN), and the upper-right cell the samples classified as true that were actually false (FP). Once the confusion matrix is constructed, the three measures are easily calculated as: sensitivity = TP/(TP + FN); specificity = TN/(TN + FP); accuracy = (TP + TN)/(TP + FP + TN + FN). Ten-fold cross-validation is used here to minimize the bias produced by random sampling of the training and test data. Extensive tests on numerous data sets, with different learning strategies, have shown that 10 is about the right number of folds to get the best estimate of error, and there is also some theoretical evidence backing this up [9],[10].
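For reference, a minimal sketch that derives the three measures from the four confusion-matrix counts is:

struct Metrics { double accuracy, sensitivity, specificity; };

// tp, fn, fp, tn are the four cells of the confusion matrix.
Metrics evaluate(double tp, double fn, double fp, double tn) {
    Metrics m;
    m.sensitivity = tp / (tp + fn);
    m.specificity = tn / (tn + fp);
    m.accuracy    = (tp + tn) / (tp + fp + tn + fn);
    return m;
}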
4.2 Results
Every model was evaluated based on the three measures discussed above (classification accuracy, sensitivity and specificity). The results were obtained as the average values of 10-fold cross-validation for each algorithm. As shown in Fig. 3, we found that the Bayesian model (BN) achieved a classification accuracy of 82.2% with a sensitivity of 81.1% and a specificity of 85.5%. The neural network achieved a classification accuracy of 88.9% with a sensitivity of 88.9% and a specificity of 88.8%. The decision tree (C4.5) achieved a classification accuracy of 77.9% with a sensitivity of 76.5% and a specificity of 81.9%. The logistic regression achieved a classification accuracy of 73.9% with a sensitivity of 84.6% and a specificity of 70.3%. However, the SVM performed the best of the five models evaluated: it achieved a classification accuracy of 90.5% with a sensitivity of 92% and a specificity of 90%.
Fig. 3. The performance of the five data mining algorithms on the classification problem
5 Conclusion
In this paper, we employed five popular data mining models to perform the classification task of identifying syndrome by NEI specifications in CHD. The data were collected from clinics, comprising 102 cases in total. We used 10-fold cross validation to compute the confusion matrix of each model and then calculated the three performance measures (sensitivity, specificity and accuracy) to evaluate the five kinds
of models. We found that the Bayesian model (BN) achieved a classification accuracy of 82.2% with a sensitivity of 81.1% and a specificity of 85.5%. The neural network achieved a classification accuracy of 88.9% with a sensitivity of 88.9% and a specificity of 88.8%. The decision tree (C4.5) achieved a classification accuracy of 77.9% with a sensitivity of 76.5% and a specificity of 81.9%. The logistic regression achieved a classification accuracy of 73.9% with a sensitivity of 84.6% and a specificity of 70.3%. However, the SVM performed the best of the five models evaluated: it achieved a classification accuracy of 90.5% with a sensitivity of 92% and a specificity of 90%. We conclude that syndrome has a strong relation with NEI specifications, and our results show that SVM provides better insight for predicting syndrome by NEI specifications. Acknowledgments. The work has been supported by the 973 Program under Grant Nos. 2003CB517106 and 2003CB517103 and NSFC Projects under Grant No. 60621001, China.
References
1. World Health Organization: World Health Statistics Annual. World Health Organization, Geneva, Switzerland (2006)
2. Normile, D.: The new face of Traditional Chinese Medicine. Science 299, 188–190 (2003)
3. Xue, T.H., Roy, R.: Studying Traditional Chinese Medicine. Science 300, 740–741 (2003)
4. Roth, J., LeRoith, D., et al.: The evolutionary origins of hormones, neurotransmitters, and other extracellular chemical messengers: implications for mammalian biology. The New England Journal of Medicine 306, 523–527 (1982)
5. Brudzewski, K., Osowski, S., Markiewicz, T.: Classification of milk by means of an electronic nose and SVM neural network. Sensors and Actuators B 98, 291–298 (2004)
6. Keerthi, S.S., Shevade, S.K., et al.: Improvements to Platt's SMO algorithm for SVM classifier design. Neural Computation 13, 637–649 (2001)
7. Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA (1993)
8. Huang, C.L., Shih, H.C., Chao, C.Y.: Semantic analysis of soccer video using dynamic Bayesian network. IEEE Transactions on Multimedia 8, 749–760 (2006)
9. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
10. Delen, D., Walker, G., Kadam, A.: Predicting breast cancer survivability. Artif. Intell. Med. 34, 113–127 (2005)
11. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
12. Graf, A., Wichmann, F., Bulthoff, H., et al.: Classification of faces in man and machine. Neural Computation 18, 143–165 (2006)
13. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2001)
Application of Image Processing and Finite Element Analysis in Bionic Scaffolds' Design Optimizing and Fabrication
Liulan Lin, Huicun Zhang, Yuan Yao, Aili Tong, Qingxi Hu, and Minglun Fang
Rapid Manufacturing Engineering Center, Shanghai University, P.O. Box 113, 99 Shangda Road, Shanghai, 200444, China
[email protected]
Abstract. Design optimizing is the key step in obtaining bionic scaffolds with proper shape and inner microstructure, which are two critical parameters for bionic scaffolds in Tissue Engineering. In this paper, the applications of image processing and finite element analysis in the design optimizing of the bionic scaffold's shape and inner microstructure were studied respectively. The bionic scaffold's shape was obtained through Mimics' image processing and 3D reconstruction technologies. Finite element analysis (FEA) was used to evaluate the mechanical properties of scaffold structure models with different macro-pore shapes and distributions to obtain the optimized parameters. Three groups of bioceramic scaffold samples were fabricated through an indirect method combining stereolithography (SLA) and slurry casting, and mechanical experiments were then conducted. The change trend of the compressive strength obtained through the mechanical experiments was basically consistent with the FEA results, so the significance of FEA in bionic scaffolds' design optimizing was proved.
1 Introduction
In Tissue Engineering, temporary 3D bionic scaffolds are essential to guide cell proliferation and to maintain native phenotypes in regenerating biologic tissues or organs [1]. The shape and inner microstructure are the two critical properties of bionic scaffolds for repairing defective bone. Bionic scaffolds should have the same shape as the restoration for repairing the defective bone, so that the scaffolds can be placed well in the body and guide the neonatal bone's growth correctly. To satisfy tissue engineering's requirements, bionic bone scaffolds must match the shape of the defects exactly and have a polygradient porous configuration with characteristics and properties such as porosity, surface area to volume ratio, pore size, pore interconnectivity, shape (or overall geometry), structural strength and biocompatibility. These characteristics and properties are often considered to be critical factors in their design and fabrication [2]. Design optimizing is the key step in obtaining bionic scaffolds with proper shape and inner microstructure. Traditional methods of scaffold fabrication include fiber bonding, solvent casting and particulate leaching [3], membrane lamination, melt molding, gas foaming, cryogenic induced phase separation [4],[5] and so on. However, all of these techniques are mainly based on manual work and lack a
corresponding design process, so an extra procedure was needed to obtain a suitable shape, and the microstructure could not be controlled well. These traditional methods also have many disadvantages such as long fabrication periods, poor repeatability and insufficient connectivity of pores [6]. To overcome the limitations of these conventional techniques, automated computer-controlled fabrication techniques, such as rapid prototyping (RP), are being explored. Based on a layer-by-layer manufacturing process, parts with complex shapes or structures can be produced through RP technologies easily and rapidly. Several kinds of RP technologies, such as stereolithography (SLA) [7],[8], selective laser sintering (SLS) [9],[10], fused deposition modeling (FDM) [11],[12], three-dimensional printing (TDP or 3DP) [13],[14] and so on, have been applied widely in fabricating bionic scaffolds for tissue engineering and have achieved some progress. Using RP technologies in bionic scaffold preparation fully exploits the benefits of design and improves the bionic scaffolds' properties. In this paper, the applications of image processing and finite element analysis in the design optimizing of the bionic scaffold's shape and inner microstructure were studied respectively. The bionic scaffold's shape was obtained through Mimics' image processing and 3D reconstruction technologies. The inner microstructure of bionic scaffolds should be a polygradient porous configuration with macro-pores and micro-pores. The macro-pores' size and distribution were designed with CAD software and could be manufactured through RP technologies, while the micro-pores were created by burning off the pore-forming agent, which left the spacing. So design optimizing means finding the optimized parameters for the size and distribution of the macro-pores. Finite element analysis (FEA) was used to evaluate the mechanical properties of the optimized bionic scaffold microstructure models and the models without design optimizing. Several groups of bioceramic scaffold samples were fabricated through an indirect method combining stereolithography (SLA) and slurry casting, and mechanical experiments were done to validate the FEA results.
2 Designing and Analyzing Process
2.1 Image Processing and 3D Reconstruction
The patient's CT data of the defective skull were imported into Mimics 9.11 (The Materialise Group, Leuven, Belgium) and the 3D STL model of the skull was obtained through image processing and 3D reconstruction technologies. Then the restoration's 3D STL model of the defect was constructed by the symmetrical repairing operation in Mimics. The shape of the restoration was exactly the shape of the bionic bone scaffold to prepare.
2.2 Scaffold's Structure Model Design and Analysis
The macro-pores of the bionic scaffold's inner microstructure should form a 3D connective gridding structure with a proper size to assure the scaffold's connectivity and be suitable for cell growth and proliferation. There are many possible scaffold models with different pore shapes, sizes and distributions. Considering the preparation technology, four kinds of structure models with different pore shapes and distributions were created in UG NX3.0 (UGS PLM Solutions, Plano, TX, USA) for analysis and contrast.
As shown in Fig. 1, model A has cylindrical macro-pores with a diameter of 0.80 mm and a distance between adjacent pores of 2.0 mm. Model C has square macro-pores with an edge length of 0.71 mm and a distance between adjacent pores of 2.0 mm, so as to have the same porosity as model A. The macro-pore distributions of models B and D differ from those of models A and C: the pores along the three coordinate axes intersect at one point in models A and C, while in models B and D the X-axis pores and Y-axis pores are not in the same horizontal plane.
Fig. 1. The structure models of bionic scaffolds
The compression simulations of the microstructure models were solved in the finite element analysis software Ansys. The change in mechanical strength between the models before and after design optimizing was contrasted and analyzed. According to formulas of the mechanics of materials, the ratio between load and strain is a constant for the same material. The four models were set with the same element type and material attributes, and the deformation of each model under the same load (compressing or bending) was contrasted. Based on the maximum total strain intensity value of each model, the compressive strength or bending strength was calculated.
2.3 Mechanical Experiment
Three groups of bioceramic scaffolds of these four microstructure models were fabricated through an indirect method combining stereolithography (SLA) and slurry casting technologies. Compression tests were conducted with an INSTRON 5542 material testing system with Merlin™ software using a 50 N load-cell (Norwood, MA, USA). Compression tests with real materials were done to validate the effect of design optimizing and the accuracy of the finite element analysis.
3 Results
3.1 Result of Defective Bone Repairing
As shown in Fig. 2, the CT scanning images of the patient's skull were imported into Mimics 9.11. After selecting a suitable threshold, the 3D model of the skull was reconstructed and exported as an STL model (Fig. 3A). Threshold selection was very important for the reconstructed 3D model's quality; in this paper, the bone model was defined through masks with a threshold between 226 and 1738. The defective skull was repaired by the symmetrical repairing operation in Mimics. The shape of the restoration was exactly the shape of the bionic bone scaffold to prepare. Fig. 3B shows the bionic scaffold model with macro-pores, which was created through a Boolean operation in Magics X (The Materialise Group, Leuven, Belgium), a software package especially for processing STL files.
Fig. 2. The CT scanning images of the patient's skull imported into Mimics
Fig. 3. Result of repairing the defective skull (A) and the model of the restoration and bionic scaffold with macro-pores (B)
3.2 Result of Finite Element Analysis
Bionic scaffolds are filled with a porous microstructure, so it can be assumed that this porous microstructure is a homogeneous, isotropic material. Bioceramic is a brittle material and exhibits elastic behavior before breaking (the maximum strain value reaches 0.03). The Young's modulus of the scaffold material was worked out from previous mechanical experiments (E = 36.96 MPa). The most common loads that scaffolds suffer are compression and bending; the compressive strength in particular is the most important mechanical property of bionic scaffolds. With the same element type and material attributes selected, the compression and bending behaviors of these four models were analyzed in Ansys. The maximum strain value was obtained, and then the compressive strength and bending strength of every model were calculated according to the mechanical formulas. Based on the compressive strength and bending strength of every model, the influences of macro-pore shape and distribution on the mechanical properties were analyzed and discussed. Fig. 4 shows the compressive strength of each model under unidirectional compressive load. The compressive strength of model C is more than twice that of model A, and the compressive strength of model D is 24.7% higher than that of model B. Compared with model A, the compressive strength of model B increases by 63.6%, while the compressive strength of model D decreases by 6.2% compared with model C. From Fig. 4, the following conclusions can be drawn. Firstly, scaffolds with square macro-pores exhibit a much better compressive property than scaffolds with cylindrical macro-pores having the same porosity. Secondly, altering the macro-pore distribution can improve the compressive strength of the model with cylindrical macro-pores apparently, but results in a slight reduction of the compressive strength of the model with square macro-pores.
Fig. 4. Compressive strength of the four models under unidirectional compressive load (model A: 0.1171 MPa; model B: 0.1916 MPa; model C: 0.2536 MPa; model D: 0.2389 MPa)
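As a quick arithmetic check, the relative changes quoted above can be reproduced from the strengths recovered from Fig. 4; the assignment of values to models is inferred from those quoted percentages.

```python
# Compressive strengths (MPa) recovered from Fig. 4; the mapping to models
# is inferred from the relative changes quoted in the text.
strength = {"A": 0.1171, "B": 0.1916, "C": 0.2536, "D": 0.2389}

print(strength["C"] / strength["A"])               # ~2.17: C is more than twice A
print((strength["D"] / strength["B"] - 1) * 100)   # ~24.7%: D higher than B
print((strength["B"] / strength["A"] - 1) * 100)   # ~63.6%: B increase over A
print((strength["C"] / strength["D"] - 1) * 100)   # ~6.2%: D's decrease from C
```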
Fig. 5. The total strain intensity of structure models under unidirectional compressive load
Fig. 5 shows the total strain intensity of the four structure models under unidirectional compressive load analyzed in Ansys. All of these were under the same load situation, with the bottom surface fixed and a pressure of 10,000 N/m² applied to the top surface. The total strain intensity reached its maximum value in the stress concentration area, which is the area that breaks first. The stress concentration area was usually in the smallest sections. As shown in Fig. 5, the stress concentration degrees of models A and B were higher than those of models C and D respectively, and the smallest section of the models with cylindrical macro-pores was smaller than that of the models with square macro-pores under the same pore distribution. So scaffolds with square macro-pores exhibit a much better compressive property than scaffolds with cylindrical macro-pores having the same porosity. Altering the macro-pore distribution can increase the smallest section area. For the models with cylindrical macro-pores, altering the macro-pore distribution reduced the degree of stress concentration apparently, so the compressive strength of model B increased clearly over model A. For the models with square macro-pores, altering the macro-pore distribution aggravated the degree of stress concentration, so the compressive strength of model D was reduced a little from model C.
3.3 Result of Mechanical Experiment
As shown in Fig. 6, three groups of bioceramic scaffold samples of the four kinds of structure models were fabricated through an indirect method combining stereolithography (SLA) and slurry casting technologies. Compression tests were conducted with an INSTRON 5542 material testing system using a 50 N load-cell. The compressive test results are shown in Fig. 7. From Fig. 7, it can be seen that the compressive strength of scaffold samples C is 59.2% higher than that of scaffold samples A, and the compressive strength of scaffold samples D is 5.4% more than that of scaffold samples B. These distinctions were not as evident
Fig. 6. Bioceramic scaffold samples of the model structure
Fig. 7. Compressive strength of the scaffold samples under unidirectional compressive load (sample A: 0.3579 MPa; sample B: 0.5434 MPa; sample C: 0.5698 MPa; sample D: 0.5724 MPa)
as in the FEA results, but they also showed that scaffolds with square macro-pores exhibit better compressive strength than scaffolds with cylindrical macro-pores. Compared with scaffold samples A, the compressive strength of scaffold samples B increases by 51.8%, while the compressive strength of scaffold samples D is nearly equal to that of scaffold samples C. This trend also showed that altering the macro-pore distribution can improve the compressive strength of scaffold samples with cylindrical macro-pores apparently, while it has less influence on scaffold samples with square macro-pores.
4 Discussion
4.1 The Advantage of Using Mimics in Scaffolds' Shape Designing
The shape of human bone is very complex and irregular, and the bone defects caused by accidents or disease are even more so. Before the appearance of medical image processing systems, the traditional method of repairing defective bone needed a first surgery, in which the shape and size of the defective part of the bone were estimated by the doctors with the naked eye and some measuring tools. Then the model of the defective part was made by manual operation, followed by a second operation to implant the model into the defective part of the bone. This traditional method of repairing defective bone has low efficiency, a long period, low precision and a high cost, and the second operation usually causes harm to the patient.
With the appearance and development of medical image processing systems, such systems are used more and more in the repair of defective bone. First, the CT images are obtained, and then the 3D CAD model of the defective bone is reconstructed by the medical image processing system. Then, using the functions of the system or other CAD software, the defective bone is repaired and the CAD model of the restoration is obtained. At last, the CAD model of the restoration is fabricated. This method achieves high precision, shortens the period, and avoids the harm to the patient of the second operation [15],[16]. Mimics is one such digital 3D medical image processing system. It provides functions to read medical images, to reconstruct the images, and to repair defective bone. The unique mask operation makes the segmentation of the images and the reconstruction of the tissue easy and convenient. Using the reconstruction function of Mimics, the related area can be calculated and the 3D model obtained. Usually, these operations require high computing speed; however, Mimics can finish them successfully on a normal PC. Mimics provides great convenience to doctors, shortens the treatment time, reduces the harm to the patient and brings a great improvement to the clinical effect.
4.2 Mechanical Experiment
The macro-pore design optimizing covers the design of the size, shape, distribution and so on. These parameters are very important for improving the bionic scaffold's properties. As the biomaterials used to prepare scaffolds are always very expensive and the fabrication process is complicated and time-consuming, using finite element analysis to evaluate the mechanical properties of scaffolds as the design changes can save cost and shorten the preparation cycle. The change trend of the compressive strength obtained through the mechanical experiments was basically consistent with the FEA results in this study, which validated the significance of FEA in bionic scaffold design optimizing. Although the change trends of the compressive strength obtained through FEA and the mechanical experiments were basically consistent, the value of the compressive strength of each scaffold sample differed from the FEA results of the structure models. The compressive strength of all four scaffold samples obtained through the compressive tests was obviously higher than the FEA results of the four structure models. This was because the actual strain value when the scaffolds broke was much bigger than 0.03, the value set in Ansys. The bioceramic material properties were not exactly consistent with the material attributes set in Ansys. The fabrication methods and processing technologies could also influence the mechanical properties to a large extent.
5 Conclusions
In this paper, the applications of image processing and finite element analysis in the design optimizing of bionic scaffolds' shape and inner microstructure were studied respectively. The bionic scaffold's shape was obtained through Mimics' image processing and 3D reconstruction technologies. Finite element analysis (FEA) was used to evaluate the mechanical properties of scaffold structure models with different macro-pore shapes and distributions to obtain the optimized parameters. Three groups
of bioceramic scaffold samples were fabricated through an indirect method combining stereolithography (SLA) and slurry casting, and mechanical experiments were conducted. The change trend of the compressive strength obtained through the mechanical experiments was basically consistent with the FEA results, so the significance of FEA in bionic scaffolds' design optimizing was validated. The compressive strength of all four scaffold samples obtained through the compressive tests was obviously higher than the FEA results of the four structure models, and the possible reasons were discussed. Acknowledgments. The authors would like to acknowledge the support of the Shanghai Education Fund (No. 5A281) and the Shanghai Academic Excellent Youth Instructor Special Foundation.
References
1. Tan, K.H., Chua, C.K., Leong, K.F., Cheah, C.M.: Scaffold development using selective laser sintering of polyetheretherketone-hydroxyapatite biocomposite blends. Biomaterials 26, 4281–4289 (2005)
2. Yang, S.F., Leong, K.F., Du, Z.H., Chua, C.K.: The design of scaffolds for use in tissue engineering: Part 1 - Traditional factors. Tissue Eng. 7(6), 679–690 (2001)
3. Linbo, W., Jiandong, D.: Compression molding of PCL porous scaffolds with complicated shape for tissue engineering. Polymer Material Science and Engineering 25(1), 296–299 (2005)
4. Deville, S., Saiz, E., Tomsia, A.P.: Freeze casting of hydroxyapatite scaffolds for bone tissue engineering. Biomaterials 27, 5480–5489 (2006)
5. Madihally, S.V., Matthew, H.W.T.: Porous chitosan scaffolds for tissue engineering. Biomaterials 20, 1133–1142 (1999)
6. Junmin, Q., Kai, C., Hao, A., Zhihao, J.: Progress in research of preparation technologies of porous ceramics. Ordnance Material Science and Engineering 28(5), 60–64 (2005)
7. Woesz, A., Rumpler, M., Stampfl, J., Varga, F.: Towards bone replacement materials from calcium phosphates via rapid prototyping and ceramic gelcasting. Materials Science and Engineering C 25, 181–186 (2005)
8. Chen, Z., Li, D., Lu, B.: Fabrication of osteo-structure analogous scaffolds via fused deposition modeling. Scripta Materialia 52, 157–161 (2005)
9. Williams, J.M., Adewunmi, A., Schek, R.M., Flanagan, C.L.: Bone tissue engineering using polycaprolactone scaffolds fabricated via selective laser sintering. Biomaterials 26, 4817–4827 (2005)
10. Chen, V.J., Smith, L.A., Ma, P.X.: Bone regeneration on computer-designed nano-fibrous scaffolds. Biomaterials 27, 3973–3979 (2006)
11. Kalita, S.J., Bose, S., Hosick, H.L., Bandyopadhyay, A.: Development of controlled porosity polymer-ceramic composite scaffolds via fused deposition modeling. Materials Science and Engineering C 23, 611–620 (2003)
12. Zein, I., Hutmacher, D.W., Tan, K.C.: Fused deposition modeling of novel scaffold architectures for tissue engineering applications. Biomaterials 23, 1169–1185 (2002)
13. Lee, M., Dunn, J.C.Y., Wu, B.M.: Scaffold fabrication by indirect three-dimensional printing. Biomaterials 26, 4281–4289 (2005)
14. Leukers, B., Gulkan, H., Irsen, S.H.: Hydroxyapatite scaffolds for bone tissue engineering made by 3D printing. Journal of Materials Science: Materials in Medicine 16, 1121–1124 (2005)
15. Huanwen, D., Yingjun, W., Qingshui, Y.: Recent development of computer-aided tissue engineering. Chinese Journal of Reparative and Reconstructive Surgery 5, 574–577 (2006)
16. Xi, H., Longbiao, Z., Zhisong, Z., Jianwei, Z.: CT-image based reverse of custom made stem. Journal of Nantong University (Natural Science) 5, 52–56 (2006)
The Mechanical Properties of Bone Tissue Engineering Scaffold Fabricating Via Selective Laser Sintering
Liulan Lin, Aili Tong, Huicun Zhang, Qingxi Hu, and Minglun Fang
Rapid Manufacturing Engineering Center, Shanghai University, 99 Shangda Road, Shanghai 200444, China
[email protected], [email protected]
Abstract. The performance of bone tissue engineering depends on porous scaffold microstructures with specific porosity characteristics that influence the behavior of the ingrown cells. The mechanical properties of porous tissue scaffolds are important for their biomechanical tissue engineering application. In this study, a composite materials powder was developed for the selective laser sintering process, and the parameters of selective laser sintering were optimized. With the aim of evaluating the influence of porosity on mechanical properties, we studied the load limits of three scaffold specimens with different porosities. Young's modulus was computed by determining the slope of the stress-strain curve along the elastic portion of the deformation. In addition, the finite element analysis (FEA) module of UG NX4 was used to analyze these scaffolds. The results showed that the bone tissue engineering scaffolds fabricated by SLS technology have good mechanical properties and good potential for tissue engineering applications.
1 Introduction
In bone tissue engineering (BTE), 3D scaffolds are essential to guide cell proliferation and to maintain native phenotype in regenerating biologic tissues or organs [1], [2], [3]. These scaffolds give shape to regenerating tissue and temporarily fulfill the structural function of native tissue. In addition to fitting into the anatomical defect, they must possess sufficient strength and stiffness to bear in vivo loads, so that the scaffolds can function until the growing tissue replaces the gradually degrading scaffold matrix [4], [5], [6]. Conventional methods for making scaffolds include solvent casting, fiber meshes, phase separation, melt molding and gas foaming [7], [8]. These techniques lack precise control of pore shape, pore geometry and spatial distribution. Some methods also require the use of organic solvents that leave undesirable residues in the finished products, and thus create host reactions due to inflammation or toxicity [9]. The use of rapid prototyping (RP) allows the production of scaffolds with controlled hierarchical structures directly from computer data. Furthermore, the shape of the scaffold can be designed by taking anatomical information of the patient's target defect (e.g. CT, MRI) to obtain a custom-tailored implant [10]. One rapid prototyping method, selective laser sintering, may be advantageous for creating bone tissue engineering
scaffolds because it provides a cost-effective, efficient method by which to construct scaffolds that match the complex anatomical geometry of the bone defect structure. More importantly, unlike other RP methods, this method can directly sinter biocompatible materials. Here, we report the example of β-TCP scaffolds with a self-supporting feature fabricated by selective laser sintering. The feasibility of sintering such powder blends and the influence of the SLS processing parameters on the sintering quality and resulting microstructure of the sintered specimens were studied. With the aim of evaluating the influence of porosity on mechanical properties, we studied the load limits of three scaffold specimens with different porosities. Young's modulus was computed by determining the slope of the stress-strain curve along the elastic portion of the deformation. In addition, the finite element analysis (FEA) module of UG NX4 was used to analyze these scaffolds.
2 Materials and Methods
2.1 Preparation of Scaffolds
Cylindrical porous scaffolds (20 mm diameter, 10 mm height), with three-dimensional orthogonal periodic porous architectures, were designed using Unigraphics NX4 3D solid modeling software (UGS PLM Solutions, Plano, TX). The design was exported in STL file format to a Sinterstation machine (HPRS-IIIA, BinHu, Wuhan, China), which was then used to construct scaffolds from a β-tricalcium phosphate and binding material mixture powder by SLS processing. The mixture material was graded to a sieve size of 74 μm using a vibrating sizer. SLS processing of the mixture powder was conducted with 11 W laser power and 2400 mm/s scanning speed. Scaffolds were fabricated layer by layer using a powder layer thickness of 0.1 mm (Fig. 1B). After SLS processing was completed, the excess powder surrounding the scaffolds was brushed off, and unsintered powder was removed from the scaffold interstices by insertion of a 1 mm diameter needle. Finally, the green scaffolds were calcined at high temperature (Fig. 1C).
Fig. 1. Pictures of scaffold molds. (A) 3D solid model designed in UG NX4. (B) Bionic scaffold fabricated by SLS before calcination (green part). (C) Bionic scaffold after calcination.
2.2 Compression Testing
The mechanical properties of the scaffold specimens were measured using an Instron uniaxial testing system (Instron 5542, UK). In this experiment, three specimens were compressed at a crosshead speed of 1 mm/min. Load and displacement were recorded and converted to stress and strain, from which the slope was used to calculate the elastic modulus (E). A stress-strain curve was plotted based on the apparent stress σ (MPa) and strain ε (%) values, computed with the initial cross-sectional area A1 (mm²) of each test specimen and the deformation values with the initial specimen height H1 (mm), respectively.
2.3 Finite Element Analysis (FEA)
The three models with different porosities were each meshed with tetrahedral elements (10 nodes). A load of 100 N was applied to the top face of the scaffolds in this examination, and the material properties obtained in the compression testing were assigned. A finite element model for strength analysis of the bone scaffold mold under compression was thus established; the characteristics of the stress distribution and its location are determined according to the model.
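For illustration, a minimal sketch of the stress-strain conversion just described, using the paper's symbols A1 and H1; the load-displacement data here are hypothetical, not measured values.

```python
# A sketch of converting load-displacement data to stress-strain and taking
# the slope of the elastic portion as Young's modulus E.
import numpy as np

A1 = np.pi * (20.0 / 2) ** 2         # initial cross-section of a 20 mm cylinder, mm^2
H1 = 10.0                            # initial specimen height, mm

load = np.array([0.0, 10.0, 20.0, 30.0])           # N (illustrative)
displacement = np.array([0.0, 0.01, 0.02, 0.03])   # mm (illustrative)

stress = load / A1                   # N/mm^2 = MPa
strain = displacement / H1           # dimensionless
E = np.polyfit(strain, stress, 1)[0] # slope of the (linear) elastic region
print(E, "MPa")
```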
3 Results and Discussions
3.1 Optimization of SLS Parameters
In order to establish a set of suitable processing parameters for subsequent processing of the biocomposite (β-TCP/binding) powder, various specimens were tested (Table 1).

Table 1. Specimen groups for β-TCP/binding biocomposite powder

Laser power (W) | Scanning speed (mm/s) | Molding (Y/N) | Strength | Surface quality
6               | 1500-2400             | N             | -        | -
7               | 1500-2400             | N             | -        | -
8               | 1500-2400             | N             | -        | -
9               | 1500-2400             | N             | -        | -
10              | 2000                  | Y             | fragile  | smooth
11              | 2400                  | Y             | fine     | smooth
14              | 2400                  | Y             | fine     | adust (scorched)
16              | 2400                  | Y             | fine     | adust (scorched)
According to the Andrew number theory, the energy density received by the powder in a specimen is directly proportional to the laser power and inversely proportional to the scanning speed [11]. When the two parameters were matched suitably, a set of bone scaffolds with good mechanical properties and fine surface quality was obtained. In sintering the β-tricalcium phosphate and binding material, particles rapidly scanned by the laser beam receive free surface energy, and necks of binding material form at local contact points. Neck growth would only occur for a short period, creating a porous network due to such a formation.
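As a rough illustration of the energy-density relation assumed above (often written as the Andrew number AN = P/(v·s)), here is a hedged sketch; the scan spacing s is an assumed value, as the paper does not report it.

```python
# A hedged sketch of the energy-density relation: proportional to laser
# power P and inversely proportional to scanning speed v; the scan spacing
# s (mm) is an assumption, not a value from the paper.
def energy_density(power_w, speed_mm_s, spacing_mm=0.1):
    return power_w / (speed_mm_s * spacing_mm)   # J/mm^2

for p, v in [(10, 2000), (11, 2400), (16, 2400)]:
    print(p, "W at", v, "mm/s ->", round(energy_density(p, v), 4), "J/mm^2")
```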
For the biocomposite powder, three specimens were obtained using the following parameter settings: scanning speed 1500, 2000 and 2400 mm/s; laser power 6, 7, 8, 9, 10, 11, 14 and 16 W (see Table 1). Processing the β-TCP/binding powder at 6-9 W laser power did not sinter the particles, as there was little necking. This was improved by increasing the laser power to 10 W and the scanning speed to 2000 mm/s, although the scaffold was fragile compared to the other test specimens. When test sintering was done at a scanning speed of 2400 mm/s and a laser power of 11 W, scaffolds with good properties were obtained. Test specimens sintered with these parameters gave more connectivity between the particles, and more necking formations were available. When test sintering was carried out above 11 W laser power at 2400 mm/s scanning speed, the scaffolds' surfaces were obviously scorched.
3.2 Mechanical Properties
Compression tests were performed to characterize the mechanical properties of the prepared scaffolds with an Instron uniaxial testing system. Fracture toughness, the material's resistance to crack propagation, is an important parameter to assess the susceptibility of a scaffold to failure. Fig. 2A shows the typical stress-strain curves from compression testing. Three specimens of each type were tested for mechanical properties. Fig. 2B shows that the compressive strength of the porous TCP ceramics decreases linearly with increasing macroporosity. According to Hooke's law, the Young's modulus (E) was calculated in this test. The three scaffold specimens' Young's moduli were 15.38 MPa, 28.57 MPa and 48.7 MPa, respectively.
Fig. 2. Porosity and strength behaviors of the porous scaffold. (A) Stress-strain curves of different porosities in the β-TCP scaffolds (S1, S2 and S3 are the various types of scaffolds with different porosities). (B) Young's modulus variation calculated as a function of porosity.
The porosity and strength behaviors of the porous β-TCP scaffolds with pore volume fractions of 60.7%, 70.8% and 75.8% are illustrated in Fig. 2B, where different symbols refer to different macro-porosities. The compressive strength of the porous β-TCP ceramic appears to be sensitive to the pore volume, and the difference in the porosity and strength behaviors became pronounced as the porosity volume decreased. The apparent stiffness of the scaffolds (calculated as effective Young's modulus) was found to decrease linearly with the porosity. Porosity and interconnectivity are the key determinants of the porous scaffolds. It seems important to characterize the effect of porosity together with the macro-pore size on the compressive strength, not only for a better understanding of the porosity and strength behavior but also to help in the design of porous β-TCP ceramic scaffolds with desirable mechanical properties. Although there is no clearly defined criterion for the mechanical properties required by bone tissue engineering, it is generally accepted that the scaffolding material should bear the forces in cell implantation experiments. The presence of cells and deposited ECM can enhance the scaffold's stability [12]. In addition, if the scaffold is implanted in vivo, new living bony tissue will gradually replace it, so the bone scaffolds do not have to possess compressive strength as high as that of true bone. The specimens are hard enough to handle in a real surgical situation.
3.3 FEA
Finite element analysis (FEA) played an important role in the study of the mechanical properties. By analyzing the stresses and strains of the numeric models, optimized processing parameters for the bone scaffolds were obtained. The three models meshed with tetrahedral elements (10 nodes) are illustrated in Fig. 3 (A, B and C). Model A, with around 55,361 nodes, could be established as a model with sufficient convergence and limited calculation time. Model B had around 54,940 nodes, and model C around 53,300 nodes. This corresponded to a coarsening factor of 1.4, which was used for the creation of all the meshes. Then the Young's modulus (E) was input into the FEA module of UG NX4. With a load of 100 N defined and constraints added, the resulting deformations are shown in Fig. 3 (D, E and F). In the calcination process, the binding materials were burnt off and formed bubbles, which could destroy scaffolds and create micro-pores. The micro-pores could not be designed in the CAD software, but their influence was weakened by using the material properties obtained in the compression test. In Fig. 3 (D, E and F), the red areas are the large deformation regions. It can be observed that the centers of the scaffolds were weaker than other regions, because some pores were concentrated in this area. According to the results of the FEA models, it is clearly seen that the mechanical property of a scaffold is related to the porosity and the distribution of pores in the scaffold. Comparing the three pictures (D, E and F), the first scaffold, whose Young's modulus is 28.57 MPa and porosity is 70.8%, had a good mechanical property. In addition, the results were used to adjust the SLS processing parameters to obtain scaffolds with better mechanical properties.
Fig. 3. The models of scaffolds meshed via the UG NX4 FEA module (A, B and C). Stress and strain analysis of the different scaffolds (D, E and F) (the Young's modulus (E) was 28.57 MPa, 15.38 MPa and 48.7 MPa, respectively).
Being a mechanical property study of β-TCP scaffolds for tissue engineering, this study is restricted with respect to potential biological applications. The interrelation between bone scaffolds and materials has not been investigated, since the simulation of cell attachment onto the material has not been included. However, in this paper the deformation of the scaffolds was calculated to predict the mechanical process favorable for cell ingrowth.
4 Conclusions
Selective laser sintering (SLS), a rapid prototyping technology, was investigated and successfully applied in this research to produce 3D scaffolds with sufficient mechanical properties. Two main parameters of SLS, namely laser power and scanning speed, together with the sintering material, were investigated to study their effect on the integrity of the test specimens, which were fabricated for the purpose of bone scaffolds. Moreover, this research confirmed the decrease in compressive strength with increasing porosity of β-TCP ceramics. Examination of the mechanical deformation indicated that the porous β-TCP scaffolds' stress-strain behavior is highly similar to that of a typical porous material undergoing compression. Acknowledgments. The authors would like to acknowledge the support of the Shanghai Education Fund (No. 5A281) and the Shanghai Academic Excellent Youth Instructor Special Foundation.
References
1. Kim, S.-S., et al.: Poly(lactide-co-glycolide)/hydroxyapatite composite scaffolds for bone tissue engineering. Biomaterials 27, 1399–1409 (2006)
2. Tan, K.H., Chua, C.K., Leong, K.F.: Scaffold development using selective laser sintering of polyetheretherketone-hydroxyapatite biocomposite blends. Biomaterials 24, 3115–3123 (2003)
3. Vacanti, J.P., Vacanti, C.A.: The history and scope of tissue engineering. In: Principles of Tissue Engineering, 2nd edn. Academic Press, New York (2000)
4. Ho, S.T., Hutmacher, D.W.: A comparison of micro CT with other techniques used in the characterization of scaffolds. Biomaterials 27, 1362–1376 (2000)
5. Griffith, L.G.: Polymeric biomaterials. Acta Mater 48, 263–277 (2000)
6. Buckley, C.T., O'Kelly, K.U.: Regular scaffold fabrication techniques for investigations in tissue engineering. Topics in Bio-Mechanical Engineering, 147–166 (2004)
7. Ma, P.X., Zhang, R.: Synthetic nano-scale fibrous extracellular matrix. J. Biomed. Mater. Res. 46(1), 60–72 (1998)
8. Sherwood, J.K., Riley, S.L., Palazzolo, R., Brown, S.C., et al.: A three-dimensional osteochondral composite scaffold for articular cartilage repair. Biomaterials 23(24), 4739–4751 (2002)
9. Yang, S.F., Leong, K.F., Du, Z.H., Chua, C.K.: The design of scaffolds for use in tissue engineering: Part I - Traditional factors. Tissue Eng. 7, 679–690 (2001)
10. Leukers, B., Irsen, S.H., Tille, C.: Hydroxyapatite scaffolds for bone tissue engineering made by 3D printing. Journal of Materials Science: Materials in Medicine 16, 1121–1124 (2005)
11. Nelson, J.C.: Selective laser sintering of calcium phosphate materials for orthopedic implants. Ph.D. Thesis, The University of Texas at Austin, USA (1993)
12. Malda, J., et al.: The effect of PEGT/PBT scaffold architecture on the composition of tissue engineered cartilage. Biomaterials 26, 63–72 (2005)
Informational Structure of Agrobacterium Tumefaciens C58 Genome
Zhihua Liu and Xiao Sun
State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, P.R. China
{zhliu,xsun}@seu.edu.cn
Abstract. The base-base correlation (BBC) method, based on information theory, translates a DNA sequence into a 16-dimensional vector. It has proven quite effective in distinguishing various functional regions on one chromosome. In this study, we explore its potential for distinguishing different chromosomes within one species, with particular emphasis on Agrobacterium tumefaciens strain C58. Our findings show that the BBC method can effectively distinguish the informational structure of the Agrobacterium tumefaciens strain C58 genomes. In conclusion, BBC provides a new methodology in post-genome informatics, and its applications can be explored further in the future.
1 Introduction
The statistical properties of DNA sequences have received a substantial amount of scientific attention in the last few years. One of the most important findings in this respect is the relation of 10-11 bp periodicities to DNA supercoiling [1]. Several global statistical properties have been developed to analyze DNA sequences and have been found to be related to biological function. The variation of word frequency within genomes has been linked to functional variation [2]. The variation of dinucleotide relative abundance reflects interspecies differences in processes such as DNA modification, replication and repair [3]. The signature of Alu repeats has been identified as peaks in the correlation function [4]. On an evolutionary scale, more closely related species have more similar word compositions [5], and the dinucleotide biases differ between species [6]. Here we developed a novel sequence feature, named base-base correlation (BBC), which was inspired by using the mutual information function (MIF) to analyze DNA sequences. Compared with MIF, BBC emphasizes the information of different base pairs within the range of k. It improves the resolving power and provides a more appropriate description of sequence dissimilarity. A sequence, regardless of whether its length is kilobases, megabases, or even gigabases, corresponds to a unique 16-dimensional vector. Changes in the values of the 16 parameters reflect differences in genome content and length. The procedure includes a normalization operation to compare genomes of different scales, for which it is difficult to obtain a good sequence alignment. In a recent study [7], BBC was applied to analyze various functional regions of the human chromosome, including exon, intron, upstream, downstream and intergenic regions. The results showed that BBC assists in distinguishing various functional regions of the genome.
In this work, BBC was applied to distinguish different chromosomes of a given species, with particular emphasis on Agrobacterium tumefaciens strain C58. Agrobacterium tumefaciens is a plant pathogen capable of transferring a defined segment of DNA to a host plant, generating a gall tumor. The genome of Agrobacterium tumefaciens strain C58 has an unusual structure consisting of a circular chromosome, a linear chromosome and two plasmids. This ability to transfer and integrate DNA is used for random mutagenesis and has been critical for the development of modern plant genetics and agricultural biotechnology. In 2001, the genome of Agrobacterium tumefaciens C58 was sequenced by two different research groups, and two papers about the genome were published in Science [8, 9]. The NCBI Genome project released these two genome sequences, named Agrobacterium_tumefaciens_C58_Cereon and Agrobacterium_tumefaciens_C58_UWash, respectively.
2 Materials and Methods
2.1 Sequences
The two genome projects of Agrobacterium tumefaciens strain C58 used in this study were retrieved from NCBI (http://www.ncbi.nlm.nih.gov). The name, accession number and length for the Agrobacterium tumefaciens strain C58 genomes are shown in Table 1.

Table 1. The name, accession number and length for the Agrobacterium tumefaciens strain C58 genomes

Strain                               | Genome              | Accession | Length (nt)
Agrobacterium tumefaciens C58 Cereon | chromosome circular | NC_003062 | 2,841,581
                                     | chromosome linear   | NC_003063 | 2,074,782
                                     | plasmid AT          | NC_003064 | 542,869
                                     | plasmid Ti          | NC_003065 | 214,233
Agrobacterium tumefaciens C58 UWash  | chromosome circular | NC_003304 | 2,841,490
                                     | chromosome linear   | NC_003305 | 2,075,560
                                     | plasmid AT          | NC_003306 | 542,780
                                     | plasmid Ti          | NC_003308 | 214,234
2.2 Base-Base Correlation (BBC)
DNA sequences can be viewed as symbolic strings composed of the four letters $(B_1, B_2, B_3, B_4) \equiv (A, C, G, T)$. The probability of finding the base $B_i$ is denoted by $p_i$ $(i = 1, 2, 3, 4)$. Then BBC is defined as follows:

$$T_{ij}(k) = \sum_{l=1}^{k} p_{ij}(l) \cdot \log_2 \left( \frac{p_{ij}(l)}{p_i p_j} \right), \quad i, j \in \{1, 2, 3, 4\} \qquad (1)$$

Here, $p_{ij}(l)$ denotes the joint probability of bases $i$ and $j$ at a distance of $l$. $T_{ij}(k)$ represents the average relevance of the two-base combination with gaps ranging from 1 to $k$; it reflects a local feature of two bases within a range of $k$. We take $k = 2$ in the BBC calculation in the present study. For each sequence $m$, BBC has 16 parameters and constitutes a 16-dimensional vector $V_m^z$ $(z = 1, 2, \ldots, 16)$. Statistical independence of two bases at a distance $l$ is defined by $p_{ij}(l) = p_i p_j$. Thus, the deviation from statistical independence is defined by

$$D_{ij}(l) = p_{ij}(l) - p_i p_j \qquad (2)$$
We expand $T_{ij}(k)$ using a Taylor series in terms of equation 2:

$$T_{ij}(k) = \sum_{l=1}^{k} p_{ij}(l) \cdot \log_2 \left( \frac{p_{ij}(l)}{p_i p_j} \right)
= \sum_{l=1}^{k} \left[ D_{ij}(l) + p_i p_j \right] \cdot \ln \left[ 1 + \frac{D_{ij}(l)}{p_i p_j} \right]
= \sum_{l=1}^{k} \left[ D_{ij}(l) + p_i p_j \right] \cdot \left[ \frac{D_{ij}(l)}{p_i p_j} - \frac{D_{ij}^2(l)}{2 p_i^2 p_j^2} + \cdots \right]
= \sum_{l=1}^{k} \left[ D_{ij}(l) + \frac{D_{ij}^2(l)}{2 p_i p_j} \right] + o\!\left( D_{ij}^3(l) \right) \qquad (3)$$

This mathematical transformation further increases the calculation speed and effectively solves the problem of $0 \cdot \log_2 0$ (i.e., $p_{ij}(l) = 0$ in equation 1).
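For concreteness, a minimal sketch of the BBC vector of equations 1 and 2, assuming k = 2 as in the paper; the function name and the example string are illustrative only.

```python
# A minimal sketch of the 16-parameter BBC feature vector (equation 1),
# with the 0*log2(0) case skipped as the text describes.
import itertools
import math

def bbc_vector(seq, k=2):
    bases = "ACGT"
    n = len(seq)
    p = {b: seq.count(b) / n for b in bases}          # single-base probabilities
    T = {}
    for i, j in itertools.product(bases, repeat=2):
        t = 0.0
        for l in range(1, k + 1):
            pairs = sum(1 for x in range(n - l)
                        if seq[x] == i and seq[x + l] == j)
            pij = pairs / (n - l)                     # joint probability at gap l
            if pij > 0:                               # avoid 0 * log2(0)
                t += pij * math.log2(pij / (p[i] * p[j]))
        T[i + j] = t
    return T

print(bbc_vector("ACGTACGTTGCAACGT"))                 # illustrative string
```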
2.3 The Distance Matrix
Given two sequences $m$ and $n$, the distance $H_{mn}$ between them is defined as follows:

$$H_{mn} = \sqrt{ \sum_{z=1}^{16} \left( V_m^z - V_n^z \right)^2 }, \quad m, n = 1, 2, \ldots, N \qquad (4)$$

Here, $V_m$ and $V_n$ represent the 16-dimensional vectors of sequences $m$ and $n$, and $N$ is the total number of sequences analyzed. According to equation 4, $H_{mn}$ satisfies the definition of a distance: (i) $H_{mn} > 0$ for $m \neq n$; (ii) $H_{mn} = 0$ for $m = n$; (iii) $H_{mn} = H_{nm}$ (symmetry); (iv) $H_{mn} \leq H_{mq} + H_{nq}$ (triangle inequality). For $N$ sequences, a real symmetric $N \times N$ distance matrix is then obtained.
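A short sketch of the distance in equation 4 between two such vectors, assuming the Euclidean (square-root) form, which is what yields the triangle inequality stated above; bbc_vector refers to the sketch in the previous section.

```python
# A sketch of equation 4: the Euclidean distance between two 16-dimensional
# BBC vectors, giving one entry of the symmetric N x N distance matrix.
import math

def bbc_distance(Vm, Vn):
    keys = sorted(Vm)                 # the 16 base-pair parameters
    return math.sqrt(sum((Vm[z] - Vn[z]) ** 2 for z in keys))

# e.g. bbc_distance(bbc_vector(seq1), bbc_vector(seq2)) for two sequences;
# the full matrix is then fed to the NJ/BIONJ/UPGMA clustering algorithms.
```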
2.4 Clustering
Accordingly, a real symmetric N × N matrix is used to reflect the distances between the N sequences. The clustering trees are then constructed using the original Neighbor-Joining (NJ) algorithm [10], a note on the NJ algorithm [11], the BIONJ algorithm [12] and the UPGMA algorithm [13], respectively. The reliability of the branches is assessed by performing 100 resamplings; bootstrap values are shown on nodes.
3 Results
3.1 GC Content of Agrobacterium Tumefaciens Strain C58 Genome
The GC content of the Agrobacterium tumefaciens strain C58 genome is shown in Figure 1. The GC content values for the corresponding chromosome or plasmid appeared almost equal between Agrobacterium_tumefaciens_C58_Cereon and Agrobacterium_tumefaciens_C58_UWash. In addition, the GC content of the chromosomes differed from that of the plasmids. The GC content values showed a relatively large difference between plasmid AT and plasmid Ti, while the values for the circular and linear chromosomes showed only a minor difference. Therefore, it is very difficult to distinguish the different chromosomes of the Agrobacterium tumefaciens strain C58 genome by GC content alone.
Fig. 1. GC content of the Agrobacterium tumefaciens strain C58 genome. Agrobacterium_tumefaciens_C58_Cereon and Agrobacterium_tumefaciens_C58_UWash are indicated by red and blue, respectively.
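For reference, the GC content plotted in Fig. 1 reduces to a one-line computation; the input string below is illustrative, not real genome data.

```python
# A one-line sketch of GC content; seq would be one of the chromosome or
# plasmid sequences listed in Table 1.
def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

print(gc_content("ATGCGCGTATGC"))   # illustrative string
```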
3.2 BBC Curves of Agrobacterium Tumefaciens Strain C58 Genome
For each genome sequence, the 16 parameters of BBC were calculated and linked into a continuous curve, designated the BBC curve. The BBC curve thus serves as a unique feature for a given sequence, providing an intuitive and general description of the DNA sequence. The BBC curves of the Agrobacterium tumefaciens strain C58 genome are displayed in Figure 2; each curve represents a chromosome or plasmid of the genome. It was found that the BBC curves of the same type of chromosome or plasmid in Agrobacterium tumefaciens strain C58 Cereon and
Agrobacterium tumefaciens strain C58 UWash tended to cluster together very closely: curves of the same color coincided in their trends. On the other hand, the BBC curves of the circular and linear chromosomes tended to cluster together, and the BBC curves of plasmid AT and plasmid Ti tended to cluster together. An interesting observation was the relatively large difference between plasmid AT and plasmid Ti in the BBC values of A---T, C---A, G---A and G---C. To further illustrate the difference in informational structure of the Agrobacterium tumefaciens strain C58 genome, the distance matrix was calculated and clustering trees were constructed by several clustering algorithms.
Fig. 2. BBC curves of Agrobacterium tumefaciens strain C58 genome. Chromosome circular, chromosome linear, plasmid AT and plasmid Ti were indicated by red, green, blue and magenta, respectively.
3.3 Clustering Tree of Agrobacterium Tumefaciens Strain C58 Genomes
Comparing the four reconstruction methods, it was found that the four phylograms had the same topology. Two major groups (a plasmid group and a chromosome group) can be seen in the four figures (Figures 3-6). In the first branch, plasmid Ti of Agrobacterium tumefaciens strain C58 UWash and that of Agrobacterium tumefaciens strain C58 Cereon tended to cluster together, with a bootstrap value of 99%. Plasmid AT of Agrobacterium tumefaciens strain C58 UWash and that of Agrobacterium tumefaciens strain C58 Cereon tended to cluster together, with a bootstrap value of 99%. These two groups clustered together and formed a bigger group (the plasmid group), with a bootstrap value of 100%. In the other branch, the linear chromosome of Agrobacterium tumefaciens strain C58 UWash and that of Agrobacterium tumefaciens strain C58 Cereon tended to cluster together, with a bootstrap value of 97%. The circular chromosome of Agrobacterium tumefaciens strain C58 UWash and that of Agrobacterium tumefaciens strain C58 Cereon tended to cluster together, with a bootstrap value of 98%. These two groups clustered together and formed a bigger group (the chromosome group), with a bootstrap value of 100%.
Fig. 3. The clustering tree of Agrobacterium tumefaciens strain C58 based on the original NJ algorithm. Bootstrap values were shown on nodes.
Fig. 4. The clustering tree of Agrobacterium tumefaciens strain C58 based on a note on the NJ algorithm. Bootstrap values were shown on nodes.
Agrobacterium tumefaciens strain C58 has an unusual genome structure consisting of a circular chromosome, a linear chromosome, and two plasmids: the tumor-inducing plasmid pTiC58 and a second plasmid pAtC58 [14, 15]. An interesting observation of this study was that the clustering phylogram based on BBC, whether built with the NJ algorithms or UPGMA, could not only distinguish between chromosomes and plasmids, but also discriminate the two chromosomes (linear and circular) and the two plasmids (Ti and AT), respectively. In addition, the corresponding chromosome or plasmid of Agrobacterium tumefaciens strain C58 Cereon and Agrobacterium tumefaciens strain C58 UWash tended to cluster together.
Fig. 5. The clustering tree of Agrobacterium tumefaciens strain C58 based on BIONJ algorithm. Bootstrap values were shown on nodes.
Fig. 6. The clustering tree of Agrobacterium tumefaciens strain C58 based on the UPGMA algorithm. Bootstrap values were shown on nodes.
4 Discussion
The biological origin of genome information being present at a large-scale statistical level is far from understood. Short-range correlations in DNA sequences have proven informative over recent decades. Starting from the early finding that coding and noncoding sequence segments possess mutual information functions with striking differences due to codon usage in the coding regions [16], ever more detailed short-range correlation properties have been related to biological function, such as the relation of 10-11 bp periodicities to DNA supercoiled structures [1]. BBC has proven quite effective in distinguishing coding and noncoding sequence segments. It can further be used to classify various functional regions on the chromosome, such as exon, intron, upstream, downstream and intergenic regions [7]. Here BBC was
applied to distinguish the different types of chromosomes and plasmids of the Agrobacterium tumefaciens strain C58 genome, including the circular chromosome, the linear chromosome, plasmid AT and plasmid Ti. Our main finding supporting this view was that BBC, as a sequence feature, can distinguish not only various functional regions on one chromosome, but also different types of chromosomes and plasmids within one species. Usually, one species has more than one chromosome, and it is very difficult to distinguish chromosomes or plasmids by a single property such as GC content. Multiple sequence alignment is an alternative approach to identifying different chromosomes within one species, but the procedure of aligning whole chromosome sequences is time-consuming and can even be impossible. In addition, a good sequence alignment is very difficult to obtain when the sequence divergence among the chromosomes is large. Moreover, gaps in the sequences are ignored in sequence alignment; this throws away the most ambiguous parts of the alignment, which may be very important for distinguishing different chromosomes. In contrast to traditional alignment methods, the advantages of the BBC method are its low computational complexity and ease of implementation. A sequence, regardless of whether its length is kilobases, megabases, or even gigabases, corresponds to a unique 16-dimensional vector. Changes in the values of the 16 parameters reflect differences in genome content and length. Intriguingly, the BBC curve provides a fast and intuitive tool for sequence comparison analysis. BBC was inspired by using MIF to analyze DNA sequences. Compared with MIF, BBC emphasizes the information of different base pairs within the range of k; it improves the resolving power and provides a more appropriate description of sequence dissimilarity.
5 Conclusions

The BBC method, based on information theory, translates sequence data into a 16-dimensional vector. In recent work, the method has proven quite effective in distinguishing various functional regions on one chromosome. In this study, we explored its potential for distinguishing different chromosomes within one species. Our findings show that the BBC method is capable of revealing the identity of the different chromosomes and plasmids of the Agrobacterium tumefaciens strain C58 genome. In conclusion, BBC provides a new methodology for post-genome informatics, and its applications can be explored further in the future.

Acknowledgments. This work is supported by the National High-Tech Research and Development Program (863 Program) of China (No. 2002AA231071) and the Natural Science Foundation of China (No. 60671018; 60121101).
References 1. Schieg, P., Herzel, H.: Periodicities of 10-11bp as indicators of the supercoiled state of genomic DNA. Journal of molecular biology 343(4), 891–901 (2004) 2. Nikolaou, C., Almirantis, Y.: Word preference in the genomic text and genome evolution: different modes of n-tuplet usage in coding and noncoding sequences. Journal of molecular evolution 61(1), 23–35 (2005)
3. Karlin, S., Burge, C.: Dinucleotide relative abundance extremes: a genomic signature. Trends Genet. 11(7), 283–290 (1995) 4. Holste, D., Grosse, I., Beirer, S., Schieg, P., Herzel, H.: Repeats and correlations in human DNA sequences. Physical review 67(6 Pt 1), 061913 (2003) 5. Bush, E.C., Lahn, B.T.: The evolution of word composition in metazoan promoter sequence. PLoS computational biology 2(11), e150 (2006) 6. Gentles, A.J., Karlin, S.: Genome-scale compositional comparisons in eukaryotes. Genome research 11(4), 540–546 (2001) 7. Liu, Z.H., Jiao, D., Sun, X.: Classifying genomic sequences by sequence feature analysis. Genomics, proteomics & bioinformatics 3(4), 201–205 (2005) 8. Goodner, B., Hinkle, G., Gattung, S., Miller, N., Blanchard, M., Qurollo, B., Goldman, B.S., Cao, Y., Askenazi, M., Halling, C., et al.: Genome sequence of the plant pathogen and biotechnology agent Agrobacterium tumefaciens C58. Science 294(5550), 2323–2328 (2001) 9. Wood, D.W., Setubal, J.C., Kaul, R., Monks, D.E., Kitajima, J.P., Okura, V.K., Zhou, Y., Chen, L., Wood, G.E., Almeida Jr., N.F., et al.: The genome of the natural genetic engineer Agrobacterium tumefaciens C58. Science 294(5550), 2317–2323 (2001) 10. Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular biology and evolution 4(4), 406–425 (1987) 11. Studier, J.A., Keppler, K.J.: A note on the neighbor-joining algorithm of Saitou and Nei. Molecular biology and evolution 5(6), 729–731 (1988) 12. Gascuel, O.: BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular biology and evolution 14(7), 685–695 (1997) 13. Highton, R.: The relationship between the number of loci and the statistical support for the topology of UPGMA trees obtained from genetic distance data. Molecular phylogenetics and evolution 2(4), 337–343 (1993) 14. Allardet-Servent, A., Michaux-Charachon, S., Jumas-Bilak, E., Karayan, L., Ramuz, M.: Presence of one linear and one circular chromosome in the Agrobacterium tumefaciens C58 genome. Journal of bacteriology 175(24), 7869–7874 (1993) 15. Goodner, B.W., Markelz, B.P., Flanagan, M.C., Crowell Jr., C.B., Racette, J.L., Schilling, B.A., Halfon, L.M., Mellors, J.S., Grabowski, G.: Combined genetic and physical map of the complex genome of Agrobacterium tumefaciens. Journal of bacteriology 181(17), 5160–5166 (1999) 16. Grosse, I., Herzel, H., Buldyrev, S.V., Stanley, H.E.: Species independence of mutual information in coding and noncoding DNA. Physical review 61(5 Pt B), 5624–5629 (2000)
Feature Extraction for Cancer Classification Using Kernel-Based Methods

Shutao Li and Chen Liao

College of Electrical and Information Engineering, Hunan University, 410082 Changsha, China
[email protected]
Abstract. In this paper, a kernel-based feature extraction method for cancer classification from gene expression data is proposed. The performance of four kernel algorithms, namely kernel Fisher discriminant analysis (KFDA), kernel principal component analysis (KPCA), kernel partial least squares (KPLS) and kernel independent component analysis (KICA), is compared on three benchmark datasets: breast cancer, leukemia and colon cancer. Experimental results show that the proposed kernel-based feature extraction methods work well on all three benchmark gene datasets. Overall, KPLS and KFDA show the best performance, followed by KPCA and KICA.
1 Introduction

Gene expression studies on DNA microarray data provide unprecedented opportunities for disease prediction and classification. However, gene datasets usually include a huge number of genes, many of which may be irrelevant to the analysis. This poses great difficulty for many classifiers. By performing dimensionality reduction, feature extraction is thus often critical in improving both the accuracy and the speed of prediction systems. A good feature extraction method should extract the most informative features and construct a new subset of lower dimension. In recent years, several useful kernel-based learning machines, e.g., kernel Fisher discriminant analysis (KFDA) [1] and kernel principal component analysis (KPCA) [2], have been proposed. These methods have shown practical relevance for classification, regression and unsupervised learning, and kernel-based algorithms have been applied successfully in a number of fields, such as optical pattern and object recognition, text categorization and time-series prediction [3]. The purpose of this study is to propose a kernel-based feature extraction method for cancer classification, and to evaluate and compare the performance of four kernel-based methods: kernel Fisher discriminant analysis (KFDA) [1], kernel principal component analysis (KPCA) [2], kernel partial least squares (KPLS) [4] and kernel independent component analysis (KICA) [5]. First, the genes are preprocessed by a T-test to filter out irrelevant and noisy genes. Then, the kernel-based methods are used to extract highly informative and discriminative features. Finally, the new training set, with the extracted features, is input to a support vector machine (SVM) for classification.
2 Kernel Methods

2.1 Kernel Fisher Discriminant Analysis (KFDA)

KFDA uses a model defined by a linear combination of specified kernel bases:

$y = \sum_{i=1}^{N} a_i K(\omega_i, x)$   (1)

Here $K$ is the kernel function and $a_i$ is the coefficient vector for the $i$-th kernel base. An isotropic Gaussian function $K(\omega_i, x) = \exp\left[-\frac{\|\omega_i - x\|^2}{2\sigma^2}\right]$ is typically used as the kernel function. The location of each kernel base $\omega_i$ is fixed to one of the training samples, so the number of kernel bases $N$ equals the number of training samples. Let $k(x) = (K(\omega_1, x), \ldots, K(\omega_N, x))^T$ be the vector corresponding to the feature vector $x$. Then equation (1) can be written as $y = A^T k(x)$, where $A^T = [a_1, \ldots, a_N]$ is the coefficient matrix. The optimal coefficient matrix $A$ is obtained by solving the eigen-equation

$\Sigma_B^{(K)} A = \Sigma_W^{(K)} A \Lambda \qquad (A^T \Sigma_W^{(K)} A = I)$

Here, $\Lambda$ is a diagonal matrix of eigenvalues and $I$ denotes the identity matrix. The matrices $\Sigma_W^{(K)}$ and $\Sigma_B^{(K)}$ are the within-class and between-class covariance matrices of the kernel basis vectors $k(x)$. The dimension of the new feature vector $y$ is limited to $\min(K-1, N)$. However, the setting is ill-posed, as the $N \times L$ coefficients of the matrix $A$ must be estimated from only $N$ samples, so some regularization technique needs to be introduced. One of the simplest methods is to add a multiple of the identity matrix to $\Sigma_W^{(K)}$:

$\tilde{\Sigma}_W^{(K)} = \Sigma_W^{(K)} + \beta I$

This makes the problem numerically more stable, since the within-class covariance matrix $\tilde{\Sigma}_W^{(K)}$ becomes positive definite for sufficiently large $\beta$. This is roughly equivalent to adding independent noise to each of the kernel bases [1].
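As a concrete illustration, the following minimal sketch builds the Gaussian kernel basis vectors $k(x)$, forms the within- and between-class scatter matrices, and solves the regularized generalized eigenproblem. The toy data, kernel width and $\beta$ are illustrative assumptions, not the values used in the experiments below.

```python
import numpy as np
from scipy.linalg import eigh

def kfda(X, y, sigma=1.0, beta=1e-3, n_components=1):
    """Minimal KFDA sketch following Eqs. (1)-(2): Gaussian kernel bases
    centred on the training samples, regularised within-class scatter."""
    n = len(X)
    # Gaussian kernel matrix; row i is the kernel-basis vector k(x_i)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))
    mean_all = K.mean(axis=0)
    Sw = np.zeros((n, n)); Sb = np.zeros((n, n))
    for c in np.unique(y):
        Kc = K[y == c]
        mc = Kc.mean(axis=0)
        Sw += (Kc - mc).T @ (Kc - mc)                           # within-class
        Sb += len(Kc) * np.outer(mc - mean_all, mc - mean_all)  # between-class
    # regularise the within-class scatter: Sw~ = Sw + beta * I
    evals, A = eigh(Sb, Sw + beta * np.eye(n))
    A = A[:, np.argsort(evals)[::-1][:n_components]]
    return A, K                                                 # y = A^T k(x)

X = np.random.randn(40, 5); y = np.repeat([0, 1], 20)
A, K = kfda(X, y)
features = K @ A   # projected training features
```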
2.2 Kernel Principal Component Analysis (KPCA)
PCA can be expressed as the diagonalization of an $n$-sample estimate of the covariance matrix $\hat{C} = \frac{1}{n}\sum_{i=1}^{n} \phi(x_i)\phi(x_i)^T$, which represents a transformation of the original data to new coordinates defined by the orthogonal eigenvectors $V$. These eigenvectors $V$ (and the corresponding eigenvalues $\lambda$) are obtained from $\lambda V = \hat{C}V$. This problem is equivalent to $n\lambda\alpha = K\alpha$, where $\alpha$ is the column vector with coefficients $\alpha_1, \ldots, \alpha_n$ such that $V = \sum_{i=1}^{n} \alpha_i \phi(x_i)$, and $K$ is the kernel matrix. Normalizing the solution $v^k$ corresponding to the non-zero eigenvalue $\tilde{\lambda}_k = n\lambda_k$ of the matrix $K$ translates into the condition $\lambda_k(\alpha^k \cdot \alpha^k) = 1$. Finally, the projection of $\phi(x)$ onto the eigenvector $V^k$ can be computed as [2]:

$\beta(x)_k := (V^k \cdot \phi(x)) = \sum_{i=1}^{n} \alpha_i^k K(x_i, x)$
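A minimal sketch of this eigen-decomposition view of KPCA follows. For brevity it omits the centering of $K$ in feature space, which the full method requires, and the kernel width is an arbitrary choice.

```python
import numpy as np

def kpca_fit_transform(X, gamma=0.5, n_components=2):
    """Sketch of KPCA as described: solve n*lambda*alpha = K*alpha and
    project via beta_k(x) = sum_i alpha_i^k K(x_i, x)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    evals, evecs = np.linalg.eigh(K)                 # eigenvalues of K
    idx = np.argsort(evals)[::-1][:n_components]
    # scale unit eigenvectors so that the normalisation condition holds
    alphas = evecs[:, idx] / np.sqrt(evals[idx])
    return K @ alphas                                # training projections

X = np.random.randn(30, 4)
Z = kpca_fit_transform(X)
print(Z.shape)  # (30, 2)
```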
2.3 Kernel Partial Least Squares (KPLS)

Let $\phi$ be the $n \times m'$ matrix of input samples in F, where $m'$ is the dimensionality of F, and denote its $i$-th row by the vector $\phi(x_i)^T$. Let $\phi'$ be the $n \times m'$ deflated dataset and $Y'$ the $n \times 1$ deflated class label vector. The rule of deflation is

$\phi' = \phi - t(t^T \phi), \qquad Y' = Y - t(t^T Y)$   (2)

where $t$ is a score vector (component) obtained as follows. Let $w$ and $c$ be the weight vectors. The process starts with a random initialization of the Y-score $u$ and iterates the following steps until convergence: (1) $w = X^T u/(u^T u)$; (2) normalize $\|w\| \to 1$; (3) $t = Xw$; (4) $c = Y^T t/(t^T t)$; (5) $u = Yc/(c^T c)$. The whole process is repeated Fac times: each deflated dataset is obtained from the previous dataset and its PLS component, and likewise for the class labels. Denote the resulting sequences of $t$'s and $u$'s by the $n \times 1$ vectors $t_1, \ldots, t_{Fac}$ and $u_1, \ldots, u_{Fac}$, and let $T = [t_1, \ldots, t_{Fac}]$ and $U = [u_1, \ldots, u_{Fac}]$. The "kernel trick" yields $K = \phi\phi^T$, where $K$ is the $n \times n$ kernel matrix with $K(i, j) = k(x_i, x_j)$ and $k$ the kernel function. $K$ can now be used directly in the deflation instead of $\phi$:

$K' = (I_n - tt^T) K (I_n - tt^T)$   (3)

Here, $K'$ is the deflated kernel matrix and $I_n$ is the $n$-dimensional identity matrix. Eq. (3) now takes the place of Eq. (2), so the deflated kernel matrix is obtained from the original kernel matrix and the PLS component. As in linear PLS, the variables of $X$ are assumed to have zero mean; the mapped data can be centered in the feature space F as

$K = (I_n - \frac{1}{n} l_n l_n^T) K (I_n - \frac{1}{n} l_n l_n^T)$

where $l_n$ is the $n \times 1$ vector with all elements equal to one. Given a set of $n_t$ test samples $\{z_i\}_{i=1}^{n_t}$ (with $z_i \in \mathbb{R}^m$), their projection onto the feature space is

$D_p = K_t U (T^T K U)^{-1}$

where $D_p = [d_1, d_2, \ldots, d_{n_t}]^T$ is an $n_t \times p$ matrix with the $p$ KPLS components as its columns and the $n_t$ test samples in the reduced-dimensional space as its rows; $K_t$ is the $n_t \times n$ kernel matrix defined on the test set, $K_t(i, j) = K(z_i, x_j)$; and $T^T K U$ is an upper triangular and thus invertible matrix. The centered kernel matrix on the test set can be calculated as [4]

$\bar{K}_t = (K_t - \frac{1}{n} l_{n_t} l_n^T K)(I_n - \frac{1}{n} l_n l_n^T)$

with $l_{n_t}$ the $n_t \times 1$ vector of ones.
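The kernelized deflation loop above can be sketched as follows; the random initialization of $u$, the convergence tolerance and the normalization steps are our own choices within the iteration described in the text.

```python
import numpy as np

def kernel_pls(K, Y, n_components=3, tol=1e-8, max_iter=500):
    """Sketch of kernel PLS following Eqs. (2)-(3): score vectors are
    extracted iteratively and the kernel matrix is deflated as
    K' = (I - t t^T) K (I - t t^T). Y is an (n,) label vector."""
    n = K.shape[0]
    K = K.copy(); Y = np.asarray(Y, dtype=float).reshape(n, 1)
    T, U = [], []
    for _ in range(n_components):
        u = np.random.randn(n, 1)
        for _ in range(max_iter):
            t = K @ u; t /= np.linalg.norm(t)        # score vector (component)
            c = Y.T @ t
            u_new = Y @ c; u_new /= np.linalg.norm(u_new)
            if np.linalg.norm(u_new - u) < tol:
                break
            u = u_new
        T.append(t); U.append(u)
        P = np.eye(n) - t @ t.T                      # deflation projector
        K = P @ K @ P                                # Eq. (3)
        Y = Y - t @ (t.T @ Y)                        # deflate the labels
    return np.hstack(T), np.hstack(U)

X = np.random.randn(50, 30)
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 200.0)
y = np.random.choice([1.0, -1.0], size=50)
T, U = kernel_pls(K, y, n_components=5)
```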
2.4 Kernel Independent Component Analysis (KICA)
KICA produces a set of nonlinear features of the input data by performing ICA in the kernel-induced feature space F. The input data $X$ is first whitened in F using the whitening matrix $W_P^\phi = (\Lambda^\phi)^{-1/2}(V^\phi)^T$, where $\Lambda^\phi$ and $V^\phi$ contain the eigenvalues and eigenvectors of the covariance matrix $\hat{C} = \frac{1}{n}\sum_{i=1}^{n}\phi(x_i)\phi(x_i)^T$. The whitened data is then obtained as $X_W^\phi = (W_P^\phi)^T \phi(X) = (\Lambda^\phi)^{-1}\alpha^T K$, where $K$ is the kernel matrix and $\alpha$ is the eigenvector matrix of $K$. After the whitening transform, we run the ICA learning iteration

$U_I^\phi = W_I^\phi X_W^\phi$

$\Delta W_I^\phi = \left[I + \left(I - \frac{2}{1 + e^{-U_I^\phi}}\right)(U_I^\phi)^T\right] W_I^\phi$

$\hat{W}_I^\phi = W_I^\phi + \rho \Delta W_I^\phi \to W_I^\phi$

until $W_I^\phi$ converges, where $\rho$ is the learning rate. At test time, the new feature representation of a test pattern $y$ is computed as $s = W_I^\phi (\Lambda^\phi)^{-1}\alpha^T K(X, y)$, where $K(X, y) = [k(x_1, y), k(x_2, y), \ldots, k(x_n, y)]^T$ [5].
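A compact sketch of the whitening and the learning iteration is given below. The scaling of the update by the number of samples is a stability tweak of our own and is not part of the update rule as printed.

```python
import numpy as np

def kica(K, n_sources=2, rho=0.1, n_iter=200):
    """Sketch of KICA as described: whiten in feature space using the
    kernel eigen-decomposition, then run the ICA update with the logistic
    nonlinearity from the text."""
    evals, alpha = np.linalg.eigh(K)
    idx = np.argsort(evals)[::-1][:n_sources]
    Lam, alpha = evals[idx], alpha[:, idx]
    Xw = np.diag(1.0 / Lam) @ alpha.T @ K            # whitened data (d x n)
    W = np.eye(n_sources)
    for _ in range(n_iter):
        U = W @ Xw
        g = 1 - 2.0 / (1 + np.exp(-U))               # nonlinearity from the text
        # 1/n scaling added for numerical stability (our own tweak)
        dW = (np.eye(n_sources) + g @ U.T / U.shape[1]) @ W
        W = W + rho * dW
    return W, W @ Xw                                 # unmixing matrix, sources

X = np.random.randn(60, 10)
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 20.0)
W, S = kica(K)
```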
3 Proposed Method Denote the number of genes (features) and the number of samples (observations) in the gene expression dataset by M and N respectively. The whole data set can also be represented by the matrix:
$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ x_{M1} & x_{M2} & \cdots & x_{MN} \end{bmatrix}$
where $x_{ij}$ is the measured expression level of gene $i$ in sample $j$. Let $x_j = (x_{1j}, x_{2j}, \ldots, x_{Mj})$ denote the $j$-th sample of $X$, and $y_j$ the corresponding class label (e.g., tumor type or clinical outcome). In the following, we assume that there are only two classes (positive and negative) in the sample. The proposed method is as follows.

Step 1. Preprocessing using the T-test: large dimensionality increases the complexity and computational load, so the dataset is first preprocessed by a T-test. For each gene $i$, we compute the mean $\mu_i^+$ (respectively, $\mu_i^-$) and standard deviation $\delta_i^+$ (respectively, $\delta_i^-$) of the positive (respectively, negative) samples. A score $T(x_i)$ is then obtained as

$T(x_i) = \frac{\mu_i^+ - \mu_i^-}{\sqrt{\frac{(\delta_i^+)^2}{n_+} + \frac{(\delta_i^-)^2}{n_-}}}$
where $n_+$ and $n_-$ are the numbers of samples in the positive and negative classes, respectively. Genes are ranked according to their $T$ score, and the top $p$ genes are selected to form a reduced dataset; a minimal sketch of this filter follows.
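In the sketch below, ranking by the absolute value of $T$ is our assumption, since the text only says that genes are ranked by their $T$ score.

```python
import numpy as np

def t_score_rank(X, y, p=50):
    """Sketch of the Step-1 T-test filter: X is (genes x samples), y in
    {+1, -1}. Returns the indices of the top-p genes by |T| score."""
    pos, neg = X[:, y == 1], X[:, y == -1]
    n_pos, n_neg = pos.shape[1], neg.shape[1]
    t = (pos.mean(1) - neg.mean(1)) / np.sqrt(
        pos.std(1, ddof=1) ** 2 / n_pos + neg.std(1, ddof=1) ** 2 / n_neg)
    return np.argsort(-np.abs(t))[:p]

X = np.random.randn(7129, 38)               # e.g. the breast cancer shape
y = np.array([1] * 18 + [-1] * 20)
top_genes = t_score_rank(X, y, p=50)
```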
Step 2. Kernel-based methods, as reviewed in Section 2, are used to further extract highly informative and discriminative features to form a new training set.

Fig. 1. Schematic diagram of the whole process
Step 3. Training and classification using the SVM: The SVM is an efficient binary classification algorithm. It computes the hyperplane that maximizes the margin between the training examples and the class boundary in a high-dimensional
kernel-induced feature space. Due to the nonlinear mapping between the input space and the feature space, the linear discriminant function constructed by the SVM in the feature space corresponds to a nonlinear function in the original input space. Step 4. Finally, the new training dataset with the extracted features is used to train an SVM, and this classifier is then used for predictions on the test set. The schematic diagram of the whole process is shown in Fig. 1.
4 Experimental Results

4.1 Setup

In this section, we evaluate the performance of the proposed feature extraction method on three benchmark datasets: (1) Breast cancer dataset: 7,129 genes and 38 samples, of which 18 are ER+ (estrogen receptor positive) and the remaining 20 are ER- [6]. (2) Leukemia dataset: 7,129 genes and 72 samples, of which 47 are acute myeloid leukemia (AML) and the remaining 25 are acute lymphoblastic leukemia (ALL) [7]. (3) Colon cancer dataset: 2,000 genes and 62 samples, of which 22 are normal colon tissue and the remaining 40 are tumor tissue [8]. The Gaussian kernel

$k(x, y) = \exp(-\|x - y\|^2 / \gamma)$

where $\gamma$ is the width parameter, is used in the four kernel methods. The adjustable parameters in the T-test and kernel-based methods are the following:

1. p, associated with the T-test method;
2. the width parameter $\gamma$ in the Gaussian kernel;
3. the number of score vectors (Fac) used in KPLS;
4. the number of principal components (K) used in KPCA;
5. the regularization constant (mu) added to the diagonal of the within-class scatter matrix in KFDA.
The linear kernel is used in the SVM. Values of the soft-margin parameter (C) used on the different datasets are shown in Table 1.

Table 1. Values of the soft-margin parameter (C) used in the SVM

                       breast cancer   leukemia   colon cancer
KPLS, KPCA and KFDA          1             10          100
KICA                         1              1            1
Because of the small dataset sizes, leave-one-out (LOO) cross-validation is used to obtain the testing accuracy. Feature extraction and classification are combined in each LOO iteration: both are performed on the training subset, and performance is then measured on the left-out example using the extracted features.
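This protocol can be sketched as follows, with KPCA standing in for any of the four kernel extractors. Note that scikit-learn parameterizes the Gaussian kernel as $\exp(-\gamma\|x-y\|^2)$, so its gamma is the reciprocal of the width parameter $\gamma$ used in this paper; the parameter values here are illustrative.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVC

def loo_accuracy(X, y, gamma=1.0 / 100, n_components=2, C=1.0):
    """Sketch of the evaluation protocol: the kernel feature extractor is
    re-fitted inside every leave-one-out split, so the left-out sample
    never influences feature extraction."""
    correct = 0
    for train, test in LeaveOneOut().split(X):
        kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma)
        Z_train = kpca.fit_transform(X[train])   # fit on training fold only
        Z_test = kpca.transform(X[test])
        clf = SVC(kernel="linear", C=C).fit(Z_train, y[train])
        correct += int(clf.predict(Z_test)[0] == y[test][0])
    return correct / len(y)

X = np.random.randn(38, 50); y = np.random.choice([1, -1], 38)
print(loo_accuracy(X, y))
```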
4.2 Results

Here, we compare the performance of the four kernel-based procedures KPLS, KPCA, KFDA and KICA. The testing accuracies on the three benchmark datasets obtained with different parameter settings are shown in Tables 2-5, respectively. Overall, KPLS and KFDA show the best classification performance on all three datasets: both achieve the best accuracy of 100% on the breast cancer and leukemia datasets, and on the colon cancer dataset both attain 91.9%, the highest accuracy in our experiments. KPCA attains an accuracy of 100% on the breast cancer dataset and 98.6% on the leukemia dataset, but only 88.7% on the colon cancer dataset. KICA performs less well than the other three: it attains 100% on the breast cancer dataset, but only 97.2% on the leukemia dataset and 88.7% on the colon cancer dataset. On the breast cancer dataset, KPLS and KPCA attain 100% testing accuracy with only 2 features, while KFDA needs 3 features and KICA 7 features. On the leukemia dataset, KFDA attains the best accuracy with only 3 features while KPLS needs 5; KPCA and KICA do not reach 100%. On the colon cancer dataset, KFDA attains the best accuracy of 91.9% with only 4 features while KPLS needs 10, and neither KPCA nor KICA reaches 91.9%. In conclusion, KPLS and KFDA outperform the other two in the number of features needed to reach the best accuracy. As can be seen in the tables, the prediction accuracy depends strongly on the choice of parameters. KPLS is clearly influenced by the parameter p, and the effect of Fac cannot be neglected on any of the three datasets; by comparison, γ has a weaker effect than p and Fac.

Table 2. Testing accuracies (%) using T-test and KPLS

              breast cancer        leukemia          colon cancer
 γ    Fac    p=50      p=90     p=100     p=500     p=50      p=90
100    2    100.0     100.0     97.2      97.2      88.7      90.3
100    5    100.0      97.4     98.6     100.0      85.5      88.7
100   10    100.0      97.4     93.1     100.0      82.3      91.9
200    2    100.0     100.0     97.2      98.6      87.1      90.3
200    5    100.0      97.4     98.6     100.0      85.8      90.3
200   10    100.0      97.4     93.1      98.6      79.0      91.9
300    2    100.0     100.0     95.8      98.6      87.1      90.3
300    5    100.0      97.4     98.6     100.0      85.5      90.3
300   10    100.0      97.4     93.1      97.2      79.0      90.3
Table 3. Testing accuracies (%) using T-test and KPCA

              breast cancer        leukemia          colon cancer
 γ     K     p=50      p=90     p=100     p=500     p=50      p=90
100    2    100.0     100.0     94.4      98.6      88.7      88.7
100    5    100.0     100.0     94.4      98.6      88.7      88.7
100   10    100.0     100.0     95.8      97.2      87.1      88.7
200    2     52.6      94.7     73.6      97.2      88.7      88.7
200    5     52.6      94.7     84.7      98.6      88.7      88.7
200   10     52.6      92.1     86.1      98.6      88.7      88.7
300    2     52.6      52.6     65.3      95.8      88.7      88.7
300    5     52.6      55.3     65.3      98.6      88.7      88.7
300   10     52.6      52.6     65.3      97.2      88.7      88.7
Table 4. Testing accuracies (%) using T-test and KFDA (with mu = 10^-m)

              breast cancer        leukemia          colon cancer
 γ     m     p=50      p=90     p=100     p=500     p=50      p=90
  5    3    100.0      97.4     95.8     100.0      87.1      90.3
  5    4     97.3      97.4     94.4     100.0      88.7      91.9
  5    5     94.7      97.4     94.4      98.6      82.3      91.9
 10    3    100.0     100.0     95.8      98.6      62.9      62.9
 10    4     97.4      97.4     95.8     100.0      72.6      90.3
 10    5     97.4      97.4     94.4     100.0      88.7      90.3
 15    3    100.0     100.0     97.2      97.2      64.5      64.5
 15    4    100.0     100.0     95.8     100.0      62.9      59.7
 15    5     97.4      97.4     94.4     100.0      87.1      90.3
Table 5. Testing accuracies (%) using T-test and KICA

  p     γ     breast cancer   leukemia   colon cancer
  5   100         97.4          97.2         88.7
  5   200         97.4          97.2         88.7
  5   300         97.4          97.2         88.7
  6   100         97.4          95.8         88.7
  6   200         97.4          95.8         88.7
  6   300         97.4          95.8         88.7
  7   100        100.0          97.2         88.7
  7   200        100.0          97.2         88.7
  7   300        100.0          97.2         88.7
  8   100         97.4          95.8         83.9
  8   200         97.4          95.8         83.9
  8   300         97.4          95.8         83.9
  9   100         97.4          95.8         80.7
  9   200         97.4          95.8         80.7
  9   300         97.4          95.8         80.7
 10   100        100.0          95.8         82.3
 10   200        100.0          95.8         82.3
 10   300        100.0          95.8         82.3
For KPCA, γ has a strong impact on the breast cancer and leukemia datasets. For example, with γ = 100 and p = 50, the testing accuracy on breast cancer is 100%, but when γ changes to 200 (with the same p) the accuracy drops to only 52.6%. For KFDA, p has the most obvious impact on all three datasets; mu and γ also have clear effects on the testing accuracy. For KICA, p again has the strongest influence, while the effect of γ is less obvious. Owing to the slow speed of the iterative procedure, we had to use very small values of p to keep the data size manageable; otherwise the running time becomes very long and the iterative procedure may also run into numerical problems.
5 Conclusions

In this paper, we propose a kernel-based feature extraction method for cancer classification and compare the performance of four kernel methods: KPLS, KPCA, KFDA and KICA. Experiments were performed on the breast cancer, leukemia and colon cancer datasets, and the methods were also compared with others reported in the literature. The proposed method shows superior classification performance on all three datasets, and thus proves to be reliable for feature extraction.
Acknowledgements. This paper is supported by the Program for New Century Excellent Talents in University and the Excellent Youth Foundation of Hunan Province (06JJ1010).
References 1. Kurita, T., Taguchi, T.: A Modification of Kernel-based Fisher Discriminant Analysis for Face Detection. In: Proceedings of International Conference on Automatic Face and Gesture Recognition, Washington DC, pp. 300–305 (2002) 2. Schölkopf, B., Smola, A., Müller, K.-R.: Kernel Principal Component Analysis. In: Schölkopf, B., Burges, C.J.C., Smola, A.J. (eds.) Advances in Kernel Methods - Support Vector Learning, pp. 327–352. MIT Press, Cambridge, MA (1999) 3. Müller, K., Mika, S., Rätsch, G., Tsuda, K., Schölkopf, B.: An Introduction to Kernel-Based Learning Algorithms. IEEE Trans. on Neural Networks, 180–201 (2001) 4. Rosipal, R., Trejo, L.J., Matthews, B.: Kernel PLS-SVC for Linear and Nonlinear Classification. In: Proceedings of the Twentieth International Conference on Machine Learning, Washington DC, pp. 640–647 (2003) 5. Bach, F.R., Jordan, M.I.: Kernel Independent Component Analysis. J. Machine Learning Research 3 (2002) 6. West, M., Blanchette, C., Dressman, H., Huang, E., Ishida, S., Spang, R., Zuzan, H., Marks, J.R., Nevins, J.R.: Predicting the Clinical Status of Human Breast Cancer Using Gene Expression Profiles. Proceedings of the National Academy of Science 98, 11462–11467 (2001) 7. Golub, T., Slonim, D., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M., Downing, J., Caligiuri, M., Bloomfield, C., Lander, E.: Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 28, 531–537 (1999) 8. Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., Levine, A.: Broad Patterns of Gene Expression Revealed by Clustering Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays. Proceedings of the National Academy of Science 96, 6745–6750 (1999)
A New Hybrid Approach to Predict Subcellular Localization by Incorporating Protein Evolutionary Conservation Information

ShaoWu Zhang1, YunLong Zhang2, JunHui Li, HuiFeng Yang1, YongMei Cheng1, and GuoPing Zhou3

1 College of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
[email protected] 2 Department of Computer, First Aeronautical Institute of Air Force, Henan, 464000, China
[email protected] 3 Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, Massachusetts 02115, USA
[email protected]
Abstract. The rapidly increasing number of sequences entering the genome databanks has created the need for fully automated methods to analyze them, and knowing the cellular location of a protein is a key step towards understanding its function. The development of a statistical predictor of protein attributes generally involves two tasks: constructing a training dataset and formulating a predictive algorithm. The latter can be further separated into two sub-tasks: finding a mathematical expression that effectively represents a protein, and finding a powerful algorithm to perform the prediction accurately. Here, an improved evolutionary conservation algorithm is proposed to calculate a per-residue conservation score, from which each protein can be represented as a feature vector of multi-scale energies (MSE). In addition, proteins are represented by further feature vectors based on amino acid composition (AAC), weighted auto-correlation functions and moment descriptors. Finally, a novel hybrid approach is developed that fuses the four kinds of feature classifiers through a product rule system to predict 12 subcellular locations. Compared with existing methods, this approach provides better predictive performance. High success rates were obtained in both the jackknife cross-validation test and the independent dataset test, suggesting that introducing protein evolutionary information and fusing multi-feature classifiers are quite promising, and might also hold great potential as a useful vehicle for other areas of molecular biology.
1 Introduction

One of the fundamental goals in cell biology and proteomics is to identify the functions of proteins in the cellular environment. Determining protein subcellular location purely by experimental approaches is both time-consuming and expensive. In particular, the number of new protein sequences yielded by high-throughput
sequencing technology in the postgenomic era has increased explosively. Facing such an avalanche of new protein sequences, it is both challenging and indispensable to develop automated methods for fast and accurate annotation of the subcellular attributes of uncharacterized proteins. The knowledge thus obtained can help us use these newly found protein sequences in a timely manner for both basic research and drug discovery [1, 2]. During the last decade, many theoretical and computational methods were developed to predict the subcellular localization of proteins [3-13]. However, all of these prediction methods were built on a single classifier, or on statistical and amino acid physical-chemical features for representing protein sequences. Obviously, prediction quality is considerably limited when only a single classifier and only statistical or physical-chemical feature information are used to deal with piles of complicated protein sequences with extreme variation in both sequence order and length. To further improve predictive quality, a logical and key step is to find an effective way to represent protein information together with a powerful classifier. Here, by proposing an improved method to calculate protein evolutionary conservation information, protein samples were formulated by hybridizing the multi-source information derived from evolutionary conservation scores, weighted auto-correlation functions [14], moment descriptors [12] and multi-scale energy [13]. Based on this hybrid representation, a novel ensemble classifier was formed by fusing many individual classifiers through a product rule system [15]. The success rates obtained by hybridizing the multi-source protein information and fusing the classifiers were significantly improved for predicting protein subcellular location.
2 Methods

2.1 Residue Conservation

The residue ranking function assigns a score to each residue, according to which residues can be sorted in order of the presumably decreasing evolutionary pressure they experience. Among the many methods proposed in the literature [16-18], the hybrid methods of the Lichtarge group [19] (the real-valued evolutionary trace method and the zoom method) are two robust approaches that rank the evolutionary importance of residues in a protein family based on the column variation in multiple sequence alignments (MSAs) and on evolutionary information extracted from the underlying phylogenetic trees. However, the hybrid methods treat gaps in the multiple sequence alignment as a 21st amino acid, so we propose an improved algorithm to estimate residue evolutionary conservation. The calculation proceeds as follows. Firstly, an initial set of similar sequences is created with three iterations of PSI-BLAST [20], with a 0.001 E-value cutoff, against the UniProt [21] protein database; the resulting sets are aligned with a standard alignment method such as ClustalW 1.8 [22] to obtain the multiple sequence alignments (MSAs). Secondly, an MSA is divided into sub-alignments (that is, n groups) that correspond to nodes in the tree [19]. This subdivision of an MSA into smaller alignments
reflects the tree topology, and therefore the evolutionary variation information within it. The evolutionary score for a residue in column i of an MSA is then given by

$R_i = 1 + \sum_{n=1}^{N-1} w_{node}(n) \sum_{g=1}^{n} w_{group}(g) \left[ -\sum_{\alpha=1}^{20} f_{i\alpha}^{g} \log f_{i\alpha}^{g} + f_{i,gap}^{g} \right]$   (1)
where $w_{node}(n)$ and $w_{group}(g)$ are weights assigned to node n and group g, respectively:

$w_{node}(n) = \begin{cases} 1 & \text{if } n \text{ is on the path to the query protein} \\ 0 & \text{otherwise} \end{cases}$   (2)

$w_{group}(g) = \begin{cases} 1 & \text{if } g \text{ is on the path to the query protein} \\ 0 & \text{otherwise} \end{cases}$   (3)
$f_{i\alpha}^{g}$ is the frequency of amino acid type $\alpha$ ($\alpha$ represents one of the 20 standard amino acids A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y) within the sub-alignment corresponding to group g at the level at which the sequence similarity tree is divided into n groups. The nodes (labeled by n) are numbered in order of increasing distance from the root, and each node is associated with a division of the tree into n groups (subtrees). N is the number of aligned sequences.
$f_{i,gap}^{g}$ is the number of non-standard symbols (such as "-", "X", "Z", "B") in group g at alignment position i, divided by the number of aligned sequences in group g. Further details on the division into tree nodes and groups can be found in [19].
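The bracketed entropy-plus-gap term of Eq. (1) can be sketched for a single column and a single level of the tree as follows; the full method additionally sums this term over the tree levels n with the path weights of Eqs. (2)-(3), which are omitted here for brevity.

```python
import math

STD = set("ACDEFGHIKLMNPQRSTVWY")

def column_score(groups):
    """Simplified sketch of the bracketed term in Eq. (1) for one level of
    the tree: `groups` is a list of character lists, one per sub-alignment
    group g, all taken from the same column i of the MSA."""
    score = 0.0
    for col in groups:
        n_seq = len(col)
        ent = 0.0
        for aa in STD:
            f = col.count(aa) / n_seq
            if f > 0:
                ent -= f * math.log(f)       # Shannon entropy of the group
        # fraction of gaps / non-standard symbols in the group
        f_gap = sum(1 for c in col if c not in STD) / n_seq
        score += ent + f_gap
    return score

# column i split into two groups by the similarity tree
print(1 + column_score([list("LLLLIL"), list("LV-LXV")]))
```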
2.2 Multi-scale Energy [13]

By calculating residue conservation scores, the protein sequence of letters is translated into a numerical sequence, which can be regarded as a digital signal. Projecting this signal onto a set of wavelet basis functions at various scales, the fine-scale and large-scale conservation information of a protein can be investigated simultaneously. The wavelet basis function used here is the symlet wavelet [23]. Consequently, the protein can be characterized by the following multi-scale energy (MSE) feature vector:

$MSE = [d_1, \ldots, d_j, \ldots, d_m, a_m]$   (4)
Here, m is the coarsest decomposition scale, $d_j$ is the root-mean-square energy of the wavelet detail coefficients at scale j, and $a_m$ is the root-mean-square energy of the wavelet approximation coefficients at scale m. The energy factors are defined as

$d_j = \sqrt{\frac{1}{N_j} \sum_{n=0}^{N_j - 1} [u_j(n)]^2}, \quad j = 1, 2, \ldots, m; \qquad a_m = \sqrt{\frac{1}{N_m} \sum_{n=0}^{N_m - 1} [v_m(n)]^2}$   (5)
Here, $N_j$ is the number of wavelet detail coefficients, $N_m$ is the number of wavelet approximation coefficients, $u_j(n)$ is the n-th detail coefficient at scale j, and $v_m(n)$ is the n-th approximation coefficient at scale m. For a protein sequence of length L, m equals INT(log2 L). Combining this with the amino acid composition (AAC), which consists of the 20-D vector of amino acid frequencies, the protein can be represented by the following (20+m+1)-D vector:

$x = [f_1, f_2, \ldots, f_\alpha, \ldots, f_{20}, d_1, d_2, \ldots, d_j, \ldots, d_m, a_m]^T$   (6)
Here $f_\alpha$ ($\alpha = 1, 2, \ldots, 20$) is the occurrence frequency of each of the 20 amino acids in the protein concerned, arranged alphabetically by single-letter code. For convenience, the feature set based on residue evolutionary conservation and the MSE approach is written as EMSE.
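A sketch of the MSE computation using the PyWavelets library follows; the symlet order (sym4) is an assumption, since the text specifies only that a symlet wavelet is used.

```python
import math
import numpy as np
import pywt  # PyWavelets

def mse_features(scores, wavelet="sym4"):
    """Sketch of the MSE vector of Eq. (4): symlet decomposition of the
    per-residue conservation scores, RMS energy per scale (Eq. (5))."""
    m = int(math.log2(len(scores)))          # coarsest scale, INT(log2 L)
    m = min(m, pywt.dwt_max_level(len(scores), pywt.Wavelet(wavelet).dec_len))
    coeffs = pywt.wavedec(scores, wavelet, level=m)
    a_m, details = coeffs[0], coeffs[1:]     # approximation + detail bands
    rms = lambda c: float(np.sqrt(np.mean(np.square(c))))
    # detail energies d_1..d_m (fine to coarse), then approximation energy a_m
    return [rms(d) for d in reversed(details)] + [rms(a_m)]

scores = np.random.rand(200)                 # per-residue conservation scores
print(mse_features(scores))
```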
2.3 Weighted Auto-correlation Functions [14]

To calculate the weighted auto-correlation functions, each residue in the primary sequence is replaced by its amino acid index PARJ860101, which can be downloaded from http://www.genome.ad.jp/dbget. The replacement yields a numerical sequence $h_1, h_2, \ldots, h_l, \ldots, h_L$. The weighted auto-correlation functions $r_j$ are defined as

$r_j = \frac{w}{L - j} \sum_{l=1}^{L-j} h_l h_{l+j}, \quad j = 1, 2, \ldots, \lambda$   (7)
Here $h_l$ is the amino acid index of the l-th residue, w is a weighting factor and L is the length of the protein sequence. Combining this with the amino acid composition, the protein can be represented by the following (20+λ)-D vector:

$x = [f_1, f_2, \ldots, f_\alpha, \ldots, f_{20}, r_1, r_2, \ldots, r_j, \ldots, r_\lambda]^T$   (8)
For convenience, the feature set based on the weighted auto-correlation function approach is written as PARJ.
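A minimal sketch of Eq. (7) follows; the index values in the dictionary are placeholders for illustration only, not the real PARJ860101 entries from AAindex, and the weighting factor w is an arbitrary choice.

```python
# hypothetical PARJ860101-style values; real entries come from AAindex
PARJ = {"A": 2.1, "R": 4.2, "N": 7.0, "D": 10.0, "C": 1.4}  # illustrative only

def parj_features(seq, lam=8, w=0.1, index=PARJ):
    """Sketch of Eq. (7): r_j = w/(L-j) * sum_l h_l * h_{l+j}."""
    h = [index[c] for c in seq]              # replace residues by their index
    L = len(h)
    return [w / (L - j) * sum(h[l] * h[l + j] for l in range(L - j))
            for j in range(1, lam + 1)]

print(parj_features("ARNDCARNDC"))
```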
2.4 Moment Descriptor [12]

According to [12], the protein can be represented by the following vector:

$X = [f_1, f_2, \ldots, f_\alpha, \ldots, f_{20}, \mu_1, \mu_2, \ldots, \mu_i, \ldots, \mu_{20}, \nu_1, \nu_2, \ldots, \nu_i, \ldots, \nu_{20}]$   (9)
Here,

$\mu_i = \frac{1}{L} \sum_{j=1}^{L} x_{ij} \cdot j, \qquad \nu_i = \frac{1}{L} \sum_{j=1}^{L} (x_{ij} \cdot j - \mu_i)^2$   (10)

$x_{ij} = \begin{cases} 1 & \text{if amino acid } \alpha_i \text{ appears at position } j \text{ in the sequence} \\ 0 & \text{otherwise} \end{cases}$   (11)
For convenience, the feature set based on the moment descriptor approach is written as MD.
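The moment descriptor of Eqs. (9)-(11) can be sketched directly, following the formulas as printed (the sum in Eq. (10) runs over all positions j = 1..L):

```python
AA = "ACDEFGHIKLMNPQRSTVWY"

def moment_descriptor(seq):
    """Sketch of Eqs. (9)-(11): composition f, first moment mu and second
    central moment nu of each amino acid type."""
    L = len(seq)
    f, mu, nu = [], [], []
    for a in AA:
        pos = [j + 1 for j, c in enumerate(seq) if c == a]  # 1-based positions
        f.append(len(pos) / L)
        m = sum(pos) / L                     # mu_i = (1/L) * sum_j x_ij * j
        mu.append(m)
        # positions where x_ij = 0 contribute (0 - mu_i)^2 = mu_i^2
        nu.append((sum((p - m) ** 2 for p in pos)
                   + (L - len(pos)) * m ** 2) / L)
    return f + mu + nu                       # 60-D vector of Eq. (9)

print(len(moment_descriptor("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")))  # 60
```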
3 Results and Discussion

3.1 Results with Different Feature Extraction Methods

The training dataset and independent dataset taken from Chou [4] were used to validate the current method. Prediction quality was examined by two standard statistical testing procedures, the jackknife test (JACK) and the independent dataset test (IND); of the two, the jackknife test is regarded as the most objective and effective [24, 25]. The results of the four feature extraction methods, based on a support vector machine (SVM) [26] with the "one-versus-one" classification strategy [14], are shown in Table 1.

Table 1. Results (in percentage) of four feature extraction methods with SVM and "one-versus-one" classification strategy
Location                  AAC           EMSE          PARJ           MD
                       JACK   IND    JACK   IND    JACK   IND    JACK   IND
Chloroplast            59.1  60.6    67.9  59.6    59.1  65.1    66.4  77.1
Cytoplasm              85.9  83.9    88.4  87.6    89.0  88.8    90.5  89.1
Cytoskeleton           41.2  94.7    44.1  100     47.1  100     50.0  94.7
Endoplasmic reticulum  32.7  70.8    38.8  84.9    36.7  70.8    34.7  69.8
Extracellular          69.6  84.2    67.9  83.1    73.2  88.4    67.9  87.4
Golgi apparatus        16.0  0.50    20.0  25.0    24.0  25.0    16.0  50.0
Lysosome               56.8  87.1    51.4  96.8    54.1  96.8    51.4  87.1
Mitochondrial          26.5  12.9    43.4  20.9    41.0  17.8    38.6  14.1
Nuclear                80.8  76.4    87.1  80.4    84.1  77.5    81.9  83.1
Peroxisomal            22.2  43.5    14.8  39.1    22.2  47.8     7.4  30.4
Plasma membrane        92.7  96.3    92.7  96.5    96.0  99.0    94.3  97.1
Vacuoles               33.3   -      25.0   -      33.3   -      20.8   -
Overall accuracy       77.1  80.0    79.5  82.8    80.5  83.3    79.4  83.5
Table 1 shows that protein evolutionary conservation information can be used to predict subcellular location. The overall accuracies of EMSE, PARJ and MD are almost equal, and all are higher than that of AAC in both the jackknife and independent tests. For EMSE, the predictive accuracy depends critically on the selection of input sequences and on the breadth and depth of the associated sequence similarity tree, that is, on how many initial similar sequences are selected and how they are pruned to form the multiple sequence alignment; if these two parameters are chosen optimally, better results can be obtained. Considering the available computing power, the cutoff for initial similar sequences was set to 250, and the initial similar sequences were not pruned. These results indicate that the performance of a prediction system can be improved by using different feature extraction methods: EMSE, PARJ and
MD are effective representations of protein sequences and are robust for predicting subcellular localization.
3.2 Comparison with Other Prediction Methods

The performance of the hybrid method developed in this study was compared with existing methods such as Pan's [6], Gao's [9] and Xia's [10, 11], which were developed on the same dataset. The results demonstrate that the overall prediction accuracy of our hybrid method is higher than that of the other four methods in both the jackknife and independent tests. For example, the overall accuracy of the hybrid method is 8.1% and 4.7% greater than that of Xia's method [10] in the jackknife and independent tests, respectively.

Table 2. Overall accuracy (in percentage) obtained by different methods
Method                        Jackknife test   Independent test
Pan et al. [6]                     67.7               -
Gao et al. [9]                     69.6              73.9
Xia et al. [10]                    73.6              79.8
Xia et al. [11]                    72.6              74.8
Hybrid (AAC+EMSE+PARJ+MD)          81.7              85.1
4 Conclusions

A new kind of protein evolutionary feature extraction method and a hybrid approach fusing four feature classifiers were proposed in this study. The results show that representing a protein by residue evolutionary conservation and multi-scale energy better reflects protein evolutionary information for predicting subcellular locations, while the weighted auto-correlation function and moment descriptor methods capture sequence-order effects. The novel hybrid approach of fusing the four feature classifiers proves to be an intriguing and promising avenue.
Acknowledgements. This paper was supported in part by the National Natural Science Foundation of China (No. 60372085 and 60634030), the Technological Innovation Foundation of Northwestern Polytechnical University (No. KC02), the Science Technology Research and Development Program of Shaanxi Province (No. 2006k04-G14).
References 1. Chou, K.C.: Review: Structural bioinformatics and its impact to biomedical science. Curr. Med. Chem. 11, 2105–2134 (2004) 2. Lubec, G., Afjehi-Sadat, L., Yang, J.W., John, J.P.: Searching for hypothetical proteins: theory and practice based upon original data and literature. Prog. Neurobiol. 77, 90–127 (2005)
3. Chou, K.C., Elrod, D.W.: Protein subcellular location prediction. Protein Engineering 12, 107–118 (1999) 4. Chou, K.C.: Prediction of protein subcellular locations by incorporating quasi-sequenceorder effect. Biochem. Biophys. Research Commun. 278, 477–483 (2000) 5. Chou, K.C.: Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins: Structure, Function, and Genetics 43, 246–255 (2001) 6. Pan, Y.X., Zhang, Z.Z., Guo, Z.M., Feng, G.Y., Huang, Z.D., He, L.: Application of pseudo amino acid composition for predicting protein subcellular location: stochastic signal processing approach. J. Protein Chem. 22, 395–402 (2003) 7. Zhou, G.P., Doctor, K.: Subcellular location prediction of apoptosis proteins. PROTEINS: Struct. Funct. Genet. 50, 44–48 (2003) 8. Park, K.J., Kanehisa, M.: Prediction of protein subcellular locations by support vector machines using compositions of amino acid and amino acid pairs. Bioinformatics 19, 1656– 1663 (2003) 9. Gao, Y., Shao, S., Xiao, X., Ding, Y., Huang, Y., Huang, Z., Chou, K.C.: Using pseudo amino acid composition to predict protein subcellular location: Approached with Lyapunov index, Bessel function, and Chebyshev filter. Amino Acid 28, 373–376 (2005) 10. Xia, X., Shao, S., Ding, Y., Huang, Z., Huang, Y., Chou, K.C.: Using complexity measure factor to predict protein subcellular location. Amino Acid 28, 57–81 (2005) 11. Xia, X., Shao, S., Ding, Y., Huang, Z., Huang, Y., Chou, K.C.: Using cellular automata images and pseudo amino acid composition to predict protein subcellular location. Amino Acid 30, 49–54 (2006) 12. Shi, J.Y., Zhang, S.W., Liang, Y., Pan, Q.: Prediction of Protein Subcellular Localizations Using Moment Descriptors and Support Vector Machine. In: PRIB: 2006, Hong Kong,China, pp. 105–114. Springer, Heidelberg (2006) 13. Shi, J.Y., Zhang, S.W., Pan, Q., Cheng, Y.M., Xie, J.: SVM-based Method for Subcellular Localization of Protein Using Multi-Scale Energy and Pseudo Amino Acid Composition. Amino Acid (2007) DOI 10.1007/s00726-006-0475-y 14. Zhang, S.W., Pan, Q., Zhang, H.C., Shao, Z.C., Shi, J.Y.: Prediction Protein Homooligomer Types by Pesudo Amino Acid Composition: Approached with an Improved Feature Extraction and Naive Bayes Feature Fusion. Amino Acid 30, 461–468 (2006) 15. Kittler, J., Hatef, M., Duin, R.P.W., Matas, J.: On Combining Classifiers. IEEE Trans. Pattern Analysis and Machine Intelligence 20, 226–239 (1998) 16. Lichtarge, O., Bourne, H., Cohen, F.: An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 257, 342–358 (1996) 17. Valdar, W.S.: Scoring residue conservation. Proteins 48, 227–241 (2002) 18. Soyer, O.S., Goldstein, R.A.: Predicting functional sites in proteins: Site-specific evolutionary models and their application to neurotransmitter transporters. J. Mol. Biol. 339, 227–242 (2004) 19. Mihalek, I., Reš, I., Lichtarge, O.: A Family of Evolution–Entropy Hybrid Methods for Ranking Protein Residues by Importance. J. Mol. Biol. 336, 1265–1282 (2004) 20. Altschul, S., Madden, T., Schffer, A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.: Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402 (1997) 21. UniProt (2005), http://www.expasy.org/ 22. Thompson, J., Higgins, D., Gibson, T.: Clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. 
Nucleic Acids Research 22, 4673–4680 (1994)
23. Pittner, S., Kamarthi, S.V.: Feature extraction from wavelet coeffi-cients for pattern recognition tasks. IEEE Trans. Pattern Anal. Mach. Intell. 21, 83–88 (1999) 24. Zhou, G.P.: An intriguing controversy over protein structural class prediction. J. Protein Chem. 17, 729–738 (1998) 25. Zhou, G.P., Assa-Munt, N.: Some insights into protein structural class prediction. Proteins: Structure, Function, and Genetics 44, 57–59 (2001) 26. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
Support Vector Machine for Prediction of DNA-Binding Domains in Protein-DNA Complexes

Jiansheng Wu, Hongtao Wu, Hongde Liu, Haoyan Zhou, and Xiao Sun*

State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
[email protected], [email protected]

* Corresponding author.
Abstract. In this study, we present a classifier that takes an amino acid sequence as input and predicts potential DNA-binding domains with support vector machines (SVMs). Amino acid sequences with known DNA-binding domains were obtained from the Protein Data Bank (PDB), and SVM models were designed integrating four normalized sequence features (the side chain pKa value, hydrophobicity index, molecular mass of the amino acid and the number of isolated electron pairs) with a normalized feature based on the evolutionary information of amino acid sequences. The results show that DNA-binding domains can be predicted with 74.28% accuracy, 68.39% sensitivity and 79.76% specificity, and with a ROC AUC of 0.822 and a Pearson correlation coefficient of 0.549.
1 Introduction

Many proteins perform essential functions through interactions with nucleic acid molecules; for instance, transcription factors binding to specific cis-acting elements in promoters regulate gene expression [1]. Identifying the amino acid residues that bind DNA or RNA is therefore important for understanding a range of biological processes. Analysis of structural data is very helpful for understanding the mechanisms of protein-nucleic acid interactions, and the information it provides has been used to predict DNA-binding residues from the rapidly growing body of amino acid sequence data from many organisms [2,3,4,5,6]. Previously, artificial neural networks encoding sequence information and residue solvent accessibility were constructed for the prediction of DNA-binding residues, achieving 40.3% sensitivity and 81.8% specificity [2]. Evolutionary information, in the form of a position-specific scoring matrix (PSSM), was shown to improve performance to 68.2% sensitivity and 66.0% specificity [5]. Recently, support vector machines (SVMs) combined with three simple sequence features were used for the prediction of DNA- and RNA-binding residues, with performance of 70.31% accuracy, 69.40% sensitivity, 70.47% specificity and a 0.7542 ROC AUC [6]. As is known, protein-nucleic acid interactions actually involve nucleic acids binding to specific domains of proteins, not just individual residues, so predicting DNA-binding domains is arguably more important than predicting DNA-binding residues. However, until
now, no work has been done to predict nucleic acid-binding domains. In the present study, we developed a support vector machine based algorithm to predict DNA-binding domains and obtained significantly improved performance: our SVM models predict DNA-binding domains with 74.28% accuracy, 68.39% sensitivity and 79.76% specificity, and with a ROC AUC of 0.822 and a Pearson correlation coefficient of 0.549.
2 Materials and Methods 2.1 Data Set PDNA-62, an amino acid sequence dataset, was utilized to construct SVM models for predicting DNA-binding domains. The PDNA-62 dataset was from 62 structure files of typical protein–DNA complexes and had less than 25% identity among the sequences [2,5,6].A segment of residues was appointed as a binding domain if it existed one or more amino acid residues in which any atoms had the distance less than a cutoff of 3.5 Å from any atoms of the DNA molecule in the complex [2, 5,6]. All the other segments of residues were designated as non-binding domains. we developed a Perl program which input a set of structure files and output a result file of amino acid sequences in which each segment of residues were labeled as a binding or non-binding domain according to the above cutoff scale. The PDNA-62 dataset contains 3667 DNA-binding domains and 3966 non-binding domains. 2.2 Feature of DNA-Binding Domains As in the previous study [6], the length of each domain was assigned as 11 in this study. The sum of (n-10) domains were extracted from an amino acid sequence with n residues. A domain that was DNA-binding was labeled with 1 (positive), or -1 (negative) was labeled if the target domain was non-binding. Each residue among every domain was coded with four biochemical features, where three were described in preference [6](the side chain pKa value, hydrophobicity index , molecular mass of the amino acid) and another new feature (the number of isolated electron pairs) was presented in this study. The three features described in preference [6] were normalized to get that the average is 0 and the standard deviation is 1 by the following standard logistic functions [7]:
$S_\alpha(i) = \frac{P_\alpha(i) - \bar{P}_\alpha}{\sigma_\alpha}$   (1)

$\sigma_\alpha = \sqrt{\frac{20 \sum_{i=1}^{20} P_\alpha(i)^2 - \left( \sum_{i=1}^{20} P_\alpha(i) \right)^2}{400}}$   (2)

where $S_\alpha(i)$ is the normalized property value, $\alpha$ is the index of the property and $i$ stands for the amino acid. $P_\alpha(i)$ is the property value, and $\bar{P}_\alpha$ and $\sigma_\alpha$ are the average and standard deviation of property $\alpha$.
Table 1. The number of isolated electron pairs of each amino acid

Amino acid                ALA  CYS  ASP  GLU  PHE  GLY  HIS  ILE  LYS  LEU
Isolated electron pairs     0    1    2    2    0    0    2    0    1    0

Amino acid                MET  ASN  PRO  GLN  ARG  SER  THR  VAL  TRP  TYR
Isolated electron pairs     1    2    1    2    3    1    1    0    1    1
The number of isolated electron pairs per residue is related to the potential for hydrogen bonding, which is the main force in DNA-protein binding in protein-DNA complexes. The numbers of isolated electron pairs of the amino acids are listed in Table 1. In this work, the PSI-BLAST program was used to obtain multiple sequence alignment profiles. We first downloaded the updated non-redundant (NR) protein sequence database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/). Position-specific scoring matrices (PSSMs) were then obtained by running PSI-BLAST for three rounds with a cutoff E-value of 0.001 against the filtered NR database, masking out low-complexity regions and coiled-coil segments. The PSSM elements were scaled to the 0-1 range by the standard logistic function [8]:
$f(x) = \frac{1}{1 + \exp(-x)}$   (3)
For a domain of 11 residues, the input vector thus consists of 264 values: 44 biochemical feature values and 220 PSSM values.
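This encoding can be sketched as follows; the biochemical values in the dictionary are placeholders (except the isolated-electron-pair counts, which follow Table 1), and the logistic scaling matches Eq. (3) above.

```python
import numpy as np

# illustrative per-residue biochemical features (pKa, hydrophobicity, mass,
# isolated electron pairs); values are placeholders except the pair counts
BIOCHEM = {"A": [0.0, 1.8, 71.08, 0], "R": [12.48, -4.5, 156.19, 3]}  # etc.

def encode_window(window, pssm_rows, biochem=BIOCHEM):
    """Sketch of the 264-D input vector for an 11-residue window:
    4 biochemical values + 20 logistic-scaled PSSM values per residue."""
    assert len(window) == 11 and pssm_rows.shape == (11, 20)
    scaled = 1.0 / (1.0 + np.exp(-pssm_rows))    # Eq. (3): scale to (0, 1)
    feats = []
    for aa, row in zip(window, scaled):
        feats.extend(biochem[aa])                # 4 biochemical features
        feats.extend(row)                        # 20 PSSM features
    return np.array(feats)                       # 11 * 24 = 264 values

vec = encode_window("ARAARRAARAA", np.random.randn(11, 20))
print(vec.shape)  # (264,)
```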
f(x) = sgn(
N
∑
i=1
yα i iK ( x, xi ) + b )
(4)
where αi are the coefficients to be learned and K is a kernel function. Parameters αi are trained through maximizing function : N
1 N ai − ∑ ai a j yi y j K ( xi , x j ) ∑ 2 i , j =1 i =1 N
where subject to 0≤ai≤C(i=1,…N) and
∑a i=1
i
yi = 0 .
(5)
Support Vector Machine for Prediction of DNA-Binding Domains
183
For this study, libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm) was used for training and classification [10]. To ensure that the parameter estimation and model generation of the SVM are independent of the test data, a 5-fold cross-validation approach was used to evaluate classifier performance: the original dataset was randomly divided into five parts, and in each of the five iterations one subset was used for testing and the other four for training. We tried different kernel functions (linear, polynomial and radial basis function) and different values of the libsvm parameters to optimize the prediction accuracy. The best results were obtained using the radial basis function kernel with C = 1 and γ = 0.009.
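For illustration, an equivalent model can be set up through scikit-learn's SVC class, which wraps the same libsvm library, using the reported parameters; the data here are random placeholders standing in for the encoded windows.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# X: (n_domains, 264) encoded windows, y: +1 binding / -1 non-binding
X = np.random.randn(200, 264)
y = np.random.choice([1, -1], size=200)

# RBF kernel with the reported parameters C = 1 and gamma = 0.009
clf = SVC(kernel="rbf", C=1.0, gamma=0.009)
scores = cross_val_score(clf, X, y, cv=5)     # 5-fold cross-validation
print("5-fold accuracies:", scores.round(3))
```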
2.4 Measurement of Algorithm Performance

The predictions for the test data instances are compared with the corresponding class labels (binding or non-binding) to evaluate the classifiers. The overall accuracy, sensitivity and specificity of the prediction system are defined as

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$   (6)

$\text{Sensitivity} = \frac{TP}{TP + FN}$   (7)

$\text{Specificity} = \frac{TN}{TN + FP}$   (8)

where $TP$, $TN$, $FP$ and $FN$ are the numbers of true positives, true negatives, false positives and false negatives, respectively. To give a better comparison and a balance of sensitivity and specificity, the net prediction [11] is defined as

$\text{Net Prediction} = \frac{\text{Sensitivity} + \text{Specificity}}{2}$   (9)
The performance of the algorithms in this study is also evaluated by the receiver operating characteristic (ROC) curve, a useful technique for organizing classifiers and visualizing their performance [11]. ROC graphs are two-dimensional plots in which the true positive rate (sensitivity) is plotted on the Y axis and the false positive rate (1 - specificity) on the X axis. Random guessing generates identical false positive and true positive rates on average, so the diagonal (y = x) of the ROC plot represents the performance of random guessing, and curves that move towards the upper left corner indicate increasing accuracy. The area under the ROC curve (AUC) can be used to characterize the performance of a classifier; the AUC value ranges from 0.5 to 1, and a higher AUC indicates better performance.
Pearson's correlation coefficient (r value) between the classifiers' output values and the labels (1 or -1) was also used to evaluate performance; a higher r value indicates better performance [12].
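The evaluation measures of Eqs. (6)-(9), together with the ROC AUC and the Pearson correlation, can be sketched as follows:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def performance(y_true, y_pred, y_score):
    """Sketch of Eqs. (6)-(9) plus ROC AUC and Pearson r; labels are +1/-1
    and y_score is the classifier's continuous output."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == -1) & (y_true == -1))
    fp = np.sum((y_pred == 1) & (y_true == -1))
    fn = np.sum((y_pred == -1) & (y_true == 1))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {"accuracy": (tp + tn) / (tp + tn + fp + fn),
            "sensitivity": sens,
            "specificity": spec,
            "net_prediction": (sens + spec) / 2,
            "roc_auc": roc_auc_score(y_true, y_score),
            "pearson_r": float(np.corrcoef(y_score, y_true)[0, 1])}
```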
3 Results and Discussion

The performance of our SVM classifiers in 5-fold cross-validation is shown for the classifier named Features 1 in Table 2. The results show that this classifier predicts DNA-binding domains with 74.28% overall accuracy (SD = 0.87), 68.39% sensitivity (SD = 0.39), 79.76% specificity (SD = 1.89) and 74.07% net prediction (SD = 0.97) (Table 2).

Table 2. Performance of different feature sets by SVM for prediction of DNA-binding residues in proteins

Classifier     Accuracy±SD c (%)   Sensitivity±SD (%)   Specificity±SD (%)   Net prediction d ±SD (%)
Features 1 a      74.28±0.87          68.39±0.39           79.76±1.89            74.07±0.97
Features 2 b      64.95±1.10          60.39±2.18           69.16±1.57            64.77±1.14
a Features 1: the side chain pKa value, hydrophobicity index, molecular mass of the amino acid, the number of isolated electron pairs, and PSSMs. b Features 2: the same features (side chain pKa value, hydrophobicity index, molecular mass of the amino acids) as in reference [6]. c SD: standard deviation over the five iterations of the 5-fold cross-validation. d Net prediction: the average of sensitivity and specificity.

The ROC curve of our classifier named Features 1 is shown in Figure 1. Its AUC value is 0.822 and its Pearson correlation coefficient is 0.549 for the prediction of DNA-binding domains; both are listed in Table 3. These AUC values are significantly higher than that of random guessing (0.5), and the algorithm output has a significant correlation with the sample labels (1 or -1) (p < 0.01, Table 3).

Table 3. Comparison of predictive performance of SVM with different feature sets on ROC AUC and correlation coefficient

Classifier     ROC AUC c   Correlation coefficient
Features 1 a     0.822          0.549** d
Features 2 b     0.711          0.368**

a, b Features 1 and Features 2: as defined in Table 2. c ROC AUC: area under the receiver operating characteristic (ROC) curve. d **: p < 0.01, significant.
Our SVM classifier (named Features 1)for DNA-binding domains combining with more normalized features (the side chain pKa value, hydrophobicity index , molecular mass of the amino acid , the number of isolated electron pairs and PSSMs) appears to be better than the SVM classifier (named Features 2),considering as the same features in preference [6], in the same dataset (PDNA-62). Accuracy and net prediction of our SVM classifier(named Features 1) are 74.28% and74.07%, whereas 64.95% and 64.77% for the SVM classifier (named Features 2) (Table 2). In addition, the AUC values and the Pearson correlation coefficient of our SVM classifier(named Features 1) are 0.822 and 0.549 which are obviously higher than that of the other classifier (named Features 2), 0.711 and 0.368 (Table 3). Analysis of structural data has provided valuable information about the mechanisms of protein–nucleic acid interactions. Many algorithms that predict DNA binding residues directly from amino acid sequence data have been published recently [2,3,4,5,6].BindN predicted potential DNA-binding residues relying on support vector machines with three sequence features, including the side chain pKa value, hydrophobicity index and molecular mass of an amino acid, and got the performance of 69.40% sensitivity and 70.47% specificity[6]. Meanwhile, protein–nucleic acid interactions are actually that nucleic acid interact with some specified domains of proteins ,not just residues. Therefore, it is actually more important to predict DNAbinding domains than DNA-binding residues. Thus, in this paper, we put forward that predicting the DNA-binding domains would help us to further understand the mechanisms of how protein and nucleic acid interact. We developed a support vector machine based algorithm to predict the DNA-binding domains and got significantly high performance. At the atomic level, hydrogen bonds are the main interactions between amino acid residues and nucleotide bases in the protein–nucleic acid complex [13,14]. The 1.0
Fig. 1. ROC curves (sensitivity vs. 1 - specificity) for prediction of DNA-binding domains by SVMs with different features; the plotted curves are Features 1, Features 2, and the random-guess reference line
The number of isolated electron pairs of a domain is closely related to its potential for binding nucleic acids, and we believe this feature contributed substantially to the performance of our classifier. We also used a SVM-based algorithm to exploit the evolutionary information of amino acid sequences in the form of position-specific scoring matrices (PSSMs) for the prediction of DNA-binding domains. When the residues of a domain are conserved, this is likely to reflect a specific purpose, for example a biological function. In many studies, the prediction of nucleic acid binding residues was significantly improved by the use of PSSMs compared with predictions based only on the amino acid sequence and its environment [5]. Our results likewise suggest that PSSMs contributed markedly to the performance of our classifier in predicting DNA-binding domains in proteins. One main drawback of PSSM-based prediction is that it is time-consuming, because PSI-BLAST performs iterative alignments against large sequence databases.
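For illustration, the following sketch shows how such per-residue features and PSSM scores could be combined into SVM input vectors; the feature values, the AA_FEATURES table, and the encode_window helper are illustrative assumptions, not the paper's exact encoding or LIBSVM settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Illustrative per-residue features: (side-chain pKa, hydrophobicity,
# molecular mass, number of isolated electron pairs). Values are examples.
AA_FEATURES = {
    "D": (3.65, -3.5, 133.1, 4),
    "K": (10.53, -3.9, 146.2, 1),
    # ... one entry per amino acid type
}

def encode_window(residues, pssm_rows):
    """Concatenate normalized physicochemical features with PSSM rows
    for a window of residues (a hypothetical encoding)."""
    phys = np.array([AA_FEATURES[r] for r in residues], dtype=float)
    phys /= np.abs(phys).max(axis=0)  # crude per-feature normalization
    return np.concatenate([phys.ravel(), np.asarray(pssm_rows, float).ravel()])

# X = np.stack([encode_window(w, p) for w, p in windows])  # windows: assumed data
# print(cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())
```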
4 Conclusion

In this study, we have presented a SVM-based approach for the prediction of DNA-binding domains from amino acid sequence data and the evolutionary information of amino acid sequences. The overall accuracy and net prediction reached 74.28% and 74.07%, respectively, while the classifier's AUC reached 0.822 and its Pearson correlation coefficient 0.549 for the prediction of DNA-binding domains. We believe these prediction results provide useful clues about how proteins and nucleic acids interact. In future work we will further improve the prediction performance by adding more features to the input encoding and exploring the contribution ratio of each feature. In addition, the current classifiers use a fixed cutoff distance (3.5 Å) to discriminate binding domains from non-binding ones; different cutoff distances and window lengths will be used to construct a family of SVM models for predicting DNA-binding domains.

Acknowledgments. This work was supported by the National Natural Science Foundation (No. 60671018, No. 60121101).
References 1. Ptashne, M.: Regulation of transcription: from lambda to eukaryotes. Trends Biochem. Sci. 30, 275–279 (2005) 2. Ahmad, S., Gromiha, M.M., Sarai, A.: Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information. Bioinformatics 20, 477–486 (2004) 3. Jones, S., Shanahan, H.P., Berman, H.M., Thornton, J.M.: Using electrostatic potentials to predict DNA-binding sites on DNA-binding proteins. Nucleic Acids Res. 31, 7189–7198 (2003)
Support Vector Machine for Prediction of DNA-Binding Domains
187
4. Tsuchiya, Y., Kinoshita, K., Nakamura, H.: PreDs: a server for predicting dsDNA-binding site on protein molecular surfaces. Bioinformatics 21, 1721–1723 (2005) 5. Ahmad, S., Sarai, A.: PSSM-based prediction of DNA binding sites in proteins. BMC Bioinformatics 6, 33 (2005) 6. Wang, L., Brown, S.J.: BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences. Nucleic Acids Res. 34, 243–248 (2006) 7. Venkatarajan, M.S., Braun, W.: New quantitative descriptors of amino acids based on multidimensional scaling of a large number of physical-chemical properties. J. Mol. Modeling 7, 445–453 (2001) 8. Wang, Y., Xue, Z., Xu, J.: Better prediction of the location of β-turns in proteins with support vector machine. PROTEINS: Structure, Function, and Bioinformatics 65, 49–54 (2006) 9. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995) 10. Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines (Version 2.3) (2001), http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf 11. Egan, J.P.: Signal Detection Theory and ROC Analysis. Series in Cognition and Perception. Academic Press, New York (1975) 12. Sætrom, P., Snøve, O.J.: A comparison of siRNA efficacy predictors. Biochem. Biophys. Res. Commun. 321(1), 247–253 (2004) 13. Jones, S., Daley, D.T.A., Luscombe, N.M., Berman, H.M., Thornton, J.M.: Protein-RNA interactions: a structural analysis. Nucleic Acids Res. 29, 943–954 (2001) 14. Luscombe, N.M., Laskowski, R.A., Thornton, J.M.: Amino acid-base interactions: a three-dimensional analysis of protein-DNA interactions at an atomic level. Nucleic Acids Res. 29, 2860–2874 (2001)
Feature Extraction for Mass Spectrometry Data Yihui Liu School of Computer Science and Information Technology, Shandong Institute of Light Industry, Jinan, Shandong, China, 250353
[email protected]
Abstract. Mass spectrometry is being used to generate protein profiles from human serum, and proteomic data obtained from mass spectrometry have attracted great interest for the detection of early-stage cancer. However, high-dimensional mass spectrometry data pose considerable challenges. In this paper a set of wavelet detail coefficients at different levels is used to characterize the localized changes of mass spectrometry data and to reduce the dimensionality of the spectra. The experiments are performed on a high-resolution ovarian dataset, and a highly competitive accuracy is achieved compared with the best performance of other kinds of classification models.
1 Introduction

Mass spectrometry is being used to generate protein profiles from human serum, and proteomic data obtained from mass spectrometry have attracted great interest for the detection of early-stage cancer. Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS), in combination with advanced data mining algorithms, is used to detect protein patterns associated with diseases [1,2,3,4,5]. As a kind of MS-based protein chip technology, SELDI-TOF-MS has been successfully used to detect several disease-associated proteins in complex biological specimens such as serum [6,7,8]. The researchers in [9] employ principal component analysis (PCA) for dimensionality reduction and linear discriminant analysis (LDA) coupled with a nearest centroid classifier [10] for classification. In [11], the researchers compare two feature extraction algorithms together with several classification approaches on MALDI-TOF data; the T-statistic was used to rank features in terms of their relevance, and support vector machines (SVM), random forests, linear/quadratic discriminant analysis (LDA/QDA), k-nearest neighbors, and bagged/boosted decision trees were subsequently used to classify the data. More recently, in [12], both the GA approach and the nearest shrunken centroid approach were found inferior to the boosting-based feature selection approach. The researcher in [13] examines the performance of the nearest centroid classifier coupled with several feature selection algorithms: the Student t-test, Kolmogorov-Smirnov test and P-test are univariate statistics used for filter-based feature ranking, and sequential forward selection and a modified version of sequential backward selection are also tested. Embedded approaches included
shrunken nearest centroid and a novel version of boosting-based feature selection. In addition, several dimensionality reduction approaches are also tested. Yu et al. [14] develop a novel method for dimensionality reduction and test it on a published ovarian high-resolution SELDI-TOF dataset. They use a four-step strategy for data preprocessing based on: (1) binning, (2) the Kolmogorov–Smirnov test, (3) restriction of the coefficient of variation and (4) wavelet analysis, and they use approximation coefficients. They indicated that "For the high-resolution ovarian data, the vector of detail coefficients contains almost no information for the healthy, since SVMs identify all the data as cancers", concluding that detail coefficients do not work on high-resolution mass spectrometry data under their four-step strategy, and using approximation coefficients instead. However, it is the wavelet detail coefficients that characterize the localized changes of mass spectrometry data; approximation coefficients only compress the spectra. In our research we perform multi-level wavelet analysis on high-dimensional mass spectrometry data. A vector of detail coefficients in the wavelet subspace is extracted to characterize the localized changes of the spectra and to reduce dimensionality. Finally, the wavelet features are input into a SVM classifier to distinguish the diagnostic classes.
2 Wavelet Analysis

In one-dimensional wavelet analysis, a signal can be represented as a sum of wavelets at different time shifts and scales (frequencies) using the discrete wavelet transform (DWT). The DWT is capable of extracting features of transient signals by separating signal components in both time and frequency. According to the DWT, a time-varying function (signal) $f(t) \in L^2(R)$ can be expressed in terms of $\phi(t)$ and $\psi(t)$ as follows:
$$f(t) = \sum_{k} c_0(k)\,\phi(t-k) + \sum_{k}\sum_{j=1} d_j(k)\,2^{-j/2}\,\psi(2^{-j}t-k) = \sum_{k} c_{j_0}(k)\,2^{-j_0/2}\,\phi(2^{-j_0}t-k) + \sum_{k}\sum_{j=j_0} d_j(k)\,2^{-j/2}\,\psi(2^{-j}t-k)$$
where $\phi(t)$, $\psi(t)$, $c_0$, and $d_j$ represent the scaling function, the wavelet function, the scaling coefficients at scale 0, and the wavelet detail coefficients at scale $j$, respectively. The variable $k$ is the translation parameter that localizes the signal in time, the scales denote the different (high to low) frequency bands, and $j_0$ is the selected scale number. The wavelet filter-bank approach was developed by Mallat [15]. Wavelet analysis involves two components: approximations and details. In one-dimensional wavelet decomposition, starting from the signal, the first step produces two sets of coefficients: approximation coefficients (scaling coefficients) $c_1$ and detail coefficients (wavelet coefficients) $d_1$. These coefficients are computed by convolving the signal with a low-pass filter for the approximation and with a high-pass filter for the detail. The
convolved coefficients are downsampled by keeping the even-indexed elements. The approximation coefficients $c_1$ are then split into two parts by the same algorithm and replaced by $c_2$ and $d_2$, and so on; this decomposition is repeated until the required level is reached:

$$c_{j+1}(k) = \sum_{m} h(m-2k)\,c_j(m)$$

$$d_{j+1}(k) = \sum_{m} h_1(m-2k)\,c_j(m)$$
where $h(m-2k)$ and $h_1(m-2k)$ are the low-pass and high-pass filters, and $c_j$ and $d_j$ are the approximation and detail coefficients at decomposition level $j$. Fig. 1 shows the wavelet decomposition tree at 5 levels. The mass spectrometry data can be viewed as composed of five separate exponentials at different times, with the transient changes occurring in their derivatives. The presence of noise, a fairly common situation in mass spectrometry data processing, makes the identification of transient or localized changes more complicated. While the first levels of the decomposition can be used to eliminate a large part of the noise, the localized features become more prominent at the deeper levels of the decomposition. In our study, the detail coefficients at the 3rd, 4th and 5th levels are employed to characterize the localized changes of the mass spectra.
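A minimal sketch of this multilevel extraction, assuming the PyWavelets library is available; the paper's own implementation is not specified.

```python
import numpy as np
import pywt

def detail_features(spectrum, wavelet="db7", level=5, keep=(3, 4, 5)):
    """Decompose a 1-D mass spectrum and return the detail coefficients
    at the requested levels (deeper levels carry the localized changes)."""
    # wavedec returns [cA5, cD5, cD4, cD3, cD2, cD1] for level=5
    coeffs = pywt.wavedec(spectrum, wavelet, mode="symmetric", level=level)
    details = {level - i + 1: c for i, c in enumerate(coeffs[1:], start=1)}
    # details maps level number -> detail coefficient array
    return np.concatenate([details[j] for j in sorted(keep)])

# features = detail_features(ms_signal)   # ms_signal: 1-D numpy array
```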
3 Support Vector Machine (SVM)

SVMs originate from the idea of structural risk minimization developed by Vapnik [16]. SVMs are an effective algorithm for finding the maximal-margin hyperplane separating two classes of patterns. Most methods for training a classifier (e.g., Bayesian, neural networks, and RBF) are based on minimizing the training error, i.e., the empirical risk, whereas SVMs aim to minimize an upper bound on the expected generalization error, which is called structural risk minimization. A SVM can be thought of as a linear algorithm in a high-dimensional feature space (dot-product space) corresponding nonlinearly to the input space. A separating hyperplane with maximum margin is constructed in the high-dimensional feature space, and a nonlinear decision boundary in input space corresponds to this hyperplane. Using a kernel function, the separating hyperplane can be computed directly in input space [17]. The maximal-margin hyperplane, defined only by the support vectors, may give the best separation between the classes; the support vectors can be regarded as selected representatives of the training wavelet features and are most critical for separating the two classes. In this study the radial basis function (RBF) kernel

$$K(x_i, x_j) = e^{-\|x_i - x_j\|^2 / r_1}$$

is used, where $r_1$ is a strictly positive constant whose value we set to 1. The linear kernel is less complex than the polynomial and RBF kernels, but the RBF kernel usually has a better boundary response as it allows for extrapolation, and most high-dimensional data sets can be approximated by Gaussian-like distributions similar to those used by RBF networks [18].
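In scikit-learn's parameterization $\exp(-\gamma\|x_i - x_j\|^2)$, this choice corresponds to $\gamma = 1/r_1 = 1.0$; the snippet below is a sketch, not the authors' original LIBSVM setup.

```python
from sklearn.svm import SVC

clf = SVC(kernel="rbf", gamma=1.0)   # K(xi, xj) = exp(-1.0 * ||xi - xj||^2)
# clf.fit(X_train, y_train); y_pred = clf.predict(X_test)
```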
Fig. 1. Wavelet decomposition tree of mass spectrometry data and the detail at the 1st level. The symbol s represents the mass spectrometry data; $a_i$ and $d_i$ represent the approximation and detail coefficients at the different levels, respectively, where $i = 1, 2, 3, 4, 5$. The detail at the 1st level contains the "small changes" or "noise" hidden in the mass spectra.
4 Experiments and Results

In this study we use classification accuracy, sensitivity, specificity, Positive Predictive Value (PPV), Negative Predictive Value (NPV) and Balanced Correct Rate (BACC) to evaluate the performance. Sensitivity is defined as $TP/(TP+FN)$; specificity as $TN/(TN+FP)$; Positive Predictive Value (PPV) as $TP/(TP+FP)$; Negative Predictive Value (NPV) as $TN/(TN+FN)$; and accuracy as $(TP+TN)/(TP+TN+FP+FN)$, where $TP$, $TN$, $FP$ and $FN$ stand for the numbers of true positive (cancer), true negative (control), false positive and false negative samples.
The Balanced Correct Rate (BACC) is defined as $\frac{1}{2}\left(\frac{TP}{TP+FN} + \frac{TN}{TN+FP}\right)$, i.e., the average of sensitivity and specificity.

Experiments are performed on the raw ovarian high-resolution SELDI-TOF dataset, which comprises 95 control samples and 121 cancer samples; the dimensionality of the original feature space is 368750. The data are provided by the National Cancer Institute (http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp).

• Preprocessing mass spectrometry data

Resampling mass spectrometry data homogenizes the mass/charge (M/Z) vector so that different spectra can be compared under the same reference and at the same resolution. High-resolution spectrometry data contain redundant information; by resampling, the signal can be decimated to a more manageable M/Z vector while preserving the information content of the spectra. Resampling selects a new M/Z vector and also applies an antialias filter that prevents high-frequency noise from folding into lower frequencies [19]. We resample the mass spectrometry data to 15000 M/Z points between 710 and 11900. Because the spectra are high-dimensional and much of the information corresponds to M/Z positions that show no key changes during the experiment, we then filter the data by variance, keeping the significant M/Z positions; the dimensionality reduces to 6000 after filtering (a 60% reduction).
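The following helper restates these evaluation measures directly (a straightforward transcription of the definitions above, not code from the paper):

```python
def metrics(tp, tn, fp, fn):
    """Compute the evaluation measures from the four confusion counts."""
    sens = tp / (tp + fn)                    # sensitivity (cancer recall)
    spec = tn / (tn + fp)                    # specificity
    ppv = tp / (tp + fp)                     # positive predictive value
    npv = tn / (tn + fn)                     # negative predictive value
    acc = (tp + tn) / (tp + tn + fp + fn)    # accuracy
    bacc = 0.5 * (sens + spec)               # balanced correct rate
    return dict(sensitivity=sens, specificity=spec, PPV=ppv,
                NPV=npv, accuracy=acc, BACC=bacc)
```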
Fig. 2. Wavelet features of high resolution ovarian mass spectrometry data. This Figure shows wavelet detail coefficients at 3rd level, 4th level and 5th level.
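A rough sketch of the preprocessing stage just described; scipy's FFT-based resample is assumed as the resampling/anti-aliasing step, and the variance-based keep fraction is an illustrative stand-in for the paper's filtering criterion.

```python
import numpy as np
from scipy.signal import resample

def preprocess(intensities, n_points=15000, keep_fraction=0.4):
    """Resample spectra to a common M/Z grid, then keep only the
    highest-variance M/Z positions across the dataset."""
    # intensities: (n_spectra, n_raw_points) array on a shared raw M/Z axis
    resampled = resample(intensities, n_points, axis=1)  # band-limited, anti-aliased
    var = resampled.var(axis=0)
    keep = var >= np.quantile(var, 1.0 - keep_fraction)  # top 40% -> ~6000 points
    return resampled[:, keep]
```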
• The performance of the SVM classifier

We use the Daubechies wavelet of order 7 (db7) for the wavelet decomposition of the mass spectrometry data, with symmetric padding of the boundary values. A multilevel discrete wavelet transform (DWT) is performed on the spectra. Fig. 2 shows the detail coefficients at the 3rd, 4th and 5th levels; the dimensionality of the wavelet features is shown in Table 1. The red color represents cancer tissue and the blue represents control tissue. The purpose of the detail coefficients is to detect transient features in the derivatives of a mass spectrometry signal based on the multilevel wavelet decomposition. Detail coefficients determine the position of a change, its type (a transient change in which derivative) and its amplitude, using the compactness and finite-energy characteristics of the wavelet function.

In our study we use K-fold cross-validation experiments. K-fold cross-validation randomly generates indices, containing equal (or approximately equal) proportions of the integers 1 through K, that define a partition of the N observations into K disjoint subsets. In K-fold cross-validation, K-1 folds are used for training and the remaining fold for evaluation; this process is repeated K times, leaving out a different fold each time (a sketch of this protocol is given after Table 1). We evaluate the classification performance based on the detail coefficients at the 3rd, 4th and 5th decomposition levels respectively. For the wavelet features of each level, we run K-fold cross-validation for K = 2, 3, ..., 10, repeating each K-fold experiment 20 times. Tables 2, 3 and 4 show the performance of the detail coefficients at the 3rd, 4th and 5th levels respectively.

Yu et al. [14] indicated that "For the high-resolution ovarian data, the vector of detail coefficients contains almost no information for the healthy, since SVMs identify all the data as cancers". They use a four-step strategy for data preprocessing based on: (1) binning, (2) the Kolmogorov–Smirnov test, (3) restriction of the coefficient of variation and (4) wavelet approximation coefficients. Their experimental results are shown in Table 5: they achieved 95.34% BACC for the 2-fold cross-validation experiment and 96.12% BACC for the 10-fold experiment. Our experimental results show that wavelet detail coefficients are an efficient way to characterize the features contained in mass spectrometry data. For the 2-fold cross-validation experiment we achieve 97.78% BACC using 761 wavelet detail coefficients at the 3rd decomposition level, and 99.02% BACC for the 10-fold experiment. Our detail coefficients at the 3rd, 4th and 5th levels all outperform their four-step strategy, as well as the other methods shown in Table 6. Table 1 shows the dimensionality of the detail coefficients at the 3rd, 4th and 5th levels of the wavelet decomposition.

Table 1. Feature number of high-resolution ovarian mass spectra
                Original   Resampled   Filtered   3rd level   4th level   5th level
Feature number  368750     15000       6000       761         387         200
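As referenced above, a sketch of the repeated K-fold protocol follows; scikit-learn is used for illustration, since the paper's own partitioning code is not given.

```python
from sklearn.model_selection import KFold
from sklearn.svm import SVC

def evaluate(X, y, k, repeats=20):
    """Average accuracy over `repeats` runs of k-fold cross-validation."""
    accs = []
    for rep in range(repeats):
        folds = KFold(n_splits=k, shuffle=True, random_state=rep)
        for train, test in folds.split(X):
            clf = SVC(kernel="rbf", gamma=1.0).fit(X[train], y[train])
            accs.append(clf.score(X[test], y[test]))
    return sum(accs) / len(accs)
```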
Table 2. Performance of 761 detail coefficients at 3rd level

K fold   Accuracy   Sensitivity   Specificity   PPV      NPV      BACC
2        0.9781     0.9805        0.9751        0.9805   0.9751   0.9778
3        0.9861     0.9894        0.9820        0.9859   0.9864   0.9857
4        0.9877     0.9945        0.9789        0.9837   0.9929   0.9867
5        0.9898     0.9950        0.9832        0.9869   0.9936   0.9891
6        0.9884     0.9917        0.9842        0.9877   0.9894   0.9880
7        0.9861     0.9890        0.9825        0.9863   0.9859   0.9857
8        0.9830     0.9862        0.9789        0.9835   0.9824   0.9826
9        0.9846     0.9890        0.9789        0.9836   0.9859   0.9840
10       0.9907     0.9945        0.9860        0.9890   0.9929   0.9902
This table shows the performance on the high-resolution ovarian dataset. PPV stands for Positive Predictive Value; NPV stands for Negative Predictive Value; BACC stands for Balanced Correct Rate.

Table 3. Performance of 387 detail coefficients at 4th level

K fold   Accuracy   Sensitivity   Specificity   PPV      NPV      BACC
2        0.9668     0.9737        0.9579        0.9672   0.9662   0.9658
3        0.9702     0.9717        0.9684        0.9751   0.9641   0.9700
4        0.9676     0.9738        0.9596        0.9685   0.9664   0.9667
5        0.9741     0.9752        0.9726        0.9784   0.9686   0.9739
6        0.9745     0.9731        0.9763        0.9812   0.9661   0.9747
7        0.9722     0.9780        0.9649        0.9726   0.9717   0.9714
8        0.9769     0.9752        0.9789        0.9833   0.9688   0.9771
9        0.9738     0.9752        0.9719        0.9779   0.9685   0.9736
10       0.9738     0.9725        0.9754        0.9806   0.9653   0.9739
This table shows the performance on the high-resolution ovarian dataset. PPV stands for Positive Predictive Value; NPV stands for Negative Predictive Value; BACC stands for Balanced Correct Rate.

Table 4. Performance of 200 detail coefficients at 5th level

K fold   Accuracy   Sensitivity   Specificity   PPV      NPV      BACC
2        0.9621     0.9715        0.9502        0.9613   0.9631   0.9608
3        0.9656     0.9681        0.9624        0.9704   0.9595   0.9653
4        0.9769     0.9835        0.9684        0.9754   0.9787   0.9759
5        0.9731     0.9802        0.9642        0.9721   0.9745   0.9722
6        0.9803     0.9855        0.9737        0.9795   0.9814   0.9796
7        0.9846     0.9890        0.9789        0.9836   0.9859   0.9840
8        0.9799     0.9862        0.9719        0.9781   0.9823   0.9791
9        0.9753     0.9835        0.9649        0.9728   0.9786   0.9742
10       0.9753     0.9807        0.9684        0.9753   0.9753   0.9746
This table shows the performance on the high-resolution ovarian dataset. PPV stands for Positive Predictive Value; NPV stands for Negative Predictive Value; BACC stands for Balanced Correct Rate.

Table 5. Performance of a four-step strategy [14]

K fold   Control   SD       Cancer   SD       BACC
2        0.9330    0.0174   0.9738   0.0125   0.9534
3        0.9393    0.0188   0.9783   0.0115   0.9588
4        0.9409    0.0200   0.9786   0.0118   0.9597
5        0.9425    0.0203   0.9794   0.0119   0.9609
6        0.9411    0.0223   0.9806   0.0118   0.9608
7        0.9412    0.0210   0.9805   0.0118   0.9608
8        0.9414    0.0222   0.9815   0.0117   0.9615
9        0.9423    0.0231   0.9817   0.0117   0.9620
10       0.9406    0.0226   0.9819   0.0113   0.9612
This table shows the performance on the high-resolution ovarian dataset. SD stands for standard deviation; BACC stands for Balanced Correct Rate.

Table 6. Performance of different methods [14] on the high-resolution ovarian dataset

         2-fold cross validation         10-fold cross validation
Methods  Control (Mean)  Cancer (Mean)   Control (Mean)  Cancer (Mean)
VP       0.9393          0.9583          0.9482          0.9691
QDA      0.9202          0.9429          0.9255          0.9647
LDA      0.9179          0.9467          0.9255          0.9522
MDA      0.9392          0.9154          0.9591          0.9267
NB       0.8803          0.9190          0.8979          0.9249
Bagging  0.8835          0.9174          0.8977          0.9232
1-NN     0.8575          0.8889          0.8902          0.9018
2-NN     0.7260          0.9641          0.8063          0.9745
ADtree   0.8238          0.8878          0.8498          0.9025
J48tree  0.7818          0.8507          0.8245          0.8825
5 Conclusions

For a transformation, a new basis is normally chosen for the data, and the choice of basis determines the properties of the transformed data. Principal component analysis (PCA) is used to extract the main components from mass spectra [9], and linear discriminant analysis (LDA) is used to extract discriminant information [10]. In these transformed feature spaces, however, localized or transient features of mass spectrometry data cannot be detected. Detail coefficients can reveal the transient or localized changes of mass spectrometry data, using the compactness and finite-energy characteristics of the wavelet functions, and the experimental results show that detail coefficients achieve good performance.
Acknowledgements. This study is supported by research funds of Shandong Institute of Light Industry (12041653).
References 1. Petricoin, E.F., Ardekani, A.M., Hitt, B.A., Levine, P.J., Fusaro, V.A., Steinberg, S.M., Mills, G.B., Simone, C., Fishman, D.A., Kohn, E.C., Liotta, L.A.: Use of proteomic patterns in serum to identify ovarian cancer. The Lancet 359, 572–577 (2002) 2. Sorace, J.M., Zhan, M.: A data review and re-assessment of ovarian cancer serum proteomic profiling. BMC Bioinform. 4 (2003) 3. Michener, C.M., Ardekani, A.M., Petricoin, E.F., Liotta, L.A., Kohn, E.C.: Genomics and proteomics: application of novel technology to early detection and prevention of cancer. Cancer Detect Prev. 26, 249–255 (2002) 4. Petricoin, E.F., Zoon, K.C., Kohn, E.C., Barrett, J.C., Liotta, L.A.: Clinical proteomics: translating benchside promise into bedside reality. Nat. Rev. Drug. Discov. 1, 683–695 (2002) 5. Srinivas, P.R., Verma, M., Zhao, Y., Srivastava, S.: Proteomics for cancer biomarker discovery. Clin. Chem. 48, 1160–1169 (2002) 6. Herrmann, P.C., Liotta, L.A., Petricoin, E.F.: Cancer proteomics: the state of the art. Dis. Markers 17, 49–57 (2001) 7. Cazares Jr., G.W., Leung, L.H., Nasim, S.M., Adam, S., Yip, B.L., Schellhammer, T.T., Gong, P.F., Vlahou, L.: Proteinchip surface-enhanced laser desorption/ionization (SELDI) mass spectrometry: a novel protein biochip technology for detection of prostate cancer biomarkers in complex protein mixtures. Prostate Cancer Prostatic Dis. 2, 264–276 (1999) 8. Vlahou, A., Schellhammer, P.F., Mendrinos, S., Patel, K., Kondylis, F.I., Gong, L., Nasim, S., Wright Jr.: Development of a novel proteomic approach for the detection of transitional cell carcinoma of the bladder in urine. Am. J. Pathol. 158, 1491–1520 (2001) 9. Lilien, R.H., Farid, H., Donald, B.R.: Probabilistic disease classification of expression-dependent proteomic data from mass spectrometry of human serum. Computational Biology 10 (2003) 10. Park, H., Jeon, M., Rosen, J.B.: Lower dimensional representation of text data based on centroids and least squares. BIT 43, 1–22 (2003) 11. Wu, B., Abbott, T., Fishman, D., McMurray, W., Mor, G., Stone, K., Ward, D., Williams, K., Zhao, H.: Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data. Bioinformatics 19 (2003) 12. Jeffries, N.O.: Performance of a genetic algorithm for mass spectrometry proteomics. BMC Bioinformatics 5 (2004) 13. Levner, I.: Feature selection and nearest centroid classification for protein mass spectrometry. BMC Bioinformatics 6 (2005) 14. Yu, J.S., Ongarello, S., Fiedler, R., Chen, X.W., Toffolo, G., Cobelli, C., Trajanoski, Z.: Ovarian cancer identification based on dimensionality reduction for high-throughput mass spectrometry data. Bioinformatics 21, 2200–2209 (2005) 15. Mallat, S.: A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11, 674–693 (1989) 16. Vapnik, V.N.: Statistical learning theory. Wiley, New York (1992) 17. Hearst, M.A., Dumais, S.T., Osman, E., Platt, J., Scholkopf, B.: Support vector machines. IEEE Intelligent Systems 13, 18–28 (1998) 18. Burges, C.: A Tutorial on Support Vector Machines for Pattern Recognition. Kluwer Academic Publishers (1998) 19. IEEE: Programs for Digital Signal Processing. ch. 8. IEEE Press, John Wiley & Sons, New York (1979)
An Improved Algorithm on Detecting Transcription and Translation Motif in Archaeal Genomic Sequences

Minghui Wu1,2, Xian Chen3, Fanwei Zhu1, and Jing Ying1,2

1 Computer College, Zhejiang University, Hangzhou, P.R. China
2 Dept. of Computer, Zhejiang University City College, Hangzhou, P.R. China
[email protected]
3 Dept. of Computing Science, University of Alberta, Edmonton, Alberta, Canada
[email protected]
Abstract. Identifying binding sites and promoters in genomes has remained one of the most active research topics in computational biology over the past ten years. In the upstream region of the start codon there exist transcription and translation motifs whose distances and patterns vary among different genomic sequences. However, the existing computational approaches for detecting them are mostly general-purpose. For archaea, binding motifs are almost undiscovered, as they are more hidden than those in bacteria or eukaryotes. In this report, an improved algorithm based on PWMs (position-weight matrices) and the Gibbs Sampler is proposed for finding any number of patterns in a given range. Experiments using this algorithm were performed to detect potential binding motifs in archaeal genomic sequences, and the results were analyzed across different settings and species. Comparison with biological experimental results shows that the improved algorithm can find more significant patterns for archaea than the traditional approaches.
1 Problem Description

Identifying binding sites and promoters in genomes has remained one of the most active research topics in computational biology over the past ten years. Usually there exist binding sites (motifs) in DNA sequences that tell RNA polymerase where to start transcription, and sometimes there are also translation motifs. These motifs occur separately near the start codons of genes and can be recognized by their patterns. However, the distances between these short sequences and the start codon are not the same, and their mutation rate is sometimes not very low. Many algorithms have addressed this problem from different viewpoints, ranging from pattern recognition to neural networks:

1. CONSENSUS [5]: uses a greedy algorithm to iteratively build up motifs by adding more and more pattern instances.
2. Gibbs Sampler [6]: starts from a random initial solution and uses Gibbs sampling to make a series of local moves toward the best solution.
3. MEME [1]: uses the expectation maximization (E-M) algorithm.
4. ANN-Spec [9]: uses a neural network.
5. Co-Bind [4]: uses a Gibbs sampling strategy, but discovers DNA target sites for cooperatively acting transcription factors.

However, most of these methods are general-purpose and might not work well in all cases. For example, few methods have been proposed for archaeal genomes, where binding motifs are largely undiscovered at the biological experimental level because they are usually more hidden than in bacteria or eukaryotes. Specific methods and algorithms are needed to detect motifs in such cases. This project proposes methods for finding motifs in archaeal genomes and analyzes the feasibility and significance of detecting archaeal motifs by computational methods.
2 Materials and Methods

2.1 Genomes

All genomic sequences of Archaea were downloaded from GenBank, NCBI (http://www.ncbi.nlm.nih.gov). Up to December 2004, a total of 19 whole genomic sequences had been submitted to GenBank (see Table 1). They can be divided into three types: Crenarchaeota, Euryarchaeota and Nanoarchaeota.

Table 1. Archaeal genomic sequences tested and analyzed

Genome name                             Gene number   Order of species   GenBank Access No.
Aeropyrum pernix K1                     1893          Crenarchaeota      NC 000854
Pyrobaculum aerophilum                  2706          Crenarchaeota      NC 003364
Sulfolobus solfataricus                 3030          Crenarchaeota      NC 002754
Sulfolobus tokodaii                     2874          Crenarchaeota      NC 003106
Archaeoglobus fulgidus                  2486          Euryarchaeota      NC 000917
Halobacterium sp. NRC-1                 2127          Euryarchaeota      NC 002607
Methanobacterium thermoautotrophicus    1920          Euryarchaeota      NC 000916
Methanocaldococcus jannaschii           1773          Euryarchaeota      NC 000909
Methanococcus maripaludis S2            1772          Euryarchaeota      NC 005791
Methanopyrus kandleri AV19              1729          Euryarchaeota      NC 003551
Methanosarcina acetivorans strain C2A   4662          Euryarchaeota      NC 003552
Methanosarcina mazei Goe1               3438          Euryarchaeota      NC 003901
Picrophilus torridus DSM 9790           1535          Euryarchaeota      NC 005877
Thermoplasma acidophilum                1529          Euryarchaeota      NC 002578
Thermoplasma volcanium                  1548          Euryarchaeota      NC 002689
Pyrococcus abyssi                       1895          Euryarchaeota      NC 000868
Pyrococcus furiosus                     2125          Euryarchaeota      NC 003413
Pyrococcus horikoshii                   2004          Euryarchaeota      NC 000961
Nanoarchaeum equitans Kin4-M            552           Nanoarchaeota      NC 005213
2.2 Position-Weight Matrix and Information Content

The PWM (position-weight matrix) has been widely used in sequence pattern matching. To define the PWM, consider a given set of sequences S = {S1, S2, S3, ..., Sn} of the same length l. The alignment matrix records the number of appearances of each residue at each position (for a DNA sequence, one of A, C, G and T). As an example, for DNA sequences of length 10, the alignment matrix could look like Table 2:

Table 2. Alignment matrix

Pos   1   2   3   4   5   6   7   8   9   10
A     1   3   2   0   8   0   0   0   0   0
C     2   2   3   8   0   8   0   0   0   2
G     1   2   3   0   0   0   8   0   5   4
T     4   1   0   0   0   0   0   8   3   2
From this example we can easily read off the consensus of the set of sequences, which is TAC(G)CACGTGG. However, the alignment matrix alone does not describe the quality of the alignment, since the probability of appearance of each nucleotide varies with the sequence content. We therefore introduce probabilities to calculate the weight of each nucleotide at each position. The PWM entry for base i at position j is defined as follows:
$$W_{i,j} = \ln\left(\frac{f_{i,j}}{p_i}\right) \qquad (1)$$
where $p_i$ is the expected probability of base i in the background sequence, and $f_{i,j}$ is the corrected frequency of base i at position j, given by:
$$f_{i,j} = \frac{n_{i,j} + p_i}{\sum_{i=1}^{A} n_{i,j} + 1} \qquad (2)$$
where $n_{i,j}$ is the corresponding value in the alignment matrix. The PWM corresponding to Table 2 is given in Table 3:

Table 3. Position-weight matrix

Pos   p_i    1      2      3     4     5     6     7     8     9     10
A     0.32   -0.79  0.13   0.23  -2.2  1.05  -2.2  -2.2  -2.2  -2.2  -2.2
C     0.18   0.32   0.32   0.7   1.65  -2.2  1.65  -2.2  -2.2  -2.2  0.32
G     0.18   0.29   0.32   0.7   -2.2  -2.2  -2.2  1.65  -2.2  1.19  0.97
T     0.32   0.39   -0.79  -2.2  -2.2  -2.2  -2.2  -2.2  1.05  0.13  -0.23
Finally, the information content is calculated for each nucleotide at each position; it is defined as follows:
$$I_{i,j} = f_{i,j} \cdot W_{i,j} \qquad (3)$$
The sum of the information contents can then be used to represent the alignment quality:

$$I_{\mathrm{matrix}} = \sum_{j=1}^{l} \sum_{i=1}^{A} I_{i,j} \qquad (4)$$
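The following sketch restates Eqs. (1)-(4) directly (a transcription of the formulas, not the authors' code):

```python
import numpy as np

def pwm_and_information(counts, background):
    """counts: 4 x l alignment matrix (rows A, C, G, T);
    background: length-4 expected base probabilities p_i."""
    counts = np.asarray(counts, dtype=float)
    p = np.asarray(background, dtype=float).reshape(4, 1)
    f = (counts + p) / (counts.sum(axis=0) + 1.0)   # Eq. (2)
    W = np.log(f / p)                               # Eq. (1)
    I = f * W                                       # Eq. (3)
    return W, I, I.sum()                            # Eq. (4)

# Example with the alignment matrix of Table 2:
# counts = [[1,3,2,0,8,0,0,0,0,0], [2,2,3,8,0,8,0,0,0,2],
#           [1,2,3,0,0,0,8,0,5,4], [4,1,0,0,0,0,0,8,3,2]]
# W, I, total = pwm_and_information(counts, [0.32, 0.18, 0.18, 0.32])
```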
The information contents corresponding to Table 2 are shown in Table 4. In this example the sum of the information contents is 7.03, which represents the quality of the sequence alignment.

Table 4. Information content

Pos   p_i    1      2      3      4      5      6      7      8      9      10
A     0.32   -0.12  0.05   0.06   -0.08  0.97   -0.08  -0.08  -0.08  -0.08  -0.08
C     0.18   0.08   0.08   0.25   1.5    -0.04  1.5    -0.04  -0.04  -0.04  0.08
G     0.18   0.04   0.08   0.25   -0.04  -0.04  -0.04  1.5    -0.04  0.68   0.45
T     0.32   0.19   -0.12  -0.08  -0.08  -0.08  -0.08  -0.08  0.97   0.05   -0.06
Sum          0.19   0.09   0.48   1.29   0.8    1.29   1.29   0.8    0.61   0.39
2.3 Definition of the Entropy Profile

The weakness of the information content is that its upper bound increases as $p_i$ decreases. We therefore define the entropy profile, which has an upper bound of 2 at each position:

$$S = S_{\max} - S_{\mathrm{obs}} = 2 - \left(-\sum_{i=1}^{A} b_i \log_2 b_i\right) \qquad (5)$$
where $b_i$ is the probability of appearance of base i at the position:

$$b_i = \frac{n_i}{\sum_{j=1}^{A} n_j} \qquad (6)$$
where $n_i$ is the number of appearances of base i. The total entropy profile for a motif is given by:

$$S_{\mathrm{tot}} = \sum_{j=1}^{l} S_j \qquad (7)$$

where l is the length of the sequences.
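Eqs. (5)-(7) translate directly into a few lines (again a sketch of the formulas, not the authors' code):

```python
import numpy as np

def entropy_profile(counts):
    """counts: 4 x l count matrix for an aligned motif (rows A, C, G, T)."""
    counts = np.asarray(counts, dtype=float)
    b = counts / counts.sum(axis=0)                       # Eq. (6)
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -np.where(b > 0, b * np.log2(b), 0.0).sum(axis=0)
    s = 2.0 - h                                           # Eq. (5), per column
    return s.sum()                                        # Eq. (7)
```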
2.4 Algorithm

The traditional Gibbs Sampler algorithm is a special case of the Markov chain method and has shown good performance in some experimental cases. In practice, however, several problems exist:

1. The detected motifs in the sequences might overlap.
2. It is hard to detect relatively weak patterns in a large range.
3. Some detected patterns might not be motifs, because of random mutation.

To solve these problems, we improve the algorithm in two ways. First, we allow detecting multiple motifs in one iteration. Given that instances of the same motif pattern should not be too far from each other [8], and that different motifs appear in the same order in the upstream regions, we may divide a region into a number of sub-regions. For example, we divide a region (say -60 to -10) into 2 parts (2 equal parts by default): each time, we detect one pattern from -60 to -35 and another from -35 to -10, and each sub-region may extend slightly into its neighbor as long as no overlapping of patterns occurs. Second, we use the whole genomic sequence as the background sequence for a given genome and count the probability of each base as the value of $p_i$. Since we use the information content as the alignment score, this adjustment is expected to achieve better results.

For implementation, we use an array to store the motif offsets for each sequence. Initially, the offsets are set randomly in all sub-regions. In each iteration we pick one sequence S and, for each sub-region, calculate the PWM and information content for all possible subsequences in the sub-region; we find the pattern that maximizes the information content, update the motif offset for S, and replace the PWM and information content with the new values (a sketch of this loop is given below). Since the length of the whole region is fixed, we can even find more than one pattern simultaneously in a given region. The detailed results are discussed in the next two sections.
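A high-level sketch of this sampling loop follows; score_alignment and overlaps_other_motifs are hypothetical helpers standing in for the PWM/information-content scoring (Eqs. 1-4) and the non-overlap constraint described above.

```python
import random

def sample_motifs(upstreams, sub_regions, score_alignment,
                  overlaps_other_motifs, width=6, iters=50000):
    """upstreams: upstream windows (e.g. the [-60,-10] regions);
    sub_regions: (start, end) index ranges, one motif per range."""
    offsets = [[random.randrange(s, e - width + 1) for s, e in sub_regions]
               for _ in upstreams]                      # random initial offsets
    for _ in range(iters):
        i = random.randrange(len(upstreams))            # pick one sequence
        for r, (s, e) in enumerate(sub_regions):
            best, best_score = offsets[i][r], float("-inf")
            for off in range(s, e - width + 1):         # scan the sub-region
                if overlaps_other_motifs(offsets[i], r, off, width):
                    continue                            # keep motifs disjoint
                score = score_alignment(upstreams, offsets, i, r, off, width)
                if score > best_score:
                    best, best_score = off, score
            offsets[i][r] = best                        # update this motif's offset
    return offsets
```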
3 Experiments and Results

3.1 Testing on Genomic Sequences

In the experiments, all the archaeal genomic sequences were classified into 12 groups according to their species (see Table 5). Potential motif patterns were detected using the new algorithm with detected motif numbers of 1, 2 and 3. The detected region was set to [-60, -10] upstream of the start codon and the pattern length was set to 6. The running time never exceeded 30 seconds for any sequence (iterations range from 50000 to 120000). This report focuses on the data with motif number 3.
Table 5. Groups of archaeal species for testing

Group No.   Genome name                             Species              GenBank Access No.
I           Aeropyrum pernix K1                     Aeropyrum            NC 000854
II          Pyrobaculum aerophilum                  Pyrobaculum          NC 003364
III         Sulfolobus solfataricus                 Sulfolobus           NC 002754
III         Sulfolobus tokodaii                     Sulfolobus           NC 003106
IV          Archaeoglobus fulgidus                  Archaeoglobus        NC 000917
V           Halobacterium sp. NRC-1                 Halobacterium        NC 002607
VI          Methanobacterium thermoautotrophicus    Methanobacterium     NC 000916
VII         Methanocaldococcus jannaschii           Methanocaldococcus   NC 000909
VII         Methanococcus maripaludis S2            Methanococcus        NC 005791
VIII        Methanopyrus kandleri AV19              Methanopyrus         NC 003551
IX          Methanosarcina acetivorans strain C2A   Methanosarcina       NC 003552
IX          Methanosarcina mazei Goe1               Methanosarcina       NC 003901
X           Picrophilus torridus DSM 9790           Picrophilus          NC 005877
X           Thermoplasma acidophilum                Thermoplasma         NC 002578
X           Thermoplasma volcanium                  Thermoplasma         NC 002689
XI          Pyrococcus abyssi                       Pyrococcus           NC 000868
XI          Pyrococcus furiosus                     Pyrococcus           NC 003413
XI          Pyrococcus horikoshii                   Pyrococcus           NC 000961
XII         Nanoarchaeum equitans Kin4-M            Nanoarchaeum         NC 005213
3.2 Results

For each genomic sequence, the tests were run 10 times for each detected motif number. The patterns with the highest entropy values were recorded in Tables 6 and 7 (results with entropy below 4 are not shown).
4 Discussion

4.1 Significance of Patterns

Since the entropy profile has an upper bound of 12 for a 6-base pattern, the higher the entropy value, the more significant the pattern. In practice, however, the entropy profile of most patterns never exceeds 8, and the entropy of a pattern is highly related to the total number of genes in a genomic sequence. We therefore call a pattern significant if its entropy profile is higher than 5; in some cases, stable patterns with entropy > 4 are also considered potential motifs.
Table 6. Results of Testing with Motif Number = 1 and 2 Group No. I
Access No. NC 000854
II
NC 003364
III
NC 002754
NC 003106
IV
NC 000917
V
NC 002607
VI
NC 000916
VII
NC 000909
NC 005791
VIII
NC 003551
IX
NC 003552
NC 003901
X
NC 002578
Motif = 1[-60,-10] Pattern Entropy GAGGCT 5.573 AGGCTT 5.515 TAGAGG 5.365 TTTAAA 6.529 TATAAA 6.505 TTAAAA 6.362 AAAAGA 6.455
Motif = 2[-60,-30],[-40,-10] Pattern Entropy Pattern CTATAG 4.259 GAGGCT TTTAGA 4.203 TAGAGG CTATAG 4.179 GGATGT TATAAA 4.291 TTTAAA TTTTAA 4.222 TATAAA TTTAAA 4.177 TTTAAA GAAAAA 5.001 TTAAAA
Entropy 4.508 4.334 4.269 5.551 5.534 5.513 5.803
AAAGAT AAAGAT AAGAAA
6.323 6.263 6.693
TAGAAA TTATAA AGAAAA
4.971 4.939 5.361
TAAATT AAGATT TTTTAA
5.452 5.276 6.212
AGAAAA TAGAAA AAAGAT TGAAAA GAAAAA TCGACG AACGCC GTCGAC GATAAA AGATAA ATGAAA ATAAAA
6.428 6.47 5.986 5.961 5.85 5.144 4.872 4.61 5.867 5.861 5.835 7.91
TAGAAA GAAAAA TGAAAA TGAAAA TGAAAA
5.303 5.276 4.919 4.919 4.76
TTTTTA AAGATT AAAGTT AAAGTT AAAGTT
6.086 5.414 4.848 4.823 4.798
CATGAA ATGAAA TTGATA TATAAA
4.707 4.683 4.678 6.486
ACATAA ACATAA ACATAA AAAAAT
4.827 4.799 4.793 6.613
AAATAA AAAAAA TTTAAA
7.602 7.484 7.844
ATATAA TAAAAA AAAAAT
6.363 6.298 6.508
ATAAAA AAATAT AAATAG
6.418 6.161 5.505
AAAATA AAAAAT CGGGGG GGGGGA GGGGGA AGAAAA
7.708 7.696 6.888 6.865 6.856 6.256
AAAATA TTTAAA CGTGGA CGTGGA CGAGGA AGAAAA
6.462 6.458 4.04 4.018 4.004 5.035
AAGGTG AAAAGG CGGGGG GGGGTG CGGGGG AAAAGA
5.079 4.914 6.139 6.013 5.977 4.975
AAAAAA AAAATA AAAAAA
6.2 6.184 6.311
AAAATA AAAAAT AAAAAT
4.97 4.896 5.162
AAAAAT ATAAAA AGAAAA
4.955 4.918 4.995
AAAAAA AAAATA ATAAAT
6.263 6.243 6.295
AAAAAT TAAAAA AGAAAA
5.071 5.036 4.902
AAAGGA AGGAAA TTAAAT
4.758 4.54 5.359
AATAAA AAATAT
6.254 6.254
TGAAAA ATGAAA
4.878 4.834
AAATAT AAATAT
5.338 5.301
Table 6. (continued) NC 002689
XI
NC 000868
NC 003413
NC 000961
XII
NC 005213
AAAAAT
6.697
AAGAAA
5.264
TAAATA
5.686
AAATAT AAATAT GAGGTG
6.695 6.652 6.105
ATAAAA ATAAAA TTTTAA
5.13 5.1 4.885
TAAATA ATAAAT GAGGTG
5.628 5.549 5.147
GAGGTG AGGAGG AGGAGG
6.101 5.947 5.926
TTAAAA TGAAAA TAAAAG
4.883 4.836 5.072
GAGGTG GAGGTG GAGGTG
5.133 5.106 5.104
GAGGAG GAGGTG AAGGAT
5.797 5.789 5.81
TGAAAA TGAAAA TTAAAA
5.011 5.007 4.985
GAGGTG AGGAGG TAAGGG
5.102 4.993 4.841
GAGGTG AAGGAG TTTAAA TTAAAA TTATAA
5.728 5.697 8.22 7.827 7.78
TTAAAA AAAGTT CTATAG AAAAGC AAAAGG
4.916 4.903 4.495 4.362 4.219
GGAGGT TGAGGT TTTAAA TTTAAA TTTAAA
4.713 4.679 7.535 7.521 7.474
Table 7. Results of Testing with Motif Number = 3 Group No.
Access No.
I
NC 000854
II
NC 003364
III
NC 002754
NC 003106
IV
NC 000917
V
NC 002607
Motif = 3 [-60,-40],[-45,-25],[-30,-10] Pattern Entropy Pattern Entropy
Pattern GGGGTG GGGGTG GGGGTG
Entropy 4.151 4.146 4.073
TAAAAT
4.392
TTTAAA TTTAAA TTTATA TTATAA
4.459 4.457 4.452 5.393
AAGATT
4.387
TAAAAT TAAAGT AAGATT
4.385 4.341 4.534
TTAAAA TTTATA TTTTAA
5.379 5.3 5.601
AATATA AAGATA TTTAAG
4.346 4.246 4.876
AAATAT AAAAGA AAAGTT TGAAAA
4.471 4.278 4.057 4.033
ATTTAA TAAAAA TGAAAA TGAAAA TGAAAA
5.481 5.227 4.337 4.266 4.259
AATTTA TAAATA TTAAGG AGGTGA TTTGAG
4.717 4.71 4.129 4.042 4.02
Table 7. (continued) VI
NC 000916
VII
NC 000909
NC 005791
VIII
NC 003551
IX
NC 003552
NC 003901
X
NC 002578
NC 002689
XI
NC 000868
NC 003413
NC 000961
XII
NC 005213
GATGAA ATGAAA ATGAAA TATAAA
4.115 4.104 4.107 5.681
TTGAAA ATGAAA ATGAAA TTTAAA
4.147 4.133 4.066 5.921
AGGAGA AGGAGA AGGAGA AGGTGA
4.39 4.386 4.323 4.623
ATATAA TTTAAA TTAAAT
5.634 5.632 5.951
TTTAAA ATTAAA ATTTAA
5.852 5.835 5.915
AAGGTG AGGTGA AGGTGA
4.588 4.577 5.199
TTAAAA TTAAAA CCCCGA CCCGAA CCCGAA AAAAAT
5.944 5.882 4.119 4.052 4.051 4.42
ATTTAT ATTTAA
5.873 5.808
TTGAAA
4.499
AGGTGA AGGTGA CGGGGG CGGGGG CGGGGG AGGAAA
5.197 5.197 6.025 5.993 5.948 4.069
TGAAAA GAAAA A TGAAAA
4.358 4.349
ATTTAA ATAAAA
4.39 4.386
AAAGAG AGGAAA
4.048 4.041
4.593
TTTAAA
4.545
AGGAGA
4.041
AAAAAT TGAAAA AAAAAT
4.52 4.517 4.352
ATTAAA TAAAAA ATATAA
4.522 4.49 4.644
AGGAGA AAGGAG AAATAT
4.001 4.001 4.523
AAAAAT AAAATA TAAATA
4.342 4.314 4.636
ATATAA ATTTAA TAAATA
4.625 4.504 5.197
TATAAT AATAAT TAAATG
4.474 4.393 4.459
AAATAT AAATAT AAGATT
4.576 4.552 4.315
ATTTAA AAATAT TTTTAA
5.159 5.126 4.734
ATAAAG ATATTG AGGTGA
4.226 4.217 4.729
AAGGTT AAGGTT AAAAAG
4.262 4.258 4.395
TTTTAA TTTTAA TTTAAA
4.721 4.715 4.739
AGGTGG AGGTGA AGGTGG
4.709 4.709 4.907
4.337 4.334 4.286
TTTAAA TTTAAA TTTTAA
4.731 4.705 4.668
AGGTGG AGGTGG AGGTGG
4.876 4.862 4.193
AAAAGT TTTAAA
4.275 4.227
TTAAAA TTTTAA TTTTAA TTTTAA TTTTAA
4.629 4.623 6.298 6.297 6.21
AGGAGG GAGGTG TTTAAA TTTAAA TTTAAA
4.186 4.123 6.802 6.518 6.234
4.2 Comparison Between Different Motif Numbers

From Tables 6 and 7 we find that although the entropy is usually high when the motif number is 1, the consensus patterns do not show stability. When the motif number is increased to 2 or 3, some patterns that had not been found become stable; most of them are weak and hidden in a local region, so they are hard to find across the whole region. Hence, it is necessary to focus on the data with motif numbers 2 and 3.

4.3 Homology of the Archaeal Motif Patterns

When we focus on the data with motif number 3 (or, in some cases, 2), we usually find an AT-rich pattern near [-40, -30] in archaeal gene sequences. The pattern looks like TTTATA or TTTAAA, which is part of the TATA-box [7]. Upstream of the TATA-box ([-60, -40]) there are few significant patterns; although some similar patterns including TGA or AGG are found in some sequences, they are not widespread across the different archaeal genomic sequences. By contrast, downstream of the TATA-box ([-25, -10]) there is a significant pattern with different content in different genomic sequences. This pattern always appears around -15 and turns out to have particularly high entropy when the motif number is 3.

Data in the same group show the similarity of the same or closely related species, especially when the motif number is 3. Some groups have two AT-rich patterns near both -35 and -10; as little evidence indicates the number of motifs in this region, we cannot tell whether these two patterns actually refer to the same motif. Many groups include at least two completely different patterns when the motif number is 3. One typical example is group XI, where the second and third patterns are almost invariant; the third pattern turns out to be AGGTGG with an entropy around 4.8, which is still a relatively positive result.

4.4 Discussion of Some Specific Cases

Here are two specific cases verified by other work. The first case is Pyrococcus furiosus (Fig. 1): the third pattern in the motif number 3 test gives a stable AGGTGG with an entropy close to 5.
Fig. 1. Patterns from Pyrococcus furiosus with motif number = 3 (generated by WebLogo)
This pattern exactly matches the transcription motif pattern given in [3]. Another encouraging case is Pyrococcus abyssi (see [2], [8]): when the motif number is 3, the second pattern shows an AT-rich pattern near -30, and the third shows a G-rich pattern near -10. These two patterns remain stable, with entropies between 4.5 and 5.
5 Conclusions and Future Work

The algorithm proposed in this report improves the traditional Gibbs Sampler algorithm in the following aspects:

1. Finding non-overlapping motifs simultaneously.
2. Detecting relatively weak patterns in a large upstream region.
3. Reducing the effect of randomness and improving stability.

The experimental results for archaeal sequences show that it is feasible to find more motif patterns in this relatively unknown domain, so we have some confidence in predicting motifs (or patterns) with this improved algorithm. Future work includes implementing a display tool based on this algorithm that can show the significance clearly and visualize comparisons. More tests will be run over a larger region to detect more patterns, and the algorithm may also be applied to the other two main domains (bacteria and eukaryotes) for further study.
References 1. Bailey, T.L., Elkan, C.P.: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Intelligent System Mol. Biol. 2, 28–36 (1994) 2. Boyle, A.P., Boyle, J.A.: Global alignment of microbial translation initiation regions. J. MS Acad. Sci. 48, 138–150 (2003) 3. Dahlke, I., Thomm, M.: A pyrococcus homolog of the leucine-responsive regulatory protein, LrpA, inhibits transcription by abrogating RNA polymerase recruitment. Nucleic Acids Research 30(3), 701–710 (2002) 4. GuhaThakurta, D., Stormo, G.D.: Identifying target sites for cooperatively binding factors. Bioinformatics 17, 608–621 (2001) 5. Hertz, G.Z., Stormo, G.D.: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 563–577 (1999) 6. Lawrence, C.E., Altschul, S.F., Bogusky, M.S., Liu, J.S., Neuwald, A.F., Wootton, J.C.: Detecting subtle sequence signals: Gibbs sampling strategy for multiple alignment. Science 262, 208–214 (1993) 7. Vierke, G., Engelmann, A., Hebbeln, C., Thomm, M.: A novel archaeal transcriptional regulator of heat shock response. Biological Chemistry 278(1), 18–26 (2003) 8. Wan, X., Bridges, S.M., Boyle, J.A.: Revealing gene transcription and translation initiation patterns in archaea, using an interactive clustering model. Extremophiles 8, 291–299 (2004) 9. Workman, C.T., Stormo, G.D.: ANN-Spec: a method for discovering transcription factor binding sites with improved specificity. Pac. Symp. Biocomput. 5, 464–475 (2000)
Constructing Structural Alignment of RNA Sequences by Detecting and Assessing Conserved Stems Xiaoyong Fang1, Zhigang Luo1, Bo Yuan2, Zhenghua Wang1, and Fan Ding1 1
School of Computer Science, National University of Defense Technology 410073 Changsha, China {ZhigangLuo,
[email protected]} 2 Department of Biomedical Informatics, College of Medicine and Public Health, Ohio State University 43210-1239 Columbus Ohio, USA {Bo Yuan,yuan.33}@osu.edu
Abstract. Comparative methods for predicting RNA secondary structure can be facilitated by taking structural alignments of homologous sequences as input. However, it is very difficult to construct a good structural alignment of RNA sequences without knowing their secondary structures. In this paper, we present a stem-based method for constructing structural alignments of RNA sequences with unknown structures. The method can be summarized as follows: 1) we detect possible stems in each RNA sequence using the so-called position matrix, which uncovers possibly paired positions; 2) we detect conserved stems across multiple sequences by multiplying the position matrices; 3) we assess the conserved stems using the Signal-to-Noise ratio and a new SCFG model; 4) we construct the structural alignment of the RNA sequences by incorporating the conserved stems into Clustal W, a popular program for multiple sequence alignment. We tested our method on data sets composed of known structural alignments downloaded from the Rfam database. The accuracy of our method, measured as sensitivity and true positive rate, is much greater than that of alignments by Clustal W.
1 Introduction

Non-coding RNA genes are genes for which RNA, rather than protein, is the functional end product [1]. Recent research shows that ncRNA is far more widespread than previously anticipated [2]. The functions of ncRNAs are strongly related to their secondary structures. The gold standard for determining RNA secondary structure is comparative sequence analysis [3], in which a large number of sequences are aligned to reveal the common base pairing pattern. So far, a number of methods based on comparative sequence analysis have been implemented to predict RNA secondary structure [4-7]. However, these approaches all require an initial multiple alignment as input and are thus very vulnerable to structural errors in the alignment. The accuracy of comparative methods can be improved by taking a structural alignment of the RNA sequences as input. But it is very difficult to construct a good
structural alignment without knowing the secondary structures. In general, the structure of an RNA is often more conserved than its sequence, so standard multiple sequence alignment techniques such as Clustal W [8] or T-Coffee [9] cannot be used, since they completely neglect structural information. For this reason, several methods for simultaneous sequence alignment and structure prediction have been proposed [10-12], but algorithms based on these methods are too computationally taxing to be practical. Recently, some heuristic stem-based methods have been proposed to construct structural alignments of RNA sequences [13-16], but all of them lack explicit algorithms for assessing the stems, which reduces the accuracy of the structural alignments.

In this paper, we present a new method for constructing structural alignments of RNA sequences by detecting and assessing conserved stems. We detect possible stems in an RNA sequence using the so-called position matrix, described in Section 2.1. We detect conserved stems across multiple sequences by multiplying position matrices, as defined in Section 2.2. We assess conserved stems using the Signal-to-Noise ratio and a new SCFG (Stochastic Context-Free Grammar) [17] model, introduced in Section 2.3. Finally, we construct the structural alignment of multiple sequences by incorporating the selected conserved stems into the popular program Clustal W. We tested our method on data sets composed of known structural alignments taken from the Rfam database [18]. The results show that our method can construct structural alignments of RNA sequences with much greater accuracy than Clustal W.
2 Methods

We presented the concept of the position matrix for the first time in [19]; in this paper, we apply it to constructing structural alignments of RNA sequences.

2.1 Detecting Possible Stems in an RNA Sequence Using the Position Matrix

Given an RNA sequence of length N (denoted by Seq), the N×N position matrix for Seq (denoted by MSeq) is built by the following steps:

1) We first compute the reverse complement of Seq (denoted by Seq') according to the following rules:
• The complement of 'G' is not only 'C' but the set {C, U}; for simplicity, we denote φ = {C, U}.
• The complement of 'U' is not only 'A' but the set {A, G}; for simplicity, we denote ψ = {A, G}.
• If the character is a gap (denoted by '_'), its complement is a gap too.
2) We build an N×N matrix (this is not yet the position matrix) containing Seq' in the first row; the ith row (0 ≤ i ≤ N-1) contains the sequence generated from Seq' by a circular left shift of i positions.
3) The position matrix MSeq is computed by comparing Seq with the matrix from step 2) row by row: 0, 1 or -1 is assigned to element (i, j) (0 ≤ j ≤ N-1) of MSeq by comparing the jth base of Seq with element (i, j) of the matrix from step 2). Here '0' means the corresponding position is unpaired and ungapped, '1' means the position is paired and ungapped, and '-1' means the position is gapped. The comparison of two bases obeys the following rules:
• If b equals b' and neither of them is '_', then 1 is assigned.
• If b or b' (or both) is '_', then -1 is assigned.
• If b does not equal b' and neither of them is '_', then 0 is assigned.
Here b is a base from Seq and b' is an element of the matrix from step 2); in particular, when b' is φ or ψ, "equals" means that b belongs to b'.

As shown in Fig. 1, an N×N matrix (left) is first built from the original sequence; the position matrix MSeq is then computed by comparing Seq with the left matrix row by row.
X. Fang et al.
element of MSeq by comparing the jth base of Seq with the i, j (0 ≤ j ≤ N-1) element of the matrix in 2). Here, ‘0’ means the corresponding position is unpaired and ungapped, ‘1’ means the position is paired and ungapped, and ‘1’ means the position is gapped. The following rules should be obeyed when two bases are compared with each other: • If b equals b’ and neither of them is ‘_’, then 1 is assigned. • If b or b’ is ‘_’ or both of them are ‘_’, then -1 is assigned. • If b does not equal to b’ and neither of them is ‘_’, then 0 is assigned. Here, b is a base from Seq and b’ is an element from the matrix in 2). Specially, when b’ is φ or ψ, the word “equal” means that b belongs to b’. As shown in Figure 1, One N×N matrix (the left) is firstly built from the original sequence. Then the position matrix, MSeq, is computed by comparing Seq with the left matrix row by row. 1 Seq = A A U U G A U G 2
φψUφψψUU ψUφψψUUφ UφψψUUφψ φψψUUφψU ψψUUφψUφ ψUUφψUφψ UUφψUφψψ UφψUφψψU
MSeq =
01111110 10101010 00000011 01010000 11110110 10111011 00100001 00010100
1
2
Fig. 1. The construction of the position matrix for an RNA sequence. The stem and its mapping zone are labeled by same number.
We detect possible stems in the sequence by scanning the position matrix row by row. The key is to find all zones of continuous non-zero in the matrix. There is a oneto-one mapping between the stems in the sequence and the zones of continuous nonzero in the position matrix. As shown in Figure 1, two stems in the sequence (Seq) are mapped to two zones of continuous non-zero in the position matrix (MSeq). Finally, we point that the time complexity of the approach mentioned above is O (N2). 2.2 Detecting Conserved Stems by Multiplying the Position Matrices We first define the multiplying of position matrices and then apply it to detecting conserved stems across multiple sequences. Suppose both M1 and M2 are L×L position matrices. Then the resulting matrix (denoted by M) for M1 × M2 is computed by: M [i, j] = M1 [i, j] × M2 [i, j]
(1)
Constructing Structural Alignment of RNA Sequences
211
Here, M [i, j], M1 [i, j] and M2 [i, j] are respectively the i, j element of M, M1 and M2. Specially, the multiplying of the elements must obey following rules: • • •
0×0 = 0; 0×1 = 0; 0× (-1) = 0. 1×1 = 1; 1× (-1) = 1. (-1)× (-1) = -1.
The multiplying of n position matrices is computed by:

$$M = \prod_{i=1}^{n} M_i \qquad (2)$$
Here, M is the resulting matrix, and all original matrices to be multiplied should have the same dimension. Obviously, the multiplication of position matrices satisfies the commutative law and the associative law. We detect conserved stems across n sequences by the following steps:
1) We detect all possible stems in each sequence using the approach described in Section 2.1.
2) We select n different stems from the n sequences (one stem per sequence), and then build the position matrix for each stem using the approach described in Figure 1.
3) We multiply these n matrices according to equation (2). This is done by multiplying two matrices at a time.
4) We detect conserved stems by finding zones of continuous non-zero entries in the resulting matrix (still denoted by M) generated in 3). There is still a one-to-one mapping between the conserved stems and the zones of continuous non-zero entries in M.
5) We repeat steps 2) to 4) until all the stems detected in 1) have been selected.
Step 3) is easy to handle when the two stems have the same length. In other cases, for simplicity, we just add some gaps to the shorter sequence at the beginning and the end, and then construct a new matrix for the new sequence. Finally, we point out that the time complexity of equation (2) is O((n−1)L²), where L is the dimension of the matrices. Suppose m stems are selected from each sequence on average and the average length of a stem is still L. Then the time complexity for detecting conserved stems across n sequences of length N is approximately O(mn(n−1)L²). In practical applications, the condition L << N keeps the time and space costs low.

2.3 Assessing Conserved Stems Using the Signal-to-Noise and the SCFG

We assess the conserved stems by computing the Signal-to-Noise. The major steps are:
1) Given a conserved stem, we record the number of base pairs in the stem as the Signal.
2) We generate a randomized alignment from the original alignment of the conserved stem.
3) We detect possibly conserved stems in the new alignment using the method described in Section 2.2.
4) We record the number of base pairs in the conserved stem newly detected in 3) as the Noise. The Signal-to-Noise (i.e., the ratio of the Signal to the Noise) is thus computed as Signal / Noise. Note that we set the Noise to 1 if there are no conserved stems in the randomized alignment, and to the maximum if there is more than one conserved stem.
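A sketch of these conventions, assuming a hypothetical helper `count_base_pairs_per_stem` (not from the paper) that returns the number of base pairs of every conserved stem found in an alignment:

```python
# A sketch of the Signal-to-Noise computation (steps 1-4 above).
def signal_to_noise(signal, randomized_alignment, count_base_pairs_per_stem):
    noises = count_base_pairs_per_stem(randomized_alignment)
    if not noises:
        noise = 1            # no conserved stem in the randomized alignment
    else:
        noise = max(noises)  # more than one stem: take the maximum as Noise
    return signal / noise
```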
To generate a randomized alignment from the original alignment, we repeatedly permute the columns of each sequence until the difference between the probability of the new alignment and the probability of the original alignment is less than some given threshold. To compute the probability of an RNA sequence, we introduce a new SCFG model. The SCFG can be defined by the four-tuple Mol = {W, T, Al, E}, where Mol is the SCFG model, W is the set of non-terminals, T is the set of transition distributions, Al is the set of terminals, and E is the set of emission distributions. We define them as follows:
i. W = {start, bifurcation, single, pair, end}.
ii. T = [t(w, w')], where w and w' belong to W, and t(w, w') is the transition probability from w to w'.
iii. Al = {A, C, G, U, _}, where '_' symbolizes the gap.
iv. E = [e_w], where w belongs to W. If w is the non-terminal pair, then e_w is e(β, β'); here, both β and β' belong to Al, and together they form a base pair. If w is the non-terminal single, then e_w is e(γ); here, γ is a single base belonging to Al. In particular, the non-terminals start, bifurcation and end do not produce any terminal.
Notably, the SCFG presented here decomposes productions into two independent parts: non-terminal transitions and terminal emissions. The production rules can be categorized into three classes: the pair rules, the single rules and the others. We describe them in Figure 2. We use the inside–outside method [17] to estimate the parameters of the SCFG, and we modify the original inside algorithm to compute the probability of an RNA sequence. Specifically, we use the following equation to compute the probability of a derivation tree [17]:

$$P(\mathrm{Tree} \mid \mathrm{Mol}) = \prod_{i=1}^{l_1} t_i(\mathrm{pair}, w)\, e(\beta, \beta') \cdot \prod_{j=1}^{l_2} t_j(\mathrm{single}, w')\, e(\gamma) \cdot \prod_{k=1}^{l_3} t_k, \quad w, w' \in W \qquad (3)$$
Here, t_k is the probability of one rule from (c) of Figure 2, and Tree is a derivation tree that uses l1 pair rules, l2 single rules and l3 other rules. We use the following equation to compute the probability of an RNA sequence:

$$P(\mathrm{Seq} \mid \mathrm{Mol}) = \sum_{i=1}^{l_4} P(\mathrm{Tree}_i \mid \mathrm{Mol}) \qquad (4)$$

Here, Seq is the RNA sequence, which can be parsed by l4 derivation trees in total given the SCFG Mol.
We use the following equation to compute the probability of an alignment of n sequences:

$$P(D \mid \mathrm{Mol}) = \prod_{i=1}^{n} P(\mathrm{Seq}_i \mid \mathrm{Mol}) \qquad (5)$$

Here, D is the alignment of the n sequences and Seq_i is the ith sequence of D.
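Since Eqs. (3)–(5) are plain products and sums over the probabilities of the applied rules, they can be sketched directly; the data layout below (per-tree lists of rule probabilities) is our own illustration, not the authors' code:

```python
import math

def tree_probability(pair_rules, single_rules, other_rules):
    """Eq. (3): product over the l1 pair rules, l2 single rules and l3 others."""
    p = 1.0
    for t, e in pair_rules:      # t(pair, w) and e(beta, beta')
        p *= t * e
    for t, e in single_rules:    # t(single, w') and e(gamma)
        p *= t * e
    for t in other_rules:        # start / bifurcation / end rules
        p *= t
    return p

def sequence_probability(trees):
    """Eq. (4): sum over the l4 derivation trees of one sequence."""
    return sum(tree_probability(*tree) for tree in trees)

def alignment_probability(sequences):
    """Eq. (5): product over the n sequences of the alignment."""
    return math.prod(sequence_probability(trees) for trees in sequences)
```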
pair → β w′ β′   (probability: t(pair, w′) e(β, β′))   (a)
single → γ w′  |  single → w′ γ   (probability: t(single, w′) e(γ))   (b)
start → w, w ∈ {pair, single}   (probability: t(start, w));   bifurcation → start start   (probability: 1)   (c)
Fig. 2. The production rules for Mol. (a) is the pair rule, (b) is the single rule, and (c) covers the others.
2.4 Constructing Structural Alignment by Incorporating Conserved Stems with Clustal W

The central idea for constructing the structural alignment is to select some reliable and compatible conserved stems as constraints for Clustal W. In more detail, we first remove the conserved stems with very low Signal-to-Noise. Then we remove incompatible conserved stems. This is done by analyzing the relationship between two conserved stems, for which there are three cases:
1) The two conserved stems are in tandem.
2) One conserved stem is included in the other.
3) The two conserved stems overlap.
Case 1) is intractable when one stem from a sequence falls into it. In this case, the two conserved stems are incompatible because we do not know exactly which conserved stem should be selected to include the stem of the sequence. For simplicity, we remove the conserved stem with the lower Signal-to-Noise. Cases 2) and 3) are easy to handle because no stems are incompatible with any conserved stem: in these two cases, any stem in the sequence either belongs to only one conserved stem or belongs to both of them at the same time. After removing unreliable or incompatible conserved stems, we construct an initial alignment on the basis of the remaining conserved stems. This is done by aligning the sequences according to the alignment of the conserved stems. Then, we align the other fragments of the sequences using Clustal W. As shown in Figure 3, (b) is for
constructing the initial alignment, (c) is for aligning the fragments that are not included in the conserved stems, and (d) is the final alignment.
(a) Seq1 = AAACGCGUCC, Seq2 = AAUGGGCCCGCC, Seq3 = AACUUUGGGACC; (b) conserved stems ACGCGU / GGGCCC / UUUGGG; (d) final alignment AA_ACGCGU_CC / AAUGGGCCCGCC / AACUUUGGGACC
Fig. 3. The construction of the structural alignment. (a) is the sequence set, (b) shows the conserved stems detected by our method, (c) is the alignment built by Clustal W, and (d) is the final structural alignment obtained by incorporating (b) with (c).
3 Results

3.1 Test Data Sets and Test Approach

We build the test data sets on the basis of Rfam 8.0. In more detail, we select 107 ncRNA families with sequence identity ranging from 40% to 99%. We download the corresponding alignment for each family and then extract three, four and five different sequences from each alignment. For example, from one family of 5 sequences we can obtain $\binom{5}{3} = 10$ three-sequence members, $\binom{5}{4} = 5$ four-sequence members, and $\binom{5}{5} = 1$ five-sequence member in total. In this way, we construct one three-sequence data set, one four-sequence data set and one five-sequence data set based on Rfam 8.0. To test the performance of our method and compare it with Clustal W, we evaluate the alignments generated by both methods using RNAalifold [5], a popular program for predicting RNA secondary structure. The reason for using RNAalifold is that we cannot directly acquire the structure information from an alignment. For each test sequence set, we first generate two different alignments, one with our method and one with Clustal W. Then we use these two alignments, as well as the original alignment downloaded from Rfam 8.0, as input for RNAalifold, and we evaluate the three different outputs of RNAalifold by computing the sensitivity and the true positive rate of the predicted base pairs. Specifically, we compute the sensitivity as the number of true positives divided by the sum of true positives and false negatives, and the true positive rate as the number of true positives divided by the sum of true positives and false positives. To compute the sensitivity and the true positive rate, we
also downloaded the consensus secondary structure annotated by Rfam 8.0 for each alignment. Specifically, Rfam 8.0 contains a consensus secondary structure for each alignment, either taken from a previously published study or predicted using covariance-based methods. To make the test data set more reasonable, we remove the seed alignments with only predicted secondary structures.

3.2 Tests for Multiple RNA Sequences

We test our method on the data sets constructed in Section 3.1. The results are compared with Clustal W, using the parameters suggested by the authors of [8], and are reported in Tables 1, 2 and 3. The second and third columns in the tables are the results for the original alignments of Rfam, and the last two columns are the results for our method. As shown in Tables 1–3, our method exhibits remarkably higher sensitivity and true positive rate than Clustal W. Compared with the original alignments of Rfam, our method exhibits comparable accuracy for the tests with Id > 90% and slightly lower average accuracy over all tests. On the whole, our method performs markedly better than Clustal W, especially for the alignments with Id < 80%. One interesting observation is that we cannot obtain any valuable results with Clustal W for the tests with Id < 70%; we tried to change the parameters of both Clustal W and RNAalifold but still got nothing. Another important finding is that our method has nearly equivalent accuracy to Clustal W for the tests with Id > 90%, but both our method and Clustal W exhibit lower accuracy than the original alignments of Rfam.

Table 1. Sensitivity and true positive rate on the three-sequence data set (Id, percentage identity; Se, sensitivity; Tpr, true positive rate)

Id (%)   Se.Rfam (%)  Tpr.Rfam (%)  Se.Clustal W (%)  Tpr.Clustal W (%)  Se. (%)  Tpr. (%)
<60      61.78        64.57         0                 0                  50.65    54.13
60-70    72.11        64.66         0                 0                  56.11    48.85
70-80    55.99        43.12         8.24              7.78               54.17    38.47
80-90    56.28        40.91         51.27             36.81              54.13    38.88
90-100   40.08        51.43         39.90             49.72              39.93    50.11
Total    57.25        52.94         19.88             18.86              51.00    46.09
Table 2. Sensitivity and true positive rate on the four-sequence data set (Id, percentage identity; Se, sensitivity; Tpr, true positive rate)

Id (%)   Se.Rfam (%)  Tpr.Rfam (%)  Se.Clustal W (%)  Tpr.Clustal W (%)  Se. (%)  Tpr. (%)
<60      59.04        65.66         0                 0                  49.91    51.22
60-70    73.99        55.24         0                 0                  58.90    45.69
70-80    61.39        41.06         8.24              7.78               57.12    38.61
80-90    52.95        39.03         50.66             37.71              51.13    38.74
90-100   48.82        43.82         47.70             41.21              47.99    42.18
Total    59.24        48.96         21.32             17.34              53.01    43.29
Table 3. Sensitivity and true positive rate on the five-sequence data set (Id, percentage identity; Se, sensitivity; Tpr, true positive rate)

Id (%)   Se.Rfam (%)  Tpr.Rfam (%)  Se.Clustal W (%)  Tpr.Clustal W (%)  Se. (%)  Tpr. (%)
<60      60.87        69.10         0                 0                  51.16    49.15
60-70    75.98        72.78         0                 0                  68.32    59.98
70-80    61.99        40.01         10.10             9.32               55.89    38.99
80-90    51.43        45.68         49.38             44.72              50.82    44.79
90-100   48.61        42.26         47.93             39.63              48.07    41.28
Total    59.78        53.97         21.74             19.00              54.85    46.84
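For reference, both measures reduce to standard counts over predicted and annotated base pairs; a minimal sketch, assuming each structure is given as a set of (i, j) base-pair index tuples (our own convention):

```python
# A sketch of the evaluation measures used in Tables 1-3.
def se_and_tpr(predicted, reference):
    tp = len(predicted & reference)   # true positives
    fn = len(reference - predicted)   # false negatives
    fp = len(predicted - reference)   # false positives
    se = tp / (tp + fn) if tp + fn else 0.0
    tpr = tp / (tp + fp) if tp + fp else 0.0
    return se, tpr
```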
4 Discussion

In this paper, we present and evaluate a new method for constructing structural alignments of RNA sequences. Our method improves on Clustal W by detecting and assessing conserved stems across multiple sequences. The valuable idea in our method is the incorporation of useful structure information (e.g., the stems) into sequence alignment. The fact that our method takes RNA sequences with unknown structures as input makes it suitable for practical applications. Furthermore, the approach of detecting possible stems using the position matrix makes our method differ greatly from other methods. Finally, the algorithm for assessing conserved stems using the Signal-to-Noise and the SCFG makes our method robust.

Despite the limited amount of data, we have shown experimentally that our method can construct structural alignments with much better performance than Clustal W. This is the expected behavior, since Clustal W does not consider any structure information when aligning the sequences. Indeed, Clustal W is at a disadvantage when aligning RNA sequences whose primary sequence is not conserved but whose structure is. Another possible reason is that the RNAalifold program is vulnerable to alignment errors and thus cannot detect all possible structures in an alignment; however, the same issue also affects the tests of our method. In short, we conclude that our method performs much better than Clustal W when constructing structural alignments of RNA sequences.

In future work, improving the performance of our method will involve four directions. One potential improvement is to change the algorithm for generating the randomized alignment: in this paper we accomplish this by permuting the columns of each sequence, whereas an alternative approach is to directly permute the columns of the alignment and control the permutation process using the profile SCFG [17]. Another way to improve our method might be to compute the Signal-to-Noise by incorporating the free energy along with the number of base pairs. Third, the relationships among multiple conserved stems should be analyzed in more detail; in this paper, we ignored some complicated situations and thus possibly reduced the performance. Finally, more data sets and more existing alignment methods should be selected to benchmark the method.

Acknowledgments. This work has been supported by the National Natural Science Foundation of China under Grant No. 60673018.
References
1. Eddy, S.R.: Non-coding RNA genes and the modern RNA world. Nat. Rev. Genet. 2, 919–929 (2001)
2. Huttenhofer, A., Schattner, P., Polacek, N.: Non-coding RNAs: hope or hype? Trends in Genetics 21, 289–297 (2005)
3. Pace, N.R., Thomas, B.C., Woese, C.R.: Probing RNA structure, function, and history by comparative analysis. In: Gesteland, R.F., Cech, T.R., Atkins, J.F. (eds.) The RNA World, 2nd edn., pp. 113–141. Cold Spring Harbor Laboratory Press, NY (1999)
4. Knudsen, B., Hein, J.: Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Research 31, 3423–3428 (2003)
5. Hofacker, I., Fekete, M., Stadler, P.: Secondary structure prediction for aligned RNA sequences. Journal of Molecular Biology 319, 1059–1066 (2002)
6. Ruan, J., Stormo, G., Zhang, W.: An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots. Bioinformatics 20, 58–66 (2004)
7. Knight, R., et al.: BayesFold: rational 2° folds that combine thermodynamic, covariation, and chemical data for aligned RNA sequences. RNA 10, 1323–1336 (2004)
8. Thompson, J., Higgins, D., Gibson, T.: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22, 4673–4680 (1994)
9. Notredame, C., Higgins, D., Heringa, J.: T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302, 205–217 (2000)
10. Sankoff, D.: Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM Journal on Applied Mathematics 45, 810–825 (1985)
11. Hofacker, I., Bernhart, S., Stadler, P.: Alignment of RNA base pairing probability matrices. Bioinformatics 20, 2222–2227 (2004)
12. Mathews, D., Turner, D.: Dynalign: An algorithm for finding the secondary structure common to two RNA sequences. Journal of Molecular Biology 317(2), 191–203 (2002)
13. Perriquet, O., Touzet, H., Dauchet, M.: Finding the common structure shared by two homologous RNAs. Bioinformatics 19, 108–116 (2003)
14. Ji, Y., Xu, X., Stormo, G.: A graph theoretical approach for predicting common RNA secondary structure motifs including pseudoknots in unaligned sequences. Bioinformatics 20, 591–602 (2004)
15. Bafna, V., Tang, H., Zhang, S.: Consensus folding of unaligned RNA sequences revisited. Journal of Computational Biology 13, 283–295 (2006)
16. Tabei, Y., et al.: SCARNA: fast and accurate structural alignment of RNA sequences by matching fixed-length stem fragments. Bioinformatics 22, 1723–1729 (2006)
17. Durbin, R., et al.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (1998)
18. Griffiths-Jones, S., Bateman, A., et al.: Rfam: an RNA family database. Nucleic Acids Research 31, 439–441 (2003)
19. Fang, X., et al.: The detection and assessment of possible RNA secondary structure using multiple sequence alignment. In: The 22nd Annual ACM Symposium on Applied Computing, Seoul, Korea (March 11-15, 2007)
Iris Verification Using Wavelet Moments and Neural Network Zhiqiang Ma1, Miao Qi1,2, Haifeng Kang1,2, Shuhua Wang1,2, and Jun Kong1,2 1
Computer School, Northeast Normal University, Changchun, Jilin Province, China 2 Key Laboratory for Applied Statistics of MOE, China {mazq,qim801,kongjun}@nenu.edu.cn
Abstract. In this paper, a novel and robust verification approach using iris features is presented. In contrast to conventional approaches, only two iris sub-regions that are rarely occluded by distracting parts such as eyelashes and eyelids, rather than the entire iris, are segmented for verification. Gabor filtering and wavelet moments methods are used to extract the iris texture features. In the verification stage, the principal component analysis (PCA) technique and a one-class-one-network (Back-Propagation Neural Network, BPNN) classification structure are employed for dimensionality reduction and classification, respectively. The experimental results show that the correct verification rate can reach 98.65% using our proposed approach.
1 Introduction

Biometrics is a field of knowledge concerned with unique, stable and reliable personal characteristics. Many biometric verification techniques dealing with various human physiological features, such as fingerprints, hand prints, faces and retina patterns, have been widely researched and used in security access control systems. A human iris has a unique structure given by pigmentation spots, furrows and other tiny features that are stable throughout life and complex enough to be used as a biometric feature. Furthermore, the iris is an internal organ that is also externally visible, which makes personal verification systems non-invasive to users. There have been several methods for iris feature extraction, which is a key step for verification. For example, Daugman [1] used multi-scale quadrature wavelets to extract texture phase structure information of the iris to generate a 2048-bit iris code. Boles and Boashash [2] calculated a zero-crossing representation of the 1D wavelet transform at various resolution levels of a concentric circle on an iris image to extract the texture features. Gaussian-Hermite moments are employed to characterize the iris features in [3]. Balaji et al. applied the ratio of the limbus diameter to the pupil diameter for first-stage verification, and used a LoG filter to extract hierarchical texture features for final confirmation [4]. Byungjun Son et al. presented one of the major discriminative learning methods, namely Direct Linear Discriminant Analysis (DLDA), and applied the multi-resolution wavelet transform to extract unique features from the acquired iris image and to decrease the computational complexity
when using DLDA [5]. In our proposed approach, a Gabor filter is first used to extract the texture of the iris images. Then, the wavelet moments method is used to extract the features of the filtered image. Finally, the PCA and BPNN techniques are employed for dimensionality reduction and verification. The rest of this paper is organized as follows. Section 2 introduces the localization procedure of the preprocessing. Section 3 briefly describes the feature extraction methods. The verification process is illustrated in Section 4. Section 5 presents the experimental results that verify the efficiency of the proposed approach. Finally, conclusions are given in Section 6.
2 Preprocessing

Image preprocessing is usually the first and essential step in pattern recognition. In our proposed approach, the preprocessing module mainly involves two aspects: localization of the iris regions of interest (ROIs) and enhancement. To increase the verification accuracy and reliability, we compare the important features extracted from the same region in different iris images. The iris of the human eye (shown in Fig. 1) is the annular part between the pupil (inner boundary) and the sclera (outer boundary). Both boundaries are taken as concentric circles. A captured iris image contains not only the iris texture region, but also many distracting parts such as the pupil, eyelids and eyelashes. In contrast to most conventional methods, we only segment two iris sub-regions (which are rarely occluded by eyelashes or eyelids) for verification. The procedure for segmenting and preprocessing the two iris sub-regions is described as follows:
1. As seen in Fig. 1, the intensity of the pupil is darker than that of the iris and the sclera. Find a region of fixed size which has the least sum of grey values and crop it. The selection of the size is crucial, as it should be common to all iris images and encircle the entire pupil.

Fig. 1. Original iris image (the eyelash, iris and eyelid regions are labeled)
2. Binarize the cropped image with an appropriate threshold value. Morphological operations are applied to remove non-pupil regions.
3. Find the bounding rectangle of the pupil region. The center of the bounding rectangle is regarded as the center of the pupil, and the length of its longer border is regarded as the diameter of the pupil.
4. The radius of the outer circle (labeled by the white broken line in Fig. 1) is two times that of the inner circle (labeled by the white solid line in Fig. 1). Irises from different people may be
captured at different sizes, even from the same eye, due to environmental factors such as distance variations. Therefore, we clockwise unwrap the iris ring (starting from the bottom) into a rectangular block image with a fixed size of 48 × 360, as shown in Fig. 2(a).
5. Extract the two iris sub-regions. We use the texture information of the two sub-regions lying between columns 61 to 108 and 253 to 300, respectively, which are regarded as ROIs with a size of 48 × 48 (shown in Fig. 2(b)).
6. The intensity values of the ROIs have low contrast, which does not reflect the texture information clearly. In order to improve the contrast of the image, intensity transformation and top-hat and bottom-hat transformations [4] are used for contrast enhancement. The enhanced images are shown in Fig. 2(c), from which we can see that the finer texture characteristics of the ROIs become clearer than those in Fig. 2(b).
Fig. 2. (a) Normalized partial iris image. (b) Two ROIs of an iris image. (c) Enhanced ROIs.
3 Feature Extraction

After the normalized ROIs are obtained from the preprocessing stage, we extract important features from the ROIs for the authentication task. Gabor filtering and wavelet moments methods are employed for feature extraction in our iris authentication system.

3.1 Gabor Filtering

The Gabor filter is widely used for feature extraction [6, 7] and has already been demonstrated to be a powerful tool in texture analysis. The main advantage of traditional Gabor filters (depicted in [6]) is the detection of texture direction. Generally, the direction information of an iris image does not seem significant; thus, traditional Gabor filters are not suitable for the texture extraction of iris images. Therefore, the circular Gabor filter (detailed in [8]) is used in our work, which is defined as:
$$G(x, y, F, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left\{-\frac{x^2 + y^2}{2\sigma^2}\right\} \exp\left\{2\pi i F \sqrt{x^2 + y^2}\right\} \qquad (1)$$
where $i = \sqrt{-1}$, F is the frequency of the sinusoidal wave, and σ is the standard deviation of the Gaussian envelope. Only the real part of the filtered image is used. Then an appropriate threshold value is selected to binarize the filtered image; Fig. 3(c) shows the resulting binarized images.

Fig. 3. (a1, a2) The normalized ROIs. (b1, b2) The enhanced ROIs. (c1, c2) The binarized ROIs after applying the Gabor filter.
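A minimal NumPy sketch of Eq. (1) applied to an ROI; the kernel size, F, σ and the binarization threshold below are illustrative placeholders, not the parameter values of the paper:

```python
import numpy as np
from scipy.signal import fftconvolve

def circular_gabor_kernel(size, F, sigma):
    """Sample the circular Gabor filter of Eq. (1) on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x**2 + y**2
    return (np.exp(-r2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
            * np.exp(2j * np.pi * F * np.sqrt(r2)))

def filter_and_binarize(roi, F=0.1, sigma=6.0, thresh=0.0):
    kernel = np.real(circular_gabor_kernel(31, F, sigma))  # real part only
    response = fftconvolve(roi, kernel, mode="same")
    return (response > thresh).astype(np.uint8)            # binarized ROI
```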
3.2 Cubic B-Spline Wavelet Moments

The wavelet transform provides analysis functions that are obtained by dilation and translation of an image so that the image can be analyzed at multiple resolutions. The wavelet moments method is particularly suitable for extracting local discriminative features from normalized images. Its translation, rotation and scale invariance make it a widely used feature extraction approach [9]. The family of wavelet basis functions is defined as:
$$\psi_{a,b}(r) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{r - b}{a}\right) \qquad (2)$$
where a is the dilation parameter and b the shifting parameter. The cubic B-spline in Gaussian approximation form is:
$$\psi_{\beta_n}(r) = \frac{4 a^{n+1}}{\sqrt{2\pi(n+1)}}\, \sigma_w \cos\!\big(2\pi f_0 (2r - 1)\big) \exp\!\left\{-\frac{(2r - 1)^2}{2\sigma_w^2 (n+1)}\right\} \qquad (3)$$
where n = 3, a = 0.697066, f₀ = 0.409177, and σ_w² = 0.561145. Since the radial coordinate r of an image is always restricted to the domain [0, 1], let both parameters be set to 0.5, so that the domains for m and n can be restricted as follows:

$$a = 0.5^m,\ m = 0, 1, \ldots, M; \qquad b = n \cdot 0.5 \cdot 0.5^m,\ n = 0, 1, \ldots, 2^{m+1} \qquad (4)$$
Then the wavelet defined along a radial axis in any orientation can be rewritten as:
$$\psi^{\beta_n}_{m,n}(r) = 2^{m/2}\, \psi_{\beta_n}(2^m r - 0.5 n) \qquad (5)$$
The cubic B-spline wavelet moments (WMs) are then defined as:
$$W_{m,n,q} = \iint f(r, \theta)\, \psi^{\beta_n}_{m,n}(r)\, e^{-jq\theta}\, r\, dr\, d\theta \qquad (6)$$
If N is the number of pixels along each axis of the image, then the cubic B-spline WMs for a digital image f (r , θ ) can be defined as:
$$W_{m,n,q} = \sum_{x} \sum_{y} f(x, y)\, \psi^{\beta_n}_{m,n}(r)\, e^{-jq\theta}\, \Delta x\, \Delta y, \qquad r = \sqrt{x^2 + y^2} \le 1, \quad \theta = \arctan(y/x) \qquad (7)$$
The cubic B-spline wavelet is near-optimal in terms of its space-frequency localization, in the sense that it takes full advantage of the wavelet's inherent property of multi-resolution analysis.
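A direct (unoptimized) sketch of Eqs. (3)–(7) over a square ROI mapped to the unit disc; the discretization choices are our own illustration:

```python
import numpy as np

A, F0, SW2, N_ORDER = 0.697066, 0.409177, 0.561145, 3

def psi_beta(r):
    """Cubic B-spline mother wavelet in Gaussian approximation (Eq. (3))."""
    c = 4 * A**(N_ORDER + 1) / np.sqrt(2 * np.pi * (N_ORDER + 1))
    return (c * np.sqrt(SW2) * np.cos(2 * np.pi * F0 * (2 * r - 1))
            * np.exp(-(2 * r - 1)**2 / (2 * SW2 * (N_ORDER + 1))))

def wavelet_moment(img, m, n, q):
    """W_{m,n,q} of Eq. (7) for a square image f(x, y)."""
    N = img.shape[0]
    yy, xx = np.mgrid[0:N, 0:N]
    x = (2 * xx - N + 1) / N               # map pixel indices to [-1, 1]
    y = (2 * yy - N + 1) / N
    r = np.sqrt(x**2 + y**2)
    theta = np.arctan2(y, x)
    psi = 2**(m / 2) * psi_beta(2**m * r - 0.5 * n)        # Eq. (5)
    w = img * psi * np.exp(-1j * q * theta) * (2 / N)**2   # dx dy = (2/N)^2
    return abs(w[r <= 1.0].sum())          # magnitude is rotation invariant
```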
4 Verification

In this paper, we use a 68-dimensional feature vector (computed by wavelet moments with m = 0, 1, 2) for each ROI. Thus, a total of 136 (68 × 2) feature values are computed from every iris image. To retain the main information and accelerate the classification, the popular PCA technique (detailed in [10]) is first used for dimensionality reduction, and then a BPNN is employed for classification. The BPNN is one of the simplest and most general methods for the supervised training of multilayer neural networks and has been widely utilized in the field of pattern recognition. We use this well-known approach to perform the verification task. The architecture of our proposed network is designed as a three-layer network comprising an input, a hidden and an output layer, as depicted in Fig. 4. These layers are interconnected by modifiable weights, which are represented by links
Fig. 4. The architecture of the BP neural network (input nodes v1, …, v30; one hidden layer; one output node)
between layers. In the training stage of the BPNN, M image samples of a specific individual X, called positive samples, and N image samples of K other persons, called negative samples, are collected to train the BPNN. In order to reduce the training time and the size of the neural network, each neural network corresponds to only one iris owner. For practical systems, this approach offers another significant advantage: if a new person is added to the database, we only have to train one new BP neural network and need not retrain a very large neural network.
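A sketch of this one-class-one-network scheme, using scikit-learn's MLPClassifier as a stand-in for the three-layer BPNN (our choice of library; the paper's own implementation is in MATLAB, and the hidden-layer size here is a placeholder):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_person_network(positive, negative, hidden=20):
    """Train one small network for one iris owner (positive vs. negative)."""
    X = np.vstack([positive, negative])    # 30-dim PCA feature vectors
    y = np.hstack([np.ones(len(positive)), np.zeros(len(negative))])
    net = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=2000)
    return net.fit(X, y)

def verify(net, query, t=0.83):
    """Accept the claimed identity if the network output exceeds threshold t."""
    return net.predict_proba(query.reshape(1, -1))[0, 1] >= t
```

Enrolling a new person then only requires training one additional small network, leaving all existing networks untouched.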
5 Experimental Results

In our experiments, the image database comes from the CASIA Iris Database, which contains 756 iris images of 108 individuals (7 images per individual). Some samples from the CASIA Iris Database are shown in Fig. 5. The size of every image is 320 × 280 pixels.

Fig. 5. Iris samples from the CASIA Iris Database

The authentication system consists of enrollment and verification stages. In the enrollment stage, three iris images of an individual are collected as the training samples. These samples are processed by the preprocessing and feature extraction modules to generate the matching templates. In the verification stage, a query sample is also processed by the preprocessing and feature extraction modules, and then matched with the templates to decide whether it is a genuine sample or not. The feature vectors of length 136 are projected to vectors of length 30 using PCA, a value chosen experimentally. The results obtained in this paper are evaluated in terms of the verification rate, which is investigated using the false rejection rate (FRR) and the false acceptance rate (FAR); both vary with the threshold value t set at the output layer of the BPNN. The results for various thresholds t and the corresponding false rates are shown in Fig. 6. The lower the threshold t, the higher the probability of accepting impostors; similarly, the higher the threshold t, the higher the probability of rejecting authentic persons. The system performance measure obtained from the FAR and FRR is called the correct verification rate (CVR) and is calculated as follows:
$$CVR = (1 - FAR - FRR) \times 100\% \qquad (8)$$
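A sketch of the threshold sweep behind Fig. 6, assuming arrays of network outputs for authentic and impostor queries (our own variable names):

```python
import numpy as np

def far_frr_cvr(genuine, impostor, t):
    far = np.mean(impostor >= t)    # impostors wrongly accepted
    frr = np.mean(genuine < t)      # authentic users wrongly rejected
    cvr = (1 - far - frr) * 100.0   # Eq. (8)
    return far, frr, cvr
```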
From the results in Fig. 6, the CVR obtained by our proposed method can reach 98.65%, with FAR = 0.62%, FRR = 0.73% and t = 0.83. Compared with existing techniques for iris recognition and classification, our approach employs a Gabor filter to extract the texture information and the wavelet moments method to extract texture features, with the PCA technique and a BPNN used for dimensionality reduction and classification, respectively. Table 1 summarizes the results generated by our approach and the other techniques.
Fig. 6. The distributions of FAR and FRR

Table 1. Performance comparisons of several methods for iris verification

Paper reference   Accuracy rate (%)
[11]              100
[12]              99.43
[13]              99.19
[14]              99.05
[15]              97.90
[16]              95.45
[17]              94.91
Proposed          98.65
6 Conclusions

An iris verification system is designed to authenticate the identity of an individual based on biometric iris features. In our experiments, two iris sub-regions that are rarely occluded by distracting parts such as eyelashes and eyelids are segmented instead of the entire iris region, which also results in lower computational demands. The Gabor filter and wavelet moments methods are employed for feature extraction. The popular PCA technique is used for dimensionality reduction. The samples are verified by the BPNN algorithm. The experimental results demonstrate that our proposed approach
is reliable and effective. In addition, we draw some conclusions from the experimental results:
1. The statistics-based wavelet moments method is very suitable for the iris authentication task. How many feature values should be computed is an important issue.
2. The CVR is directly affected by the dimension of the reduced feature vector obtained with PCA.
3. Verification failed for some irises. The main reason for the failed verifications is severe occlusion by eyelashes or eyelids.
The computations were carried out in MATLAB 7.0 using the efficient algorithms mentioned in the text of this paper. Although the experimental results demonstrate that the proposed approaches are feasible, we will carry out more experiments on a bigger iris database. Novel filters for extracting the iris texture and new feature extraction methods should be proposed for better recognition performance in our future work.

Acknowledgments. This work is supported by the Science Foundation for Young Teachers of Northeast Normal University, No. 20061002, China.
References
1. Daugman, J.: High Confidence Visual Recognition of Persons by a Test of Statistical Independence. IEEE Transactions on Pattern Analysis and Machine Intelligence 15, 1148–1161 (1993)
2. Boles, W., Boashash, B.: A Human Identification Technique Using Images of the Iris and Wavelet Transform. IEEE Trans. Signal Processing 46(4), 1185–1188 (1998)
3. Ma, L., Tan, T., Wang, Y., Zhang, D.: Local intensity variation analysis for iris recognition. Pattern Recognition 37, 1287–1298 (2004)
4. Ganeshan, B., Theckedath, D., Young, R., Chatwin, C.: Biometric iris recognition system using a fast and robust iris location and alignment procedure. Optics and Lasers in Engineering 44, 1–24 (2006)
5. Son, B., Won, H., Kee, G., Lee, Y.: Discriminant iris feature and support vector machines for iris recognition. ICIP, 865–868 (2004)
6. Kong, W.K., Zhang, D., Li, W.: Palmprint feature extraction using 2-D Gabor filters. Pattern Recognition 36, 2339–2347 (2003)
7. Sanchez-Avila, C., Sanchez-Reillo, R.: Two different approaches for iris recognition using Gabor filters and multiscale zero-crossing representation. Pattern Recognition 38, 231–240 (2005)
8. Zhang, J., Tan, T., Ma, L.: Invariant Texture Segmentation Via Circular Gabor Filters
9. Pan, H., Xia, L.Z.: Exact and fast algorithm for two-dimensional image wavelet moments via the projection transform. Pattern Recognition 38, 395–402 (2005)
10. Kumar, A., Zhang, D.: Personal authentication using multiple palmprint representation. Pattern Recognition 38, 1695–1704 (2005)
11. Daugman, J.G.: Biometric Personal Identification System Based on Iris Analysis, U.S. Patent No. 5,291,560 (1994)
12. Ma, L., Tan, T., Wang, Y., Zhang, D.: Personal Identification Based on Iris Texture Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 25, 1519–1533 (2003)
13. Ma, L., Wang, Y., Tan, T.: Iris recognition using circular symmetric filters. In: Proceedings of the 16th International Conference on Pattern Recognition, pp. 414–417 (2002)
14. Dobes, M., Machala, L., Tichavsky, P., Pospisil, J.: Human eye iris recognition using the mutual information. International Journal for Light and Electron Optics 9, 399–404 (2004)
15. Liam, L.W., Chekima, A., Fan, L.Ch., Dargham, J.A.: Iris recognition using self-organizing neural network. In: Proc. Student Conference on Research and Development, Shah Alam, Malaysia, pp. 169–172 (2002)
16. Sanchez-Avila, C., Sanchez-Reillo, R., Martin-Roche, D.: Iris-based biometric recognition using dyadic wavelet transform. IEEE Aerosp. and Electron. Syst. Mag. 17(10), 3–6 (2002)
17. Ma, L., Wang, Y., Tan, T.: Iris recognition based on multichannel Gabor filtering. In: Proceedings of the Fifth Asian Conference on Computer Vision, pp. 279–283 (2002)
Comprehensive Fuzzy Evaluation Model for Body Physical Exercise Risk Yizhi Wu1, Yongsheng Ding1,2,*, and Hongan Xu3 1
College of Information Sciences and Technology, Donghua University, 201620 Shanghai, China {yz_wu,ysding}@dhu.edu.cn 2 Engineering Research Center of Digitized Textile & Fashion Technology, Ministry of Education, Donghua University, 201620 Shanghai, China 3 College of Information Sciences and Technology, East China Normal University, 200062 Shanghai, China
[email protected]
Abstract. It is very important, and difficult, to determine whether a person undertaking physical exercise is safe. A unique approach based on both the analytic hierarchy process (AHP) and fuzzy comprehensive evaluation (FCE) is presented to evaluate exercise risk. First, a new evaluation model combining the above two methods, AHP-FCE, is established by applying fuzzy mathematics to hierarchical assessment. Then, various exercise-relevant factors are analyzed and represented. In addition, the steps of the AHP-FCE based physical exercise evaluation are set up. Finally, an actual calculation example is used to verify the feasibility of the method.
1 Introduction

The human body is a complex system. Proper exercise or sport is beneficial to people's health [1-2], but excessive or unsuitable exercise may induce serious consequences such as fatigue, disease deterioration and even sudden death [3]. The level of health risk differs from person to person and between circumstances. Thus it is necessary to perform risk evaluation to prevent harmful results when people do physical training or take up exercise. There has been some research studying the effect of exercise on the human body and on exercise prescription for people with various diseases [3-5], but comprehensive models for health risk evaluation pertaining to sports and exercise are rare. Setting up a comprehensive and systematic exercise risk analysis model poses great challenges. Firstly, the risk of exercise is influenced by so many factors that it is very difficult to synthesize them into a judgment in a common way. Secondly, it is difficult to quantify the health level of a person, the seriousness of diseases, the effect of sports, etc.; therefore vagueness is introduced. A number of systematic models have been proposed for use in the risk evaluation process. These methods can be classified into two overall categories [6]: 1) classical
models (i.e., probabilistic analysis); and 2) conceptual models (i.e., fuzzy set analysis). Examples of classical models include Monte Carlo simulation and influence diagrams. Probabilistic models suffer from two major limitations: 1) some of these models require detailed quantitative information which is not normally available in real applications; 2) the applicability of such models to real risk analysis is limited, mainly because many decision problems are imprecise, ill-defined and vague in nature. Such characteristics are mostly subjective, while classical models cannot handle subjectivity. Typical examples of conceptual models are the analytic hierarchy process (AHP) [7] and fuzzy comprehensive evaluation (FCE) [8]. The AHP, developed by Saaty, is a hierarchical and flexible decision analysis methodology. The FCE is a fuzzy logic framework that provides an appropriate method for representing the fuzzy relationship between factors and risk evaluation remarks. In this paper, the AHP-FCE based risk evaluation model is first proposed. Then, the sources of risk for people doing exercise are analyzed, and the steps of the AHP-FCE based physical exercise risk evaluation are set up. Finally, an actual calculation example is given.
2 AHP-FCE Based Comprehensive Evaluation Model

In this section, the AHP-FCE based model, in which FCE is mainly used at every level of the AHP to obtain a composite risk assessment considering both the priorities and the fuzzy decisive role of the factors, is set up to deal with the complexity and vagueness in exercise risk analysis.

2.1 Hierarchical Structure of the Risk Evaluation Problem

Formulating the risk evaluation problem in a hierarchical structure is the first and probably the most important step. In a typical hierarchy, the top level reflects the overall objective of the risk evaluation problem. The factors affecting the risk are represented at intermediate levels and grouped in clusters. The lowest level comprises the fundamental and detailed sub-factors.
Fig. 1. The AHP hierarchical structure (objective O; factors F1–F3; sub-factors F11–F33)
2.2 The Relative Weights of the Sub-factors

Once the hierarchy has been constructed, the relative weights of the elements relating to the same subject at each level must be determined. A prioritization procedure is used to determine the relative importance, namely the comparison matrix, of the factors in each level of the hierarchy. Sub-factors of the same factor at the next higher level are pairwise compared with respect to their importance in evaluating the subject under consideration. The descriptive preferences between every two factors are translated onto a nine-point scale indicated by the absolute numbers 1 to 9, where the larger the number, the more important the first factor. The relative weights of the elements of each level with respect to a subject in the adjacent upper level, represented by A = {a1, a2, …, am} (0 ≤ ai ≤ 1), are computed as the components of the normalized eigenvector associated with the largest eigenvalue of their comparison matrix.

2.3 Fuzzy Comprehensive Evaluations in Each Level

After setting up the layered structure and computing the weights of the sub-factors, FCE is used to perform the evaluation at each level of the structure. Since an important issue in risk analysis is handling the imprecise or incomplete human expertise and knowledge that the evaluation involves, FCE is used here to better deal with the vagueness of risk factor assessment. When evaluating a subject, let U = {u1, u2, …, um} be the set of evaluation factors of the FCE, and V = {v1, v2, …, vn} the set of evaluation remarks. Let A = {a1, a2, …, am} (0 ≤ ai ≤ 1), acquired by the AHP, be the relative weights of the elements of the factor set U with respect to a subject. Let B = {b1, b2, …, bn} (0 ≤ bj ≤ 1) be a fuzzy subset on V, where bj is the membership degree of the remark vj; in fact, B measures the possibilities of the various remarks in the remark reference set V. The FCE can be implemented by a fuzzy transformation:
$$B = A \circ R \qquad (1)$$

where R is a fuzzy relation on U×V; μ_R(u_i, v_j) = r_ij (i = 1, 2, …, m; j = 1, 2, …, n), and r_ij ∈ [0, 1] is the membership degree of the subject to remark v_j from the viewpoint of factor u_i. R = (r_ij)_{m×n} is a fuzzy relation matrix called the evaluation matrix of the FCE. In order to obtain the fuzzy matrix R, suitable membership functions of the different evaluation factors with respect to the risk remarks are used according to the characteristics of the factors. When the vector A and the fuzzy relation matrix R are known, the FCE can be fulfilled by Eq. (1), which may also be expressed as

$$(b_1, b_2, \ldots, b_n) = (a_1, a_2, \ldots, a_m) \circ \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{m1} & r_{m2} & \cdots & r_{mn} \end{pmatrix} \qquad (2)$$

where $b_j = \min\left(1, \sum_{i=1}^{m} a_i \cdot r_{ij}\right)$ (j = 1, 2, …, n).
Here, the composition ∘ is obtained using the mathematical model M(⋅, ⊕), in which "⋅" is ordinary real multiplication and "⊕" is the bounded sum.

2.4 Overall Fuzzy Evaluation at the Top Level

The composite evaluation of the risk remarks is then determined by aggregating the results through the hierarchy. This is done by following a path from the bottom of the hierarchy to each subject at the level above and multiplying the weight vector by the fuzzy relation matrix along each segment of the path. The outcome of this aggregation is a normalized vector of the overall weights of the risk remarks. Let V = {v1, v2, …, vn} be the set of evaluation remarks; then the normalized overall evaluation value is:
$$O = \sum_{i=1}^{n} v_i b_i \Big/ \sum_{i=1}^{n} b_i \qquad (3)$$
We follow the nearest principle and select the risk remark whose value is nearest to O as the result of the evaluation.
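A minimal sketch of the bounded-sum composition of Eq. (2) and the nearest-principle selection based on Eq. (3):

```python
import numpy as np

def fce(A, R):
    """Eq. (2): B = A o R with b_j = min(1, sum_i a_i * r_ij)."""
    return np.minimum(1.0, np.asarray(A) @ np.asarray(R))

def overall_value(B, remark_values):
    """Eq. (3): normalized overall evaluation value O."""
    B = np.asarray(B, dtype=float)
    return float(B @ np.asarray(remark_values) / B.sum())

def nearest_remark(o, remark_values, remark_names):
    """Nearest principle: pick the remark whose value is closest to O."""
    idx = int(np.argmin(np.abs(np.asarray(remark_values) - o)))
    return remark_names[idx]
```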
3 AHP-FCE Based Physical Activity Risk Evaluation

In this section, the factors of health risk when people do physical exercise are the first concern. The main risk factors are analyzed and classified. Further, the quantification of these factors is discussed. Finally, exercise risks are evaluated step by step with the illustration of an example.

3.1 The Factors of Physical Activity Risk and Fuzzy Assessment

There are many sources of risk when people do exercise. The proposed classification scheme comprises three risk categories:

(1) Body medical condition. Most risks involved in physical exercise are related to the body's medical condition, which can be further divided into three main sub-factors: basic physical information, disease condition, and current physiological status. People's basic physical information (represented as Bas-inf below), such as age, sex and occupation, is a potential factor. The disease condition (Dis-con) is obviously an important exercise risk source for a patient; for example, the American Heart Association (AHA) committee grades the exercise recommendations for patients with heart disease on a scale of 1 (no restriction) to 5 (extreme limitation) [9]. The current physiological status (Phy-sta), including heart rate, blood pressure and respiration rate, is another important indication of whether the current exercise level is safe during real-time monitoring. In some studies the physiological status is acquired by self-assessment; in our system, various body vital signs are monitored in real time by physiological sensors and transferred wirelessly to the processing module.
(2) Activity load. The activity load can be measured by intensity, duration, and frequency. Intensity can be defined by maximum oxygen consumption (VO2max), heart rate, heart-rate reserve, oxygen uptake, metabolic equivalents, or perceived exertion [9]. The percentage of VO2max is equivalent to that of the maximum heart rate, which we can obtain from the ECG sensor. Usually the suitable activity load can be obtained straightforwardly from guidelines and the exercise advice prescribed by the doctor [10].

(3) Environment condition. Another potential risk, which is not frequently mentioned, is the environment condition. The common risk factors under this category are those related to air temperature, humidity, wind, noise, etc. [11]. Actually, only when the air temperature is rather high does the effect of humidity on man's heat balance become important. Strong winds can produce unfavorable effects at cold air temperatures and cause dangerous effects in very cold weather below −10 °C. So, we recombine the factors and consider three environmental factors: temperature/humidity (T/H), temperature/wind (T/W) and noise.

The assessment of each factor constitutes a vague estimate rather than a crisp value, and we use fuzzy sets to evaluate these factors. Membership functions may take any form; here we use a simple linear membership function. The risk degree can be expressed as:
r=
0 .5 × (cur _ val − M ) + 0.5 . H −M
(4)
where cur_val, M and H are the current, medium and maximum values of the measured item, respectively, and the membership values of M and H are 0.5 and 1. For example, consider a 45-year-old woman with mild cardiac disease (AHA scale 2), whose evaluated physical state is not very good. An exercise guideline states that the normal safe dose of jogging is, in terms of intensity, 40–85% VO2max, with a duration of 20–30 min and a frequency of generally three to five times per week. According to the safe dose range given by the guideline, 62.5% VO2max (M) is the degree of "medium safe" with a membership value of 0.5. Then the risk degree r can be calculated by Eq. (4) with the current value. The judgment of the activity load sub-factors for this patient jogging at 80% VO2max, for 25 min, five times per week, is thus computed as (0.73, 0.5, 0.67). Using similar methods, the values of the environment factors (T/H, T/W, Noise) are set to (0.6, 0.2, 0.5). Using medical guidelines and the judgments of medical experts, the estimated risk degree values of this patient's medical condition sub-factors (Dis-con, Phy-sta, Bas-inf) may be (0.6, 0.6, 0.3).

3.2 Steps of Physical Exercise Risk Evaluation

In this section, the steps followed in applying AHP-FCE to assess the risk of a person doing exercise are discussed. In addition, an experiment is conducted to test the feasibility of the model, in which real-time exercise-related information is collected and fed into the model to evaluate the potential risk. The physiological data acquisition system PowerLab 8/30 from ADInstruments (8-channel data recording unit with single-channel Bio Amp, MLS360 ECG Analysis Module, temperature sensors,
sphygmomanometer, Biopotential Accessory Kit) is used to record various exercise physiological data such as heart rate, blood pressure, body temperature, respiration rate, etc. Furthermore, the heart rate is used to measure the intensity of the exercise. Meanwhile, environmental data including temperature, humidity and noise are also recorded.

Step I: Structure the factors of the exercise risks into a hierarchy. Fig. 2 is the proposed classification scheme showing the different sources of risk.
Fig. 2. The physical exercise risk evaluation hierarchical structure (Risk Level of Physical Activity → Medical Condition: Dis-con, Phy-sta, Bas-inf; Activity Load: Intens., Dur., Freq.; Environmental Condition: T/H, T/W, Noise)
Step II: Develop the relative weights of the various factors. The importance of the factors and sub-factors is determined next. For illustration, the 3 × 3 judgment matrix and the weights of the sub-factors of body medical condition (Dis-con, Phy-sta, Bas-inf) are shown in Table 1.

Table 1. Judgment matrix and weights of factors

With respect to goal   Dis-con   Phy-sta   Bas-inf   Relative importance
Dis-con                1         1/2       3         0.311
Phy-sta                2         1         4         0.539
Bas-inf                1/2       1/3       1         0.150
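The weights in the last column of Table 1 are the normalized principal eigenvector of the pairwise comparison matrix; a NumPy sketch:

```python
import numpy as np

def ahp_weights(pairwise):
    """Normalized eigenvector of the largest eigenvalue (AHP weights)."""
    vals, vecs = np.linalg.eig(np.asarray(pairwise, dtype=float))
    principal = np.real(vecs[:, np.argmax(np.real(vals))])
    return principal / principal.sum()

# The medical-condition comparison matrix of Table 1:
C = [[1, 1/2, 3],
     [2, 1, 4],
     [1/2, 1/3, 1]]
print(ahp_weights(C))   # approximately [0.311, 0.539, 0.150]
```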
Step III: Fuzzy comprehensive evaluations at each level. First, the fuzzy relation R should be obtained from the membership functions of the various evaluation factors with respect to the evaluation remarks. Let V = {v1, v2, v3, v4, v5} = {very dangerous, moderately dangerous, medium, moderately safe, very safe} be the set of evaluation remarks, with fuzzy values {0.9, 0.7, 0.5, 0.3, 0.1}. For convenience of calculation, Kth-power parabolic functions are used to calculate the degrees of membership, where K is set to 1.2:

(1) v1 membership function:
$$F_1(x) = \begin{cases} 0, & x < 0.7, \\ \left(\dfrac{x - 0.7}{0.2}\right)^{1.2}, & 0.7 \le x \le 0.9, \\ 1, & x > 0.9. \end{cases} \qquad (5)$$
(2) v2, v3 and v4 membership functions:
$$F_2(x) = \begin{cases} 0, & x < 0.5, \\ \left(\dfrac{x - 0.5}{0.175}\right)^{1.2}, & 0.5 \le x \le 0.675, \\ 1, & 0.675 \le x \le 0.725, \\ \left(\dfrac{0.9 - x}{0.175}\right)^{1.2}, & 0.725 \le x \le 0.9, \\ 0, & x \ge 0.9. \end{cases} \qquad (6)$$

$$F_3(x) = \begin{cases} 0, & x < 0.3, \\ \left(\dfrac{x - 0.3}{0.175}\right)^{1.2}, & 0.3 \le x \le 0.475, \\ 1, & 0.475 \le x \le 0.525, \\ \left(\dfrac{0.7 - x}{0.175}\right)^{1.2}, & 0.525 \le x \le 0.7, \\ 0, & x \ge 0.7. \end{cases} \qquad (7)$$

$$F_4(x) = \begin{cases} 0, & x < 0.1, \\ \left(\dfrac{x - 0.1}{0.175}\right)^{1.2}, & 0.1 \le x \le 0.275, \\ 1, & 0.275 \le x \le 0.325, \\ \left(\dfrac{0.5 - x}{0.175}\right)^{1.2}, & 0.325 \le x \le 0.5, \\ 0, & x \ge 0.5. \end{cases} \qquad (8)$$
(3) v5 membership function:
$$F_5(x) = \begin{cases} 1, & x < 0.1, \\ \left(\dfrac{0.3 - x}{0.2}\right)^{1.2}, & 0.1 \le x \le 0.3, \\ 0, & x > 0.3. \end{cases} \qquad (9)$$
Table 2 shows a calculation illustration for the example used above. The fuzzy values of the factors have been calculated, and the weight matrices of the various subjects were obtained in Step II. We use the membership functions above to compute the evaluation matrix of the FCE, R.
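A sketch of the membership functions of Eqs. (5)–(9) (K = 1.2), mapping one factor value to its row of R:

```python
K = 1.2

def rising(x, lo, hi):
    """One-sided parabolic ramp from 0 at lo to 1 at hi (Eq. (5) shape)."""
    return 0.0 if x < lo else 1.0 if x > hi else ((x - lo) / (hi - lo)) ** K

def trapezoid(x, lo, p_lo, p_hi, hi):
    """Parabolic trapezoid: ramp up on [lo, p_lo], plateau, ramp down on [p_hi, hi]."""
    if x < lo or x > hi:
        return 0.0
    if x < p_lo:
        return ((x - lo) / (p_lo - lo)) ** K
    if x > p_hi:
        return ((hi - x) / (hi - p_hi)) ** K
    return 1.0

def membership_row(x):
    """One row of the evaluation matrix R for factor value x: (v1, ..., v5)."""
    return [rising(x, 0.7, 0.9),                   # F1, very dangerous
            trapezoid(x, 0.5, 0.675, 0.725, 0.9),  # F2, moderately dangerous
            trapezoid(x, 0.3, 0.475, 0.525, 0.7),  # F3, medium
            trapezoid(x, 0.1, 0.275, 0.325, 0.5),  # F4, moderately safe
            rising(-x, -0.3, -0.1)]                # F5, very safe (mirrored ramp)

print(membership_row(0.73))   # roughly [0.10, 0.97, 0, 0, 0], the Intens. row
```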
Step IV: Overall physical exercise risk remark evaluation. The composite weights of the risk remarks are then determined by aggregating the weights through the
hierarchy. Using Eq. (2) from the bottom level to the top, we get the overall weights of the exercise risk remarks, B = {0.02, 0.5, 0.4, 0.1, 0.01}. Then the normalized overall evaluation value is computed by Eq. (3) to be 0.6. This value lies between the remark values of moderately dangerous and medium, so this person doing the current exercise under the current conditions is in slight danger. The conclusion is consistent with the evaluation of the subject's medical expert.

Table 2. Evaluation matrix of the human exercise risk calculation example

Factor (weight)                  Sub factor  Weight  Factor value  v1    v2    v3    v4    v5
Medical Condition (0.569)        Dis-con     0.311   0.6           0.00  0.51  0.51  0.00  0.00
                                 Phy-sta     0.539   0.6           0.00  0.51  0.51  0.00  0.00
                                 Bas-inf     0.150   0.3           0.00  0.00  0.00  1.00  0.00
Activity load (0.334)            Intens.     0.557   0.73          0.10  0.97  0.00  0.00  0.00
                                 Dur.        0.320   0.5           0.00  0.00  1.00  0.00  0.00
                                 Freq.       0.123   0.67          0.00  0.97  0.12  0.00  0.00
Environmental condition (0.097)  T/H         0.623   0.6           0.00  0.51  0.51  0.00  0.00
                                 T/W         0.239   0.2           0.00  0.00  0.00  0.51  0.44
                                 Noise       0.137   0.5           0.00  0.00  1.00  0.00  0.00
4 Conclusions

This is the first time AHP and FCE have been combined to analyze and assess the health risk level of people doing exercise, which has significant theoretical and practical meaning in healthcare monitoring. There are great advantages in using these two techniques together. AHP deals with the complex problem hierarchically and gives a concrete method to acquire the comparison matrices. On the other hand, FCE takes into account the vague and imprecise medical expert knowledge, patient information, exercise load and environment condition scientifically, and expresses them concretely in mathematical form. Compared with other studies in exercise risk research [3-5], mostly conducted by testing or by field experiments, our comprehensive model can be implemented in a wearable embedded system to monitor the body's current status, to assess the risk and to sound an alarm in time when needed. Of course, the theory and methodology presented in this paper need to be further studied and verified in practice.

Acknowledgements. This work was supported in part by the Program for New Century Excellent Talents in University from the Ministry of Education of China (No. NCET-04-415), the Cultivation Fund of the Key Scientific and Technical Innovation Project
from the Ministry of Education of China (No. 706024), the International Science Cooperation Foundation of Shanghai (No. 061307041), and the Specialized Research Fund for the Doctoral Program of Higher Education from the Ministry of Education of China (No. 20060255006).
References
1. Jang, S.J., Park, S.R., Jang, Y.G., et al.: Automated Individual Prescription of Exercise with an XML-based Expert System. In: IEEE-EMBS, Shanghai, China, pp. 882–885 (2005)
2. Hara, M., Mori, M., Nishizumi, M.: Differences in Lifestyle-related Risk Factors for Death by Occupational Groups: A Prospective Study. Journal of Occupational Health 41, 137–143 (1999)
3. Jouven, X., Empana, J.P., Schwartz, P.J., et al.: Heart rate profile during exercise as a predictor of sudden death. The New England Journal of Medicine 352, 1951–1958 (2005)
4. Yun-jian, Z., Ji-rao, W., Song-bo, Z.: The Application of HRV in the Healthy and Sports' Field. Sichuan Sports Science 2, 47–49 (2004)
5. Singh, M.A.F.: Exercise comes of age: rationale and recommendations for a geriatric exercise prescription. J. Gerontol. A Biol. Sci. Med. Sci. 57, 262–282 (2002)
6. Kangary, R., Riggs, L.S.: Construction risk assessment by linguistics. IEEE Trans. Eng. Manag. 36, 126–131 (1989)
7. Saaty, T.L.: The Analytic Hierarchy Process. McGraw-Hill, New York (1980)
8. Feng, S., Xu, L.D.: Fuzzy Sets and Systems 105, 1–12 (1999)
9. Speed, C.A., Shapiro, L.M.: Exercise prescription in cardiac disease. The Lancet 356, 1208–1209 (2000)
10. Heyward, V.H.: Advanced Fitness Assessment & Exercise Prescription. Human Kinetics, 1–6 (2006)
11. Donatelle, R.T., et al.: Access to Health. Benjamin Cummings, pp. 20–30 (1996)
The Effect of Map Information on Brain Activation During a Driving Task Tao Shang1, Shuoyu Wang2, and Shengnan Zhang1 1 School of Information Science and Engineering, Shenyang University of Technology No.58, Xinghuanan Street, Tiexi District, Shenyang, 110023, Liaoning Province, P.R. China
[email protected] 2 Department of Intelligent Mechanical System Engineering, Kochi University of Technology, 185 Miyanokuchi, Kami, Kochi 782-8502, Japan
[email protected]
Abstract. Until now, GPRS/GPS/GIS-based vehicle navigation and monitoring systems have been widely developed to satisfy the demand for intelligent transportation systems (ITS). Such systems provide great traffic convenience to drivers, but at the same time place an additional burden on drivers, who must learn the map information. Hence it is worth further verifying the negative effect of vehicle navigation and monitoring systems on drivers. Considering that human driving behavior is strongly relevant to cognitive characteristics, this study addresses the effect of vehicle navigation systems on drivers by measuring and analyzing the cognitive state inside the brain. In this paper, a relatively new method, multi-channel near-infrared spectroscopy (NIRS), was used to investigate brain activation by independently manipulating the cognitive demand in different cases of a driving simulator. The experimental results indicated that, compared with the case of no map information available, there is no obvious priority of activation between the left brain and the right brain when map information is available. Meanwhile, there seems to be complete activation of the prefrontal cortex of the left and right brain, suggesting that GPRS/GPS/GIS-based vehicle navigation systems may exhaust drivers more easily and so bring about more danger than traffic convenience in a driving environment.
1 Introduction
Intelligent Transport Systems (ITS) have received comprehensive attention. Many modern technologies and products have been developed to satisfy the demands of ITS [1-2]. As a typical product, GPRS/GPS/GIS-based vehicle navigation and monitoring systems are now applied in automobiles to assist safe and comfortable driving, especially in developed countries such as the USA and Japan. Such systems provide great traffic convenience to drivers, but at the same time place an additional burden on drivers, who must interpret the map information. Hence it is worth verifying the negative effect of vehicle navigation and monitoring systems on drivers. Considering that human driving
behavior is strongly related to cognitive characteristics, this study addresses the effect of vehicle navigation systems on drivers from the viewpoint of human cognitive characteristics. If such characteristics could be clarified, this would contribute to disclosing the negative effect of vehicle navigation systems on drivers and, further, to the development of modern traffic tools. A few attempts involving human driving models [3-6] have been reported so far. However, most of those studies focused on external modeling of driving behavior and remained at a lower level of the human cognition process [3-5]. In comparison, the work in [6] adopted a predictive method to explore the cognitive process of driving from the viewpoint of a vision model, but it was difficult to verify the resulting model due to the limitations of the available measurement devices. Now, with the rapid improvement of neurophysiology and electronic technology, some advanced measurement devices have been developed, and access to the internal signals of the brain has become possible. For example, multi-channel near-infrared spectroscopy (NIRS) is a relatively new method for investigating brain activation, based on changes in oxygenated haemoglobin (O2Hb) and deoxygenated haemoglobin (HHb). Recently, it has been shown that NIRS can detect even small changes in O2Hb and HHb concentration due to cognitive demands [7]. With respect to higher cognitive functions, NIRS has been successfully used to assess prefrontal brain activation during various tasks [8][9]. Based on these positive results and on the fact that NIRS is relatively easy to apply, this study adopts the changes in O2Hb and HHb concentrations of the cerebral cortex to analyze human cognitive characteristics during a driving task. In order to investigate the negative effect of vehicle navigation systems on drivers from the viewpoint of human cognitive characteristics, we not only construct a driving simulator to induce adaptive driving actions, but also measure the changes in O2Hb and HHb concentrations of the cerebral cortex in two environments: no map available and map available. The remainder of this paper is organized as follows. Section 2 introduces the experimental system, including the developed driving simulator and the spectrometer used. Section 3 describes the experimental procedure. Section 4 summarizes the experimental results and their analysis. The paper finishes with conclusions in Section 5.
2 Experimental System
During the driving process, drivers need to avoid obstacles and arrive at a goal; as a result, driving experiments can involve great danger to human life. To minimize economic damage and maximize safety, a driving simulator was developed using virtual reality technology. Not only can driving behavior data be acquired for further analysis, but the limit states of human driving can also be measured, which is the most distinctive feature compared with real driving. Here, we used the driving simulator as the visual stimulus source to elicit a specific cerebral activation pattern. Besides this, a device for measuring brain activation is also necessary. Near-infrared spectroscopy (NIRS) is a non-invasive technique for measuring concentration changes of oxygenated (O2Hb) and deoxygenated (HHb) hemoglobin. With respect to higher
cognitive functions, NIRS has been adopted in much research [7-9]; however, no specific NIRS activation paradigm has been reported for the measurement of frontal regions during a driving task.

2.1 Driving Simulator
A driving simulator was developed as shown in Figure 1. The virtual driving environment is implemented on a 17-inch LCD (resolution 1280×1024). A keyboard is used as the interaction control device for the driving direction and speed. The computing environment is specified as follows: Pentium 4 2.66 GHz CPU, 512 MB memory, Windows XP Professional, with the program developed in Visual C++ 6.0 and OpenGL. The refresh time for animation is 100 ms.
[Figure 1 panels: (a) driving scene; (b) keyboard control; (c) reference route (circle denotes the starting point, arrow denotes the end point)]
Fig. 1. Driving simulator
Map information is displayed at the bottom-right of the driving simulator, as shown in Figure 1(a). When the middle scroll bar of the main window is drawn to the right margin, the display becomes the no-map condition used in the experiment. Considering predictive relevance for practical traffic construction, the traffic scene was designed according to the 216 Route of the Shenyang Public Transportation Company, Liaoning Province; Figure 1(c) shows the reference route. All objects are drawn at a scale of 1 pixel : 2 meters. The initial velocity of the virtual car is zero. According to the motion equation (1), the trajectory of the virtual car can be calculated:

$$m \ddot{Y} + D \dot{Y} = F(\mathrm{force}, \mathrm{direction}) \qquad (1)$$

where $m$ is the mass of the car, 10 kg; $Y = (y_1, y_2)^T$ is the position of the car; $\dot{Y}$ and $\ddot{Y}$ are the velocity and acceleration of the car, respectively; $D$ is the coefficient matrix $\mathrm{diag}(5, 5)$; $F(\mathrm{force}, \mathrm{direction})$ is the composite control input based on force and direction, where force is the control value from the forward key, 10 N, and direction is the control value from the left and right keys, $-30°$ or $+30°$.
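To make the update rule concrete, here is a minimal sketch of how Eq. (1) could be stepped forward with an explicit Euler scheme at the simulator's 100 ms refresh time. The paper's simulator was written in Visual C++ and OpenGL; this Python fragment, with illustrative function names, is only an assumption-level reconstruction of the dynamics, not the simulator's actual code.

```python
import numpy as np

# Explicit Euler integration of m*Y'' + D*Y' = F(force, direction),
# using the parameter values quoted above.
m = 10.0                      # mass of the virtual car (kg)
D = np.diag([5.0, 5.0])       # damping coefficient matrix
dt = 0.1                      # 100 ms refresh time of the animation

def control_force(force, direction_deg):
    """Compose the control vector from the forward key (10 N)
    and the left/right keys (-30 or +30 degrees)."""
    theta = np.radians(direction_deg)
    return force * np.array([np.cos(theta), np.sin(theta)])

def step(Y, V, force=10.0, direction_deg=0.0):
    """Advance position Y and velocity V by one refresh interval."""
    F = control_force(force, direction_deg)
    A = (F - D @ V) / m       # acceleration from m*Y'' = F - D*Y'
    V_new = V + dt * A
    Y_new = Y + dt * V_new
    return Y_new, V_new

# Example: starting from rest, drive straight ahead for 1 s.
Y, V = np.zeros(2), np.zeros(2)
for _ in range(10):
    Y, V = step(Y, V, force=10.0, direction_deg=0.0)
print(Y, V)
```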
2.2 Hitachi ETG-7100
For near-infrared optical topography, we used a 66-channel (22×3) spectrometer (Hitachi ETG-7100). A 3×5 array of 8 laser diodes and 7 light detectors was applied, resulting in 22 channels per probe. This was realized by coding the near-infrared laser diodes with different wavelengths in two ranges around 695 and 830 nm (ranges 675–715 nm and 810–850 nm). The photodiodes (light detectors) were located 30 mm from an emitter. The fibers were mounted on a plastic helmet held by adjustable straps on the subject's head. The measurement covered an area of 6.75 cm × 11.25 cm centered over the electrode position. Light emission was time-coded at 10 kHz.

Fig. 2. ETG-7100

The ETG-7100 measured changes of O2Hb and HHb from a starting baseline. Data were acquired at a sampling rate of 10 Hz and further analyzed using the ETG-7100 software. The scale of the hemoglobin quantity is mmol·mm, meaning that the reported changes in the concentration of O2Hb and HHb depend on the path length of the near-infrared light in the brain.
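As a reading aid, the following sketch shows the modified Beer-Lambert computation that underlies such dual-wavelength measurements: optical-density changes at the two wavelengths are solved for the two chromophore changes. The extinction coefficients below are illustrative placeholders, not the ETG-7100's calibrated constants, and the ETG-7100 software performs this step internally.

```python
import numpy as np

# Modified Beer-Lambert sketch for a dual-wavelength NIRS system.
# extinction[lambda] = (epsilon_O2Hb, epsilon_HHb); HHb dominates near
# 695 nm, O2Hb near 830 nm (placeholder magnitudes, arbitrary units).
E = np.array([[0.3, 1.8],    # 695 nm
              [1.0, 0.7]])   # 830 nm

def hb_changes(delta_od_695, delta_od_830):
    """Solve delta_OD = E @ (dC_O2Hb, dC_HHb) * L for the two chromophores.
    Because the true optical path length L in tissue is unknown, results
    are reported as concentration change times path length (mmol*mm)."""
    delta_od = np.array([delta_od_695, delta_od_830])
    d_o2hb, d_hhb = np.linalg.solve(E, delta_od)
    return d_o2hb, d_hhb

print(hb_changes(0.02, 0.05))
```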
3 Experimental Procedure
As the frontal lobe and occipital lobe appeared relevant to the desired cognitive functions, three probes were placed over the left prefrontal cortex, the right prefrontal cortex and the occipital visual cortex, respectively, as shown in Figure 3. Oxygenation changes in the 22 channels of each probe can be measured during the driving process.
Fig. 3. Position of three probes
Stimuli were presented with the experimental driving simulator described above. The stimuli came from a computer monitor placed 60 cm in front of the subject, who had to respond adaptively. The response data were recorded and used as a measure of behavioral performance. With the ETG-7100 software, the changes in the concentrations of HHb and O2Hb were calculated over the experimental session. For task repetition, we defined a 30 s "baseline" preceding each active task period (lasting 180 s) and a 30 s "rest" period following each active task period. The time course of the measured data was corrected by the ETG-7100 software. After this correction, the resulting data were exported by the ETG-7100 program into ASCII data format and
video format. The concentration changes of HHb and O2Hb in the three experimental phases were further analyzed for each channel by means of two-tailed tests. In more detail, we first compared the initial condition separately to the corresponding baseline for each channel. This was done for [O2Hb] and [HHb] to ensure that the conditions lead to signs of cortical activation, that is, to an increase of [O2Hb] and a corresponding decrease of [HHb].
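A minimal sketch of the per-channel baseline comparison described above might look as follows; the use of scipy's two-sided t-test and the significance level are assumptions standing in for the paper's two-tailed tests.

```python
import numpy as np
from scipy import stats

# For each of the 22 channels of a probe, compare the active phase against
# the preceding 30 s baseline and flag channels showing the activation
# signature: increase of [O2Hb] with a corresponding decrease of [HHb].
FS = 10                          # ETG-7100 sampling rate (Hz)
BASELINE, TASK = 30 * FS, 180 * FS

def activated_channels(o2hb, hhb, alpha=0.05):
    """o2hb, hhb: arrays of shape (n_samples, 22) for one probe."""
    active = slice(BASELINE, BASELINE + TASK)
    flags = []
    for ch in range(o2hb.shape[1]):
        up = stats.ttest_ind(o2hb[active, ch], o2hb[:BASELINE, ch])
        down = stats.ttest_ind(hhb[active, ch], hhb[:BASELINE, ch])
        increase = up.pvalue < alpha and up.statistic > 0
        decrease = down.pvalue < alpha and down.statistic < 0
        flags.append(increase and decrease)
    return np.flatnonzero(flags)   # indices of activated channels
```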
4 Measure Experiments

4.1 Subject
One healthy, right-handed subject (male, aged 30) was asked to drive the car from the starting point to the end point along the specific 216 Route twice: once with no map information available and once with map information available. The subject was free of medication, with no former or current neurological or psychiatric disorder.

4.2 NIRS Data for No Map Information Available
For the prefrontal cortex, typical tracings of the changes in O2Hb concentration during the driving task are displayed in Figure 4.

[Figure 4: activation maps at t = 30 s, 44 s, 74 s, 88 s and 92 s]
Fig. 4. Activation change of O2Hb for the left and right prefrontal cortex

The concentration of O2Hb
increased in some channels during the active phase compared to the baseline; subsequently, the area of elevated O2Hb concentration in the left prefrontal cortex expanded within a smaller scope, while that in the right prefrontal cortex shifted its position. Finally, [O2Hb] declined over the course of the rest phase compared to the active phase.

[Figure 5: activation maps at t = 120 s and 150 s]
Fig. 5. Activation change of O2Hb for the occipital visual cortex
For the occipital visual cortex, typical tracings of the changes in O2Hb concentration during the driving task are displayed in Figure 5. The concentration of O2Hb varied in some channels during the active phase.

4.3 NIRS Data for Map Information Available
For the prefrontal cortex, typical tracings of the changes in O2Hb concentration during the driving task are displayed in Figure 6. The concentration of O2Hb
Fig. 6. Activation change of O2Hb for left and right prefrontal cortex
Fig. 7. Activation change of O2Hb for occipital visual cortex
increased in some channels during the active phase compared to the baseline; subsequently, the area of elevated O2Hb concentration in the left prefrontal cortex expanded over a larger scope, while that in the right prefrontal cortex expanded to cover the whole probe area. Finally, [O2Hb] remained elevated over the course of the rest phase compared to the active phase. For the occipital visual cortex, typical tracings of the changes in O2Hb concentration during the driving task are displayed in Figure 7. The concentration of O2Hb remained high in almost all channels.

4.4 Discussion
As expected, in both the no-map and map conditions we found a significant increase in O2Hb over the measured frontal brain areas during the active phases compared to the baseline. Most importantly, both the left and the right brain were activated in the active phase, with significantly higher concentrations of O2Hb. At the same time, the results occasionally indicated missing activation of parts of the prefrontal cortex. We therefore conclude that the driving task on this specific route mainly activates part of the prefrontal cortex. The activation map illustrates specific effects in areas of the prefrontal cortex, but the differences between the left and right prefrontal cortex did not reach significance. One conclusion is that, in spite of changes in the computational load imposed by a given traffic scene, there seems to be a tendency towards stable and symmetrical activation of the left and right brain. To compare the different cases, more details are given below:

1) For the case of no map available, according to the activation maps of O2Hb over the prefrontal cortex, the significantly higher [O2Hb] found at several points during the active phase compared to the neighboring area was probably caused by the hemodynamic response. At least two kinds of activation pattern can be found, as shown in Figure 4. Since these patterns suggest the functional relevance of the frontal lobe for driving tasks, one conclusion is that it is possible to analyze the process of cognitive change according to the physical position within the prefrontal cortex. According to the activation maps of O2Hb over the occipital visual cortex, two kinds of activation pattern can be found. Furthermore, one conclusion is that as the environment becomes complex, the left brain plays an initiating role, while the right brain closely follows the activation level of the left brain: the left brain focuses on problem solving, while the right brain raises the activity level. There seem to be collaborating brain areas, each with multiple relative specializations, that engage in extensive inter-area collaboration.

2) For the case of map information available, according to the activation maps of O2Hb over the prefrontal cortex, the right brain activates completely and to almost the same degree everywhere, whereas the left brain uses only part of its area. Only one kind of activation pattern can be found, and there is no longer an obvious priority of activation between the left and right brain. According to the activation maps of O2Hb over the occipital visual cortex, the occipital visual cortex stays active in almost all areas to the same degree.

Based on the above facts, it can be concluded that the case of map information available costs more energy than that of no map information available; consequently, drivers become tired more quickly. If the map-available case is regarded as driving with a GPRS/GPS/GIS-based vehicle navigation and monitoring system, and the no-map case as driving without such a system, then this kind of system will exhaust drivers more quickly, so that the driver may face more potential danger, although it helps drivers find their route conveniently. One limitation of our study is the missing specific effect for the HHb concentrations. In contrast to the specific concentration changes in O2Hb, the concentration of HHb decreased in both conditions during the active phase compared to the baseline. In contrast to other studies claiming that HHb is more sensitive than O2Hb, our explanation for the absent HHb effect is that, given the wavelengths used by the ETG-7100 system, [O2Hb] estimates are considerably more precise than [HHb] estimates, so that weaker effects in this parameter might not reach statistical significance. This proposed explanation should be examined in further studies. At the same time, this analysis confirms the low spatial resolution of the ETG-7100 equipment as a shortcoming.
5 Conclusions
In this paper, based on a purpose-built driving simulator, we used the NIRS method to investigate the functional activity of the cerebral cortex during different driving tasks and discussed the negative effect of vehicle navigation and monitoring systems on drivers. After establishing consistency with earlier research, the study produced three conclusions. Firstly, in spite of changes in the computational load imposed by a given traffic scene, there seems to be a stable effect in a number of collaborating brain areas, including the left and right brain; this suggests that driving cognition operating on different levels of the environment may nevertheless draw on a shared infrastructure of cortical resources. Secondly, for the case of no map information available, the left brain plays an initiating role, while the right brain closely follows the activation level of the left brain, so that the left and right brain tend towards symmetrical activation; this may contribute to the construction of a simpler model of the driving cognitive process. Thirdly, for the case of map information available, there is no longer an obvious priority of activation between the left and right brain, but the activation level is high, and the prefrontal cortex of both hemispheres is completely activated. Since map information places an additional burden on the brain, this suggests that GPRS/GPS/GIS-based vehicle navigation and monitoring systems may exhaust drivers more easily and so bring about more danger than traffic convenience in a driving environment. We believe such conclusions will not only provide guidance for developing computational theories of cognitive processes, but also contribute to the rehabilitation of those with cognitive deficits.
References
1. Tsugawa, S.: Automobile Driving Support Adjustable for Drivers. AIST Today 4, 12 (2004)
2. Akamatsu, M.: Driving Support System Based on Driving Action Data. AIST Today 14, 11 (2004)
3. Nechyba, M.C., Xu, Y.: Human Control Strategy: Abstraction, Verification and Replication. IEEE Control Systems Magazine 17, 48–61 (1997)
4. Koike, Y., Doya, K.: A Driver Model Based on Reinforcement Learning with Multi-Step State Estimation. Trans. IEICE Japan D-II 84, 370–379 (2001)
5. Shihabi, A., Mourant, R.R.: A Framework for Modeling Human-Like Driving Behavior for Autonomous Vehicles in Driving Simulators. In: Proc. The Fifth International Conference on Autonomous Agents, pp. 286–291 (2001)
6. Mizutani, K., Saito, G., Omori, T., Ogawa, A.: A Feasibility Study of a Cognitive Computation Model for Driver's Process Estimation from Driving Behavior. The Transactions of the Institute of Electrical Engineers of Japan, 967–975 (2005)
7. Jasdzewski, G., Strangman, G., Wagner, J., Kwong, K.K., Poldrack, R.A., Boas, D.A.: Differences in the hemodynamic response to event-related motor and visual paradigms as measured by near-infrared spectroscopy. NeuroImage 20, 479–488 (2003)
8. Fallgatter, A.J., Strik, W.K.: Reduced frontal functional asymmetry in schizophrenia during a cued continuous performance test assessed with near-infrared spectroscopy. Schizophrenia Bulletin 26, 913–919 (2000)
9. Herrmann, M.J., Ehlis, A.-C., Wagener, A., Jacob, C.P., Fallgatter, A.J.: Near-infrared optical topography to assess activation of the parietal cortex during a visuo-spatial task. Neuropsychologia 43, 1713–1720 (2005)
Worm 5: Pseudo-organics Computer and Natural Live System

Yick Kuen Lee1 and Ying Ying Lee2

1 Sun Yat-sen University, Software School
[email protected]
2 University of Leeds, School of Biological Science
[email protected]
Abstract. Life began billions of years ago; it developed from simple inorganic substances into intricate multi-cellular organisms. Gradually, the brains of higher animals developed emotions and intelligence, as well illustrated by the learning abilities and social behaviors of man. These intelligent activities progress from the simple to the complicated, from primitive to sophisticated processes, from the concrete to the abstract. Man started to create artificial intelligence to enhance his brain capabilities sixty years ago. Here we make a comparison between natural and artificial intelligence to see what more we can emulate from Nature, and we present the authors' point of view on the creation of natural lives.
1 Introduction
In a repetitive environment, events happen in a sequenced order, governed by rules. Organisms can remember what happened before, use knowledge from past experiences to predict what will happen in the future, and react accordingly. This enhances the survival of those organisms. For instance, the Earth orbits the Sun and spins about its own axis; just such cycles are fundamental to physics and astronomy. Homeostasis is the basis of survival in organisms. This delicate balance is maintained by constant interaction between the environment and the organism. The organism picks up information from the ambient environment, and a proper cellular response is then elicited in the system. This instinct prolongs survival. Another characteristic of life is reproduction. Organisms can produce offspring with similar genetic material, so even with the death of the older generation, the next generation survives. This life cycle continues the proliferation and survival of a species. The two characteristics discussed above involve information amplification. These processes require energy, which is obtained from the organism's surroundings. However, the resources of the ambient environment are finite. Moreover, it is the nature of life to end. The equation of life entails development, reproduction and, inevitably, death.
Mutation occurs during the reproduction of organisms, and influences exerted by the external environment enhance mutation. Such mutations create a diversified gene pool, causing differing phenotypes to manifest. Darwinism states that organisms with the most desirable phenotypes survive and proliferate. This process of natural selection constitutes the evolution of organisms. Before the 20th century, science was divided into a multitude of different domains such as mathematics, physics, chemistry, biology, pathology, and philosophy. Life sciences were more closely associated with philosophy and religious studies by many; little did they know that genetics has a molecular basis. The study of the human genome has united the different fields of study. Men already possess the ability to create artificial intelligence. Compared to Nature, which went through millions of years of evolution, this new advancement has a brief history of less than a century; there is more to be explored in this exciting field. The age of the information revolution has already begun. Where do we begin? In the 21st century, theoretical physics developed further into string theory [1], while time began when the 'Big Bang' occurred. If life is defined as maintaining homeostasis and reproducing, it could encompass anything from the different energy levels in atomic structure, to black holes in the galaxy, to silicon polymer chains. Under different temperatures, pressures and time frames, these seemingly 'lifeless' materials may display properties pertaining to life. However, to simplify matters, we shall stick to the carbon chains that constitute all organisms.
2 Genetic Memories - A Steady and Slow Mechanism
Genetic information is encoded using four different types of bases, namely adenine (A), thymine (T), cytosine (C) and guanine (G). A series of DNA transcription and translation steps codes for the proteins that are essential for life. From the basic structure of cells to intricate cellular mechanisms, proteins are always, in some way or another, involved. Due to the highly stable structure of DNA, the whole process of mutation is slow and dreary. Therefore, evolution does not happen in a split second; it takes many, many generations to occur, and a long time for a species to interact with its environment.
3 Prokaryotes Versus Von-Neumann Machines
Prokaryotes, like bacteria, are the simplest life forms that exist. They are made up of a single cell. Only a phospholipid bilayer separates the internal and external environments of the organism. Like Maxwell's demon, which can distinguish black and white particles in statistical theory, the protein channels embedded in the lipid wall can distinguish molecules and determine which may pass through the bilayer. DNA in the chromosome encodes the inheritance information. During transcription, RNA polymerase uses the template strand of DNA to assemble a strand of mRNA [2]. The mRNA is then translated into its corresponding polypeptide or enzyme by the ribosomes
during translation. Proteins (e.g. enzymes) initiate and regulate cellular activities. Organisms assimilate materials from the outside environment, and at maturity start the reproduction mechanism to produce more individuals and proliferate. Consider the Von-Neumann machine, i.e. an ordinary computer. Steady binary signals recorded on magnetic or optical media encode the program and data information. The CPU uses a cluster of records as a template, transcribes them into a high-speed cache, from which the program executes CPU instructions, manipulates electric signals as well as the program instructions themselves, and performs data input and output. The transformation of electric signals initiates intelligent activity and controls those intelligent activities themselves. Under human control, a program may be regenerated and developed continuously. The two mechanisms mentioned above have many similarities. The core information is stored in a simple digital format in sequence and can be retrieved segment by segment. After spreading, the information transforms into a more powerful status, adjusting its environment, feeding back its signals, and controlling complicated activity flows. Natural organisms maintain their lives and reproduce offspring generation by generation. Birth, growth, reproduction and mutation form a complete cycle, gone through by the organism itself; no external interference is necessary except absorbing energy from the environment. On the other hand, the name Von-Neumann machine implies incompleteness: man makes the computer hardware, the software is written by man, and turning it on, turning it off, and management are all done by humans. The computer's development and evolution depend on humans too. As a tool for extending human intelligence, it cannot exist independently, without interference from mankind.
4 Clustering of Lives - Eukaryotes, Multi-cell Organisms
Prokaryotes exchange information through chemical reactions between molecules. The free path length and moving speed of molecules limit their sizes, so organisms with simple structures are usually small. Individuals come together and form a larger body through communication. High-coherence, low-coupling function groups usually cluster together, and more often than not the resulting organization proves to be adaptable to the environment. Eukaryotes can be treated as a cluster of lysosomes, ribosomes, mitochondria, nucleus, etc., similar to prokaryotes packed together [1]. For higher animals, the structure of the organization becomes complicated, giving rise to different hierarchies within the system. Cells in a multi-cellular organism differentiate into different cell types to form tissues, organs and systems. Organisms can come together to form a community, like a pride of lions. In the field of artificial intelligence, single instructions usually combine into subroutines, which are put together into routine libraries, and high-coherence, low-coupling structures are put together to form objects. Then come operating systems and application systems. Hierarchy is an important aspect of big systems. For hardware, single PCs are usually connected into a network; an enterprise network most probably includes function groups of file servers, databases, mail servers, etc., linked together by the Internet.
5 Data Carrier - Nerve Network and Internet
To maintain homeostasis in an ever-changing environment, one mechanism is negative feedback. For example, cells selectively synthesize lactase to break down lactose when there is no glucose present for glycolysis. Under the regulation of cAMP, Dictyostelium will transform from amoebae into a fruiting body [4]. Incretion, exocrine secretion and the inducers acting during embryo formation are examples of chemical signals. A chemical signal propagates only over a small range; as the space enlarges, the speed of molecular diffusion limits propagation efficiency. A nervous system therefore developed in multi-cellular organisms during evolution. Sensory receptors detect signals by inducing a change in the electric potential across the cell membrane, which is transmitted along the axons of nerve cells to the central nervous system (CNS) for interpretation and response. Electrical signals can be conducted more rapidly than chemical signals, so organisms with nervous systems survive better in a competitive environment. Together with the development of the CNS, Man learned to communicate with voice and, hundreds of years ago, with writing, with nations distributed in separate areas depending on geographical location. Nowadays, electric signals (the telegram, telephone and radio) shorten the distances between nations and unify the cultures of different areas. During the last ten years, optical fiber and Internet technology have enhanced communication, increasing efficacy and efficiency and making globalization more possible. Will the Internet become a higher-level nervous system of the human being?
6 Logical Network - Another Form of Memory
In inchoate organisms like jellyfish, the nerves are divided into two types, sensory and motor, without a CNS [2]. In somewhat more advanced animals like annelids, neuron cell bodies cluster in ganglia and react to stimulation signals in a fixed format. The number of cells in eelworms is fixed [4], determined by their genetic material; nerve memory is not a feature in them. For higher animals, the CNS is highly developed. Sensations such as touch, smell, vision, hunger, cold, warmth and pain can be detected. The sensory receptors of the peripheral nervous system transmit impulses from the sense organs to the CNS, exciting the corresponding area. The CNS processes such signals in the sensory cortex of the brain and relays information to the motor cortex, which transmits impulses to the appropriate effector organs [2]. These reactions affect survival. Nervous conduction involves chemical signals, such as direct-acting substances like neurotransmitters (e.g. adrenaline and dopamine), neuropeptides and adenosine triphosphate (ATP). In the early stage of life, genetic information controls the structure and function of the nervous system. Most of the brain structure is an empty cavity [4]. However, as the system develops and sensory signals occur simultaneously, when the stimulation is above threshold level or happens frequently enough, synaptic connections form between the dendrites and axons of the brain cells. They connect and build a memory system based on the topology of the nervous network.
Some theories state that, regulated by the hippocampus, the excited cells become connected by new nerves [5]; after that, stimulating only part of those cells will cause transmission to the others, and thus a conditioned reflex, or memory, forms. Conditioned reflexes change the physical structure of the brain, creating the character of an individual. Comparing nerve reactions with an artificial process controller, the leaky channels, ligand-gated channels and voltage-gated channels in the dendrite and axon are quite similar to the logical gates in an electronic circuit, where input signals, after transformation by the gate array, are transmitted to control output devices. A Field Programmable Gate Array (FPGA) is much like the nervous network in the brain [3]. However, the programming of such devices has to be performed by humans at this stage; they cannot change their structure according to the strength or frequency of their inputs. Search engines on the Internet find the target address, redirect the signal, and get the necessary information from there. This type of operation works through the cooperation of Von-Neumann machines and logical gates. A pseudo-organic computer like "WORM" (see below), however, has appropriate hardware and changes the communication connections between elements according to their structure instructions [7]; it simulates the operation mode of the brain, laying a foundation for the simulation of conditioned reflexes.
7 Formation of Knowledge - Abstract Abilities
Higher organisms have nervous systems connecting stimuli and reactions for the body's survival. Emotion is expressed in pleasure, anger, sadness, cheer, etc.; each is a combination of a series of activities, including body movements, facial expressions, sounds, incretion and secretion [5]. Comfortable feelings initiate an organism's intentions; avoiding pain is a basic instinct of most organisms. These reactions show sequences or associations of ordered events. Higher animals can acquire knowledge from past experiences and conclude rules that may predict what will happen in the future, making decisions earlier. As the brain developed, different species at different stages acquired quite different means of reaction: direct or indirect, short-term and long-term, simple to complicated, concrete and abstract. Many memories may have similarities and form more abstract concepts. Abstract concepts reduce the memory elements required, enlarge the range of a concept, and increase information-processing efficiency. For example, from many concrete concepts like cow, sheep, grass, flower, tree, bird, door and window, we can distinguish the more abstract concepts of creature, plant, building structure, etc. Abstract concepts can be organized into a hierarchy, and the higher animals have the greater abstract-concept-forming ability. Language is one such abstraction, connecting concepts and voice. The ability to predict is intelligence, or knowledge. A huge quantity of data, after cleaning, integration, selection, transformation, mining and pattern evaluation, can be converted into useful information and condensed into rules for making predictions. Much of the work in developing artificial intelligence is knowledge discovery (mining) from large databases [6]: searching for frequent item sets and calculating degrees of association, support and confidence to find association relations. This way of discovering knowledge is similar to what Nature does.
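As a toy illustration of the support and confidence measures mentioned above (with invented transactions; this example is not from the paper), consider:

```python
# Minimal support/confidence computation over a toy transaction database.
transactions = [
    {"cow", "grass"}, {"cow", "grass", "bird"},
    {"tree", "bird"}, {"cow", "grass"},
]

def support(itemset):
    """Fraction of transactions containing all items of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Reliability of the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)

# Rule "cow -> grass": how often do they co-occur, and how reliable is it?
print(support({"cow", "grass"}))       # 0.75
print(confidence({"cow"}, {"grass"}))  # 1.0
```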
However, most of the artificial intelligence developed so far depends on human programming of a Von-Neumann machine. Some systems can keep their training data in permanent memory, but they are still relatively primitive: they cannot find their goals automatically, have no emotion, no self-recognition and no reproduction, and cannot develop independently.
8 Formation of Characteristics and Self-Recognition
Genes record the information of a species and reproduce similar organisms, creating the innate instincts. However, the individuals of higher organisms have their own characteristics, generated by memories accumulated during their growth. Genes store the common traits of a species; memories and experiences shape an individual's character. Organic clones, like identical twins, may have identical genes, but their personal experiences will not be the same as they grow up. Despite identical genetic constituents, twins may very well have different personality traits. Who has seen two trees that are replicas of each other? Who am I? When you look into a mirror and wonder who that is, you may be making a soul-searching journey. Yourself is just a sequence of memories starting from your childhood, perceptions of the outside world, and goals of pursuit and avoidance brought by your genes and experiences.
9 Man, Civilization and Abstract Lives
Of all the life forms that exist, Man stands out from the rest with advanced abstract intellect and imagination. This gift of intelligence allows people to acquire sophisticated language abilities; they are able to express emotions and distinguish themselves from others. Early civilization relied on verbal interaction to impart knowledge and history. As Man progressed, writing could be used to record knowledge and history. This more tangible form of documentation allowed Man to accumulate years of experience and pass it down to the next generation. In 16th-century Europe, the emergence of printing accelerated the spread of knowledge and information and boosted the development from ancient science to modern technology. Drawing and writing are forms of memory of individuals, shared by a group of people: an improvement beyond innate memory. Most organisms' memories disappear as they die, but writing can impart human knowledge from generation to generation. We know the cataclysmic stories of ancient times and learn Euclidean geometry because our ancestors wrote down their knowledge so that we could learn it. Writing creates human culture; the experiences of a nation over thousands of years can be recorded in books. Printing technology allows knowledge to spread more easily, and the regulated education of schools and universities brings up more systematic sciences. Culture is an abstract form of life. Abstract life can be defined as any form of existence that has the ability to remember and can reproduce itself. A nation, a society, an
Table 1. Comparison of Natural and Artificial Organisms

Columns: Nonneural Organism | Neural Organism | Logic Gate | Von-Neumann Machine | Computer Network | Pseudo-Organic Computer "WORM"

Existing Years: 3.8 billion | 400 million | 100 | 60 | 40 | -
Simplest Form: Bacteria | Jellyfish | Electric relay | Personal computer | - | -
Hierarchy Examples: Cell, fungi, protist | Worm, fish, mammal, human | TTL IC, process control, FPGA | Minicomputer, mainframe | Local area network, Internet | -
Memory Format: Sequence of tetrad code | Neural gate infrastructure | Electric gate network | Sequence of binary code | Binary code with infrastructure | Binary code with infrastructure
Structure Units: Organelle | Organ, function groups | Function groups | Files, program objects | Hosts, nodes, servers | Hardware-software objects
Internal Data Carrier: Chemical molecule | Neural pulses in parallel | Electric pulses in parallel | Electric pulse words in series | Electric pulse packets in series | Optical pulses in packets of words
Propagate Speed: Molecular diffusion | 100 ms/m in parallel | 1 us/gate in parallel | 500 Mb/s per bus channel | 100–1000 Mb/s per line | 10–40 Gb/s per optical channel
Data Processor: Multiple ribosomes | Multiple axon gates | Multiple local gates | Single or multiple CPU | Multiple CPUs and routers | Multiple CPUs and routers
Process Manner: Segment and parallel | Hierarchy parallel | Parallel | Segment and sequential | Hierarchy parallel of sequential elements | Hierarchy parallel of sequential elements
Process Cycle: seconds | 0.05 seconds | 1 us | 30 ps | 30 ps | 30 ps
Efficiency: Low | High | Medium | Medium | Very high | Very high
organization, a company or even a piece of software can be treated as an abstract life. The Von-Neumann machine is a basic form of artificial life; the Internet is also an abstract artificial life.
10 A Sample of Pseudo-organic Computer – "WORM"
After billions of years of winnowing, natural organisms have evolved many brilliant methods that are worth imitating in the creation of artificial intelligence.
Based on a sequence of binary code, the Von-Neumann machine has a tight structure and is easy to program; however, logic gates similar to those of the brain offer more processing efficiency. Object-oriented programming techniques save information in distributed objects, which communicate with each other by passing messages. A network of hardware or a network of software, together with its infrastructure, constitutes part of the memory information, so communication efficiency becomes an important issue. A pseudo-organic computer project named "WORM"^ clusters many computing resources through a photo-electrically coupled data bus to form an object-oriented networked task-service system, jumping out of the frame imposed by Von-Neumann machines. Besides instruction-flow control, it uses structure control to combine the multiple processors [7]. The new computer will improve the internal communication system and dynamically control the common bus and other resources.
11 Conclusions
As a tool of prediction, the pseudo-organic computer imitates only the brain's manipulation of information and is not involved in reproduction itself. Can man create an artificially intelligent organism with the ability to reproduce? The possibility of imitating the organic seems beyond doubt; however, there are practical problems in technology, and it is a very difficult problem. Making a computer, or even something simpler such as paper, metal wire or plastic sheet, requires an entire industrial cooperation system; nowadays, only a few countries can produce a full set of computers. We are far from making a single machine that can replace the whole industrial system. Philosophers argue whether God created man, or Man created God, each according to their own image; however, it is certain that men create computers according to their own image. Although artificial intelligence is still simple and crude, Pandora's box has already been opened. The successful breaking and translation of the genetic code in the 21st century is the prelude to the information revolution. The secrets of life and intelligence will be disclosed, gradually.
References
1. Baaquie, B.E., Kwek, L.C.: Superstrings, Gauge Fields and Black Holes
2. Miller, L.: Biology. Pearson Education Inc. (2006)
3. Zeidman, B.: Designing with FPGAs & CPLDs. CMP Books; Chinese edition, Beijing University of Aeronautics and Astronautics Press, Beijing (2002)
4. Müller, W.A.: Developmental Biology (Chinese translation by Huang Xiu Yin). Higher Education Press, Beijing (2000)
5. Germann, W.J., Stanfield, C.L.: Principles of Human Physiology, 2nd edn. Pearson Benjamin Cummings (2005)
6. Han, J., Kamber, M.: Data Mining: Concepts and Techniques
7. Shen, X.B., Zhang, F.C., Feng, G.C., Che, D.L., Wang, G.: The Classification Model of Computer Architectures. Chinese Journal of Computers 26 (2005)

^ For more details on WORM, refer to Y.K. Lee: Worm 1: A Pseudo-organic Computer's System Structure (2006).
Comparisons of Chemical Synapses and Gap Junctions in the Stochastic Dynamics of Coupled Neurons

Jiang Wang, Xiumin Li, and Dong Feng

Tianjin University, School of Electrical Engineering and Automation, 300072 Tianjin, China
[email protected]
Abstract. We study the stochastic dynamics of three FitzHugh-Nagumo neurons with chemical coupling and electrical coupling (gap junctions), respectively. For both coupling cases, optimal coherence resonance and weak-signal propagation can be achieved at intermediate noise intensity. Through comparison and analysis, we conclude that chemical synaptic coupling is more efficient than the well-known linear electrical coupling for both coherence resonance and weak-signal propagation. We also find that neurons with parameters located near the bifurcation point (canard regime) exhibit the best coherence resonance and weak-signal propagation.
1 Introduction
Noise-induced complex dynamics in excitable neurons have attracted great interest in recent years. Random synaptic input from other neurons, the random switching of ion channels and the quasi-random release of neurotransmitter by synapses all contribute to the randomness in neurons [1]. In contrast to its destructive role, such as disordering or destabilizing a system, in some cases noise plays an important and constructive role in the amplification of information transfer. In particular, in the presence of noise, special attention has been paid to the complex behaviors of neurons located near the canard regime [2-7], where neurons can be extremely sensitive to external signals. This is important and meaningful for weak-signal processing, which guarantees low energy consumption in biological systems. As investigated in [2;3;6], such neurons possess two internal frequencies, corresponding to standard spiking and small-amplitude oscillations (canard orbits) respectively. The former is the frequency of the most regular spiking behavior induced purely by intermediate noise intensity, known as Coherence Resonance (CR). For the latter, the subthreshold oscillations are critical in the famous Stochastic Resonance (SR) phenomenon, which describes the cooperative effect between a weak signal and noise in a nonlinear system, leading to an enhanced response to the periodic forcing [8].
Recently, E. Ullner gave detailed descriptions of several new noise-induced phenomena in the FHN neuron in [1]. They showed that an optimal amplitude of high-frequency driving enhances the response of an excitable system to a low-frequency signal [9]. They also investigated canard-enhanced SR [3], the effect of noise-induced signal processing in systems with complex attractors [10], and a new noise-induced phase transition from a self-sustained oscillatory regime to excitable behavior [11]. In [12], C. Zhou et al. demonstrated the effect of CR in a heterogeneous array of coupled FHN neurons. They found that both a decrease in the spatial correlation of the noise and inhomogeneity in the parameters of the array can enhance the coherence. However, most of the relevant studies considered single neurons [3;5;13] or neurons with linear electrical coupling (gap junctions) [4;12;14;15]. Only in [16] was another very important case, nonlinearly pulse-coupled neurons with noise, investigated. In the case of chemical (nonlinear) coupling, they observed a substantial increase in the CR of Morris-Lecar models in comparison with (linear) electrical coupling. Therefore, inspired by [16] and based on our previous work on the canard dynamics of chemically coupled neurons [17], we study the effects of chemical synapses on CR and on the enhancement of signal propagation in three coupled FHN neurons that are located near the canard regime and subjected to a noisy environment. In particular, in order to investigate signal propagation, only one of the neurons is subjected to the external periodic signal. At the optimal noise intensity, chemically coupled neurons, thanks to the selective couplings between individuals, can enhance CR and exhibit a much better response to an external signal than electrically coupled ones. This paper is arranged as follows: in Sec. 2, we describe the neuron model and the two kinds of coupling; in Secs. 3 and 4, comparisons are made between chemically coupled and electrically coupled neurons for coherence resonance and information transfer, respectively; finally, we draw conclusions and discuss the results in Sec. 5.
2 Neuron Model and Coupling Description
We consider three bidirectionally coupled FitzHugh-Nagumo (FHN) neurons, described by
$$\varepsilon \dot{V}_i = V_i - \tfrac{1}{3} V_i^3 - W_i + I^{app} - I_i^{syn}, \qquad \dot{W}_i = V_i + a - b W_i + B_i \cos(\omega t) + A \xi_i(t) \qquad (1)$$
where $i = 1, \ldots, N$ indexes the neurons; $a$, $b$ and $\varepsilon$ are dimensionless parameters, with $\varepsilon \ll 1$ making the membrane potential $V_i$ the fast variable and the recovery variable $W_i$ the slow variable. $\xi_i$ is an independent white Gaussian noise with zero mean and intensity $A$ for each element, $B_i \cos(\omega t)$ is the periodic forcing signal, and $I^{app}$ and $I_i^{syn}$ are the external applied current and the synaptic current through neuron $i$, respectively. For the linear diffusive coupling (gap junctions),

$$I_i^{syn} = \sum_{j \in \mathrm{neigh}(i)} g_{syn} (V_i - V_j) \qquad (2)$$

where $g_{syn}$ is the conductance of the synaptic channel. For the nonlinear pulsed coupling (chemical synapses), we follow [18;19] and define $I_i^{syn}$ as

$$I_i^{syn} = \sum_{j \in \mathrm{neigh}(i)} g_{syn}\, s_j (V_i - V_{syn}) \qquad (3)$$

where $g_{syn}$ is the synaptic coupling strength and $V_{syn}$ is the synaptic reversal potential, which determines the type of synapse. In this paper, considering an excitatory synapse, we take $V_{syn} = 0$. The dynamics of the synaptic variable $s_j$ is governed by $V_j$ and is defined as

$$\dot{s}_j = \alpha(V_j)(1 - s_j)/\varepsilon - s_j/\tau_{syn}, \qquad \alpha(V_j) = \frac{\alpha_0}{1 + \exp(-V_j / V_{shp})} \qquad (4)$$

where the synaptic decay rate $\tau_{syn}$ is written as $\tau_{syn} = 1/\delta$. The synaptic recovery function $\alpha(V_j)$ can be taken as a Heaviside function: when the neuron is in the silent state $V < 0$, $s$ decreases slowly and the first equation of (4) reduces to $\dot{s}_j = -s_j/\tau_{syn}$; otherwise $s$ quickly jumps to 1 and thus acts on the postsynaptic cells. The parameters used in this paper are $a = 0.7$, $\varepsilon = 0.08$, $V_{syn} = 0$, $\alpha_0 = 2$, $V_{shp} = 0.05$ and $I^{app} = 0$; the remaining parameters are given in each case. In this model, $b$ is one of the critical parameters that significantly influence the dynamics of the system. For a single neuron in the absence of noise, an Andronov-Hopf bifurcation occurs at $b = 0.45$. For $b > 0.45$ the neuron is excitable, corresponding to the rest state, while for $b < 0.45$ the system possesses a stable periodic solution generating a periodic sequence of spikes. Between these two states there exists an intermediate behavior known as the canard explosion [19]. In a small vicinity of $b = 0.45$ there are small oscillations near the unstable fixed point before the sudden growth of the oscillation amplitude. This canard regime shrinks to zero as $\varepsilon \to 0$. Here we take $\varepsilon = 0.08$ as used in [20], in which case the canard regime exists for $b \in [0.425, 0.45]$. This regime is very sensitive to external perturbations and thus plays a significant role in signal propagation, as discussed further below. We numerically integrate the system by the explicit Euler-Maruyama algorithm [21].
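For readers who want to reproduce the setup, here is a minimal sketch of Eqs. (1), (3) and (4) integrated with the explicit Euler-Maruyama scheme the authors cite; the step size, seed, τ_syn value and initial conditions are our assumptions, not values given in the paper, and replacing the synaptic current with Eq. (2) gives the electrical-coupling case.

```python
import numpy as np

# Three bidirectionally coupled FHN neurons with excitatory chemical
# synapses, Eqs. (1), (3), (4). Parameter values follow the text.
N, h, steps = 3, 1e-3, 200_000
a, b, eps = 0.7, 0.45, 0.08
Vsyn, alpha0, Vshp, tau_syn = 0.0, 2.0, 0.05, 1.0   # tau_syn: example value
g_syn, A, B, omega = 0.27, 0.11, 0.0, 0.3           # CR setting of Fig. 1(a)

rng = np.random.default_rng(0)
V, W, s = -1.0 * np.ones(N), np.zeros(N), np.zeros(N)
neigh = [(1, 2), (0, 2), (0, 1)]        # all-to-all among the three cells

record = np.empty((steps, N))           # membrane-potential traces
for k in range(steps):
    t = k * h
    # Chemical synaptic current, Eq. (3); for gap junctions use
    # g_syn * ((V[i]-V[j1]) + (V[i]-V[j2])) instead, Eq. (2).
    I_syn = np.array([g_syn * (s[neigh[i][0]] + s[neigh[i][1]]) * (V[i] - Vsyn)
                      for i in range(N)])
    alpha = alpha0 / (1.0 + np.exp(-V / Vshp))           # Eq. (4)
    dV = (V - V**3 / 3.0 - W - I_syn) / eps              # Eq. (1), I_app = 0
    dW = V + a - b * W + B * np.cos(omega * t)
    ds = alpha * (1.0 - s) / eps - s / tau_syn
    V = V + h * dV
    W = W + h * dW + A * np.sqrt(h) * rng.standard_normal(N)  # noise A*xi_i
    s = s + h * ds
    record[k] = V
```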
3 Coherence Resonance
Coherence Resonance (CR) is a noise-induced effect describing the occurrence and optimization of periodic oscillatory behavior due to noise perturbations [1]. In this section, we study the effect of chemical synapses on the coherence resonance of coupled neurons, with $B_i = 0$, $i = 1, 2, 3$. As discussed in [16], for a large enough coupling strength $g_{syn}$ the time traces of electrically coupled neurons are basically identical, while in the chemical coupling case there is a slight delay between spikes and the subthreshold oscillations differ from each other (see Fig. 1). Therefore, we examine only the coherence of the 2nd neuron instead of the mean field.

[Figure 1: membrane potentials V1, V2, V3 versus time t; panels (a) chemical coupling and (b) electrical coupling]
Fig. 1. Time series of $V_i$ ($i = 1, 2, 3$). (a) $b = 0.45$, $g_{syn} = 0.27$, $A = 0.11$; (b) $b = 0.45$, $g_{syn} = 0.1$, $A = 0.19$.
As the coherence factor of the firing events we take $S_i = \langle T_k^i \rangle_t / \sqrt{\mathrm{Var}(T_k^i)}$, with $i = 2$, together with the average interspike interval $T_{mean} = \langle T_k^i \rangle_t$, where $T_k^i$ is the pulse interval $T_k^i = \tau_{k+1}^i - \tau_k^i$, $\tau_k^i$ is the time of the $k$th firing of the $i$th cell, and $\langle \cdot \rangle_t$ denotes the average over time. $S$ describes the timing precision of information processing in neural systems. We study CR for these two kinds of coupling when the neurons are located near the bifurcation point $b = 0.45$, where all the cells are in the subthreshold regime in the absence of noise. In order to investigate the influence of the coupling strength, we calculate the maximum of $S$ ($S_m$) at the corresponding optimal noise intensity for different values of $g_{syn}$ in the two coupling cases (see Fig. 2 (a)(b)); here $g_{syn} = 0.15$ in (a) and $g_{syn} = 0.1$ in (b) are the smallest values at which the neurons fire synchronously. It is obvious that both too weak and too strong coupling decrease CR in each case. Therefore, we choose the optimal coupling strengths $g_{syn} = 0.27$ for chemical coupling and $g_{syn} = 0.1$ for electrical coupling. Fig. 2 (c)(d) shows the coherence resonance for the two coupling cases. In both cases, $T_{mean}$ decays quickly and tends to the period of normal spiking as the noise intensity increases, while chemical coupling exhibits a significantly stronger CR and needs a smaller noise intensity to achieve the optimal periodic oscillatory behavior than electrical coupling. As discussed in [16], the interpretation of this phenomenon is that chemical synapses act only while the presynaptic neuron is spiking, whereas electrical coupling connects the voltages of the neurons at all times. This can be observed in Fig. 1, which shows the optimal case for each coupling. Chemical coupling leaves neurons undergoing small oscillations free from each other, giving more opportunities for
the individuals to fire under noise, compared with the electrical coupling case; and once one neuron spikes, it stirs the others to spike synchronously. For electrical coupling, by contrast, the strong synchronization between subthreshold oscillatory neurons decreases the oscillation amplitude and thus raises the threshold for firing. From this phenomenon we can learn that subthreshold oscillations are very important for the firing of large spikes. As $b$ increases, so that the neurons are located far from the canard regime, CR declines for both kinds of coupling (see Fig. 2 (d)).

[Figure 2: panels (a) $S_m$ versus $g_{syn}$ (chemical coupling), (b) $S_m$ versus $g_{syn}$ (electrical coupling), (c) $S$ and $T_{mean}$ versus noise intensity $A$, (d) $S_m$ and $A_m$ versus $b$; legends CC and EC]
Fig. 2. (a)(b) The maximum of $S$ for different $g_{syn}$ in the two kinds of coupled neurons, respectively, where $b = 0.45$; (c) CR factor $S$ and $T_{mean}$ versus the noise intensity $A$ for the coupled system in the different cases, where $b = 0.45$, CC: $g_{syn} = 0.27$, EC: $g_{syn} = 0.1$; (d) the maximum of $S$ and the corresponding noise intensity $A_m$ for different values of $b$ in the two kinds of coupled neurons, respectively, where CC: $g_{syn} = 0.27$, EC: $g_{syn} = 0.1$.
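For concreteness, the coherence factor $S$ and $T_{mean}$ defined above can be extracted from a simulated trace as in the sketch below, which reuses record and h from the integration sketch in Sec. 2; the spike-detection threshold is our assumption.

```python
import numpy as np

def spike_times(v, h, threshold=1.0):
    """Times of upward threshold crossings of the membrane potential."""
    idx = np.flatnonzero((v[:-1] < threshold) & (v[1:] >= threshold))
    return (idx + 1) * h

def coherence_factor(v, h):
    """S = <T_k>_t / sqrt(Var(T_k)) and T_mean = <T_k>_t for one cell."""
    T = np.diff(spike_times(v, h))       # interspike intervals T_k
    return T.mean() / np.sqrt(T.var()), T.mean()

S, T_mean = coherence_factor(record[:, 1], h)   # 2nd neuron, as in the text
print(S, T_mean)
```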
4 Stochastic Resonance
As mentioned above, Stochastic Resonance (SR) describes the optimal synchronization of the neuron output with a weak external input signal at intermediate noise intensity. In this section, we take the input periodic signal parameters $B_i = 0.05$ and $\omega = 0.3$, so that no neuron spikes in the absence of noise. In order to investigate the information transfer in these coupled neurons, we consider a local stimulus, that is, only one element is subjected to the external periodic signal. So, except where otherwise stated, $B_i$ is taken as $B_1 = 0.05$, $B_2 = 0$, $B_3 = 0$. As shown in Fig. 3, there exists an optimal response of the neurons to the input signal at intermediate noise intensity.

[Figure 3: membrane potentials V1, V2, V3 versus time t; panels (a) chemical coupling (CC) and (b) electrical coupling (EC)]
Fig. 3. Time series of $V_i$ ($i = 1, 2, 3$) and the input signal with a ten times higher amplitude than in the model (black line), $B_1 = 0.05$, $B_{2,3} = 0$. (a) $b = 0.45$, $g_{syn} = 0.15$, $A = 0.015$; (b) $b = 0.45$, $g_{syn} = 0.12$, $A = 0.045$.

To evaluate the response of the output frequency to the input frequency, we calculate the Fourier coefficient $Q$ at the input frequency. The definition of $Q$ [1] is

$$Q_{\sin} = \frac{\omega}{2\pi n} \int_0^{2\pi n/\omega} 2 V_i(t) \sin(\omega t)\, dt, \qquad Q_{\cos} = \frac{\omega}{2\pi n} \int_0^{2\pi n/\omega} 2 V_i(t) \cos(\omega t)\, dt, \qquad Q = \sqrt{Q_{\sin}^2 + Q_{\cos}^2} \qquad (6)$$

where $n$ is the number of periods $2\pi/\omega$ covered by the integration time. As in Sec. 3, we examine only the response of the 2nd neuron to the external input instead of the mean field, that is, $V_i = V_2$ in Eq. (6). In neural systems, information is carried by the large spikes rather than by the subthreshold oscillations, so we are only interested in the frequency of spikes. Following [3], we therefore set a threshold $V_s = 0$ in the calculation of $Q$: if $V < V_s$, we replace $V$ by the value of the fixed point $V_f$; if $V > V_s$, we use the original value of $V$. We consider the differences between these two kinds of coupling for signal processing when the neurons are located near the bifurcation point $b = 0.45$. Following Sec. 3, we choose the optimal coupling strengths $g_{syn} = 0.15$ for chemical coupling and $g_{syn} = 0.12$ for electrical coupling (Fig. 4 (a)(b)). For the local stimulus $B_1 = 0.05$, $B_2 = 0$, $B_3 = 0$, chemical coupling is much more efficient than electrical coupling for signal processing (Fig. 4 (c)-(e)). As discussed above, chemically coupled neurons can better exploit the internal sensitive dynamics and need a smaller noise intensity to fire and to complete the signal processing than electrically coupled ones; the subthreshold oscillations are still very important in this case. As $b$ increases, the ability to transfer information declines greatly for both kinds of coupling (Fig. 4 (d)(e)).
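A direct transcription of Eq. (6), including the $V_s = 0$ spike threshold described above, is sketched below; it again reuses record and h from the integration sketch in Sec. 2 (with B set to the vector (0.05, 0, 0) for the SR setting), and the fixed-point value $V_f$ is an example number.

```python
import numpy as np

def fourier_q(v, h, omega, Vs=0.0, Vf=-1.0):
    """Fourier coefficient Q of Eq. (6) over whole periods of the input."""
    t = np.arange(v.size) * h
    n = int(t[-1] * omega / (2.0 * np.pi))       # whole periods 2*pi/omega
    span = int(n * 2.0 * np.pi / (omega * h))    # samples in those periods
    vt = np.where(v[:span] < Vs, Vf, v[:span])   # keep spikes only
    pref = omega / (2.0 * np.pi * n)
    q_sin = pref * np.trapz(2.0 * vt * np.sin(omega * t[:span]), t[:span])
    q_cos = pref * np.trapz(2.0 * vt * np.cos(omega * t[:span]), t[:span])
    return np.hypot(q_sin, q_cos)

print(fourier_q(record[:, 1], h, omega=0.3))     # response of the 2nd neuron
```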
[Fig. 4: (a) CC and (b) EC, Q_m versus g_syn; (c) Q versus noise intensity A (legends: CC B2,3 = 0, EC B2,3 = 0, EC B2,3 = 0.05, CC B2,3 = 0.05); (d)(e) Q_m and A_m versus b for CC and EC]
Fig. 4. (a)(b) The maximum of Q for different g_syn in the two kinds of coupled neurons, respectively, where b = 0.45; (c) Signal processing at the input signal versus the noise intensity for the coupled system in different cases, where b = 0.45, B1 = 0.05, CC: g_syn = 0.15, EC: g_syn = 0.12; (d)(e) The maximum of Q and the corresponding noise intensity A_m for different parameter b in the two kinds of coupled neurons, respectively, where CC: g_syn = 0.15, EC: g_syn = 0.12
Besides, we also investigate the global stimulus B1,2,3 = 0.05, where each neuron is driven by the external signal. Here the chemically coupled system is not as efficient as the electrically coupled one in responding to the input signal (Fig. 4(c)). In this case,
neurons are more active and can easily be induced to fire by the external signal and noise. The continuous connection in electrically coupled neurons leads to high synchronization and gives better control of the firing rate than the selective connection in chemically coupled neurons (see Fig. 5). However, this case is not common in real systems, where input signals are always weak and are added to only a small number of neurons for the sake of low energy consumption.
[Fig. 5: time series panels (a) Chemical Coupling (CC) and (b) Electrical Coupling (EC); V1, V2, V3 versus t]
Fig. 5. Time series of Vi (i = 1, 2, 3) and the input signal plotted with a ten times higher amplitude than in the model (black line). (a) b = 0.45, g_syn = 0.15, A = 0.025, B1,2,3 = 0.05; (b) b = 0.45, g_syn = 0.12, A = 0.045, B1,2,3 = 0.05.
5 Conclusions

In this paper, we have compared coherence resonance and the response to a weak external signal between chemically coupled and electrically coupled noisy neurons. Chemically coupled neurons are prone to trigger spikes owing to their selective coupling, while electrical coupling is beneficial for synchronization. Therefore, when subjected to a noisy environment and a weak forcing signal, chemical coupling is more flexible and can increase the mutual excitations between cells, which enhances coherence resonance and weak-signal propagation. Also, the canard regime, where the system dynamics are complex and sensitive to external perturbations, plays a significant role in information transmission in neural systems. It should further be noted that canard dynamics, discussed in detail in [19,22], is critical for signal processing: the number of subthreshold oscillations between two adjacent large spikes is closely related to the firing rate, which carries the information during signal propagation. We will pursue this study further and extend it to larger networks with different topological connections.

Acknowledgements. The authors gratefully acknowledge valuable discussions with Wuhua Hu. This work was supported by the NSFC (No. 50537030).
References
1. Ullner, E.: Noise-induced Phenomena of Signal Transmission in Excitable Neural Models. Dissertation (2004)
2. Perc, M., Marhl, M.: Amplification of information transfer in excitable systems that reside in a steady state near a bifurcation point to complex oscillatory behavior. Physical Review E 71(2), 26229 (2005)
3. Volkov, E.I., Ullner, E., Zaikin, A.A., Kurths, J.: Oscillatory amplification of stochastic resonance in excitable systems. Physical Review E 68(2), 26214 (2003)
4. Zhao, G., Hou, Z., Xin, H.: Frequency-selective response of FitzHugh-Nagumo neuron networks via changing random edges. Chaos: An Interdisciplinary Journal of Nonlinear Science 16, 043107 (2006)
5. Zaks, M.A., Sailer, X., Schimansky-Geier, L., Neiman, A.B.: Noise induced complexity: From subthreshold oscillations to spiking in coupled excitable systems. Chaos: An Interdisciplinary Journal of Nonlinear Science 15, 026117 (2005)
6. Makarov, V.A., Nekorkin, V.I., Velarde, M.G.: Spiking Behavior in a Noise-Driven System Combining Oscillatory and Excitatory Properties. Physical Review Letters 86(15), 3431–3434 (2001)
7. Shishkin, A., Postnov, D.: Stochastic dynamics of FitzHugh-Nagumo model near the canard explosion. In: Proceedings of the 2003 International Conference on Physics and Control, vol. 2 (2003)
8. Wellens, T., Shatokhin, V., Buchleitner, A.: Stochastic resonance. Reports on Progress in Physics 67(1), 45–105 (2004)
9. Ullner, E., Zaikin, A., García-Ojalvo, J., Báscones, R., Kurths, J.: Vibrational resonance and vibrational propagation in excitable systems. Physics Letters A 312(5-6), 348–354 (2003)
10. Volkov, E.I., Ullner, E., Zaikin, A.A., Kurths, J.: Frequency-dependent stochastic resonance in inhibitory coupled excitable systems. Physical Review E 68(6), 61112 (2003)
11. Ullner, E., Zaikin, A., García-Ojalvo, J., Kurths, J.: Noise-Induced Excitability in Oscillatory Media. Physical Review Letters 91(18), 180601 (2003)
12. Zhou, C., Kurths, J., Hu, B.: Array-Enhanced Coherence Resonance: Nontrivial Effects of Heterogeneity and Spatial Independence of Noise. Physical Review Letters 87(9), 98101 (2001)
13. Gong, P.L., Xu, J.X.: Global dynamics and stochastic resonance of the forced FitzHugh-Nagumo neuron model. Physical Review E 63(3), 31906 (2001)
14. Toral, R., Mirasso, C.R., Gunton, J.D.: System size coherence resonance in coupled FitzHugh-Nagumo models. Europhysics Letters 61(2), 162–167 (2003)
15. Casado, J.M., Baltanás, J.P.: Phase switching in a system of two noisy Hodgkin-Huxley neurons coupled by a diffusive interaction. Physical Review E 68(6), 61917 (2003)
16. Balenzuela, P., Garcia-Ojalvo, J.: On the role of chemical synapses in coupled neurons with noise. Arxiv preprint q-bio.NC/0502025 (2005)
17. Wang, J., Li, X., Hu, W.: Canards and Bifurcations in the Chemical Synaptic Coupled FHN Neurons (2006)
18. Drover, J., Rubin, J., Su, J., Ermentrout, B.: Analysis of a canard mechanism by which excitatory synaptic coupling can synchronize neurons at low firing frequencies. SIAM J. Appl. Math. 65, 69–92 (2004)
19. Wechselberger, M.: Existence and bifurcation of canards in R3 in the case of a folded node. SIAM J. Applied Dynamical Systems 4, 101–139 (2005)
20. Cronin, J.: Mathematical Aspects of Hodgkin-Huxley Neural Theory. Cambridge University Press, Cambridge (1987)
21. Higham, D.J.: An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM Review 43, 525–546 (2001)
22. Szmolyan, P., Wechselberger, M.: Canards in R3. Journal of Differential Equations 177(2), 419–453 (2001)
Distinguish Different Acupuncture Manipulations by Using Idea of ISI

Jiang Wang¹, Wenjie Si¹, Limei Zhong², and Feng Dong¹

¹ School of Electrical Engineering and Automation, Tianjin University, 300072, Tianjin, P.R. China
[email protected]
² School of Information Engineering, Northeast Dianli University, 132012, Jilin, P.R. China
Abstract. As is well known, the science of acupuncture and moxibustion is an important component of Traditional Chinese Medicine with a long history. Although there are a number of different acupuncture manipulations, methods for distinguishing them have rarely been investigated. Using the idea of the interspike interval (ISI), we study the electrical signal time series at the spinal dorsal horn produced by three different acupuncture manipulations at the Zusanli point and present an effective way to distinguish them. Compared with traditional analysis methods, such as phase space reconstruction and largest Lyapunov exponents, this new method is more efficient and effective.
1 Introduction

Neural systems have strongly nonlinear characteristics and display different dynamics under different system parameters or external inputs. Usually the dynamics of these systems change little when the parameters are slightly modified, but in the vicinity of a critical point the situation is totally different: the systems can be driven from a chaotic pattern to a periodic pattern, from one periodic pattern to another, or from a periodic pattern to a chaotic pattern [1][2]. Although intracellular membrane potential can be examined directly, most studies are aimed at an easily obtained physiological measure, in particular the ISI, to facilitate comparison with experimental data. ISIs play an important role in encoding the neuronal information which is conveyed along nerve fibres in the form of series of propagating action potentials. A continuing body of research focuses on the ISI sequence [3-12]. According to this work, an ISI can be seen as a state variable by which the temporal dynamics of the neuron can be characterized. In analogy with the Takens theorem [13] for discrete- or continuous-time dynamical systems, later generalized in [14], it should then be possible to reconstruct the essential features of the attractor dynamics of a neuron from measurements of only one variable, using e.g. delay embeddings of ISI sequences. Acupuncture is an important part of Chinese medicine theory and has proved highly effective in the treatment of more than 300 diseases [15]. Since the middle of the 20th century, applications of acupuncture have advanced in abirritation [16], drug abstinence [17] and so on. Acupuncture at the Zusanli point is not only
utilized to treat common digestive system diseases such as duodenal ulcer, acute gastritis and gastroptosis, but also has auxiliary efficacy for enteritis, dysentery, constipation, hepatitis, gallstone, kidney stone, diabetes and hypertension [18]. When acupuncture is applied to the Zusanli point, electrical activity can be recorded from the spinal dorsal horn. Different kinds of acupuncture manipulations evoke different electrical time series and achieve different curative effects. Our paper is organized as follows. Section 2 provides background on the transmission path of the acupuncture signals, while Section 3 presents the time series evoked by different acupuncture manipulations. Section 5 introduces the ISI method for distinguishing the three different acupuncture manipulations, to be compared with the methods described in Section 4. The last section concludes this work.
2 Transmission Path of the Acupuncture Signals

According to previous studies, the acupuncture signals follow a certain route from the acupuncture point to the spinal dorsal horn [19]. The corresponding transmission path for acupuncture signals at the Zusanli point is shown in Fig. 1. The electrical signal time series at the spinal dorsal horn can then be recorded.
Fig. 1. The transmission path of the acupuncture signals at the Zusanli point
3 The Time Series Evoked by Acupuncture

There are twelve alternative manipulations used in acupuncturing at the Zusanli point. This paper selects three of them: the twist manipulation, the drag-plug manipulation and the gradual manipulation [20]. The time series at the spinal dorsal horn evoked by these three methods are shown in Fig. 2.
(a) twist manipulation
(b) gradual manipulation
(c) drag-plug manipulation
Fig. 2. The time series evoked by three acupuncture manipulations
4 Time Series Analysis

4.1 Phase Space Reconstruction Theory

The phase space is reconstructed according to the delay coordinate method proposed by Takens [21] and Packard [22]. Given a discrete time series {x_n} obtained by measurement or simulation, the m-dimensional state vector X_n is reconstructed by the delay coordinate method:

X_n = (x_n, x_{n+T}, …, x_{n+(m−1)T}),        (1)

where T is called the time delay and m the embedding dimension. T and m are two important parameters in phase space reconstruction; their values are obtained by the mutual information method [23] and Cao's method [24], respectively.

4.2 Largest Lyapunov Exponents

Based on the reconstructed phase space, we analyze the spatio-temporal behavior of the time series. The Lyapunov exponent is an important parameter for describing nonlinear system behavior: it states the rate of exponential divergence from perturbed initial conditions. Consider a one-dimensional map

x_{n+1} = f(x_n).        (2)

Assume the perturbation of the initial value x_0 is δx_0. After n iterations the perturbation grows as

|δx_n| = |δx_0| e^{nλ},        (3)

where λ is the Lyapunov exponent. The magnitude of the Lyapunov exponent is a measure of the sensitivity to initial conditions; the system is chaotic and unstable when the Lyapunov exponent is positive. For an n-dimensional map, the largest Lyapunov exponent (LLE) is preferred for estimating whether the system is chaotic or not. This paper adopts the method introduced by Wolf [25] to calculate the LLE of the time series at the spinal dorsal horn.

4.3 Experimental Data Processing

We select 80,000 data points within 20 s and reconstruct the phase space for these experimental data [26-29]. For the data of the twist method shown in Fig. 3, the time delay T = 3 and the embedding dimension m = 3, while T = 3, m = 3 and T = 4, m = 3 are chosen for the drag-plug method shown in Fig. 4 and the gradual method shown in Fig. 5, respectively.
Fig. 3. The embedding parameters of the twist method
Fig. 4. The embedding parameters of the drag-plug method
The attractors of the reconstructed phase space are shown in Fig. 6. According to the figures they are all strange attractors, with different shapes, for all three methods, so we can preliminarily confirm that these signals are chaotic. In addition, the LLEs are calculated by Wolf's algorithm to quantitatively describe the time series; the calculation results are shown in Fig. 7. The LLEs of the three methods are 1.7333 ± 0.0018, 1.7676 ± 0.0015 and 1.7635 ± 0.0010, respectively. Obviously, the differences among these LLEs are too small to help us distinguish the methods clearly, and, as with the reconstructed attractors, we cannot differentiate which is which.
Fig. 5. The embedding parameters of the gradual method
(a) twist method
(b) drag-plug method
(c) gradual method
Fig. 6. The reconstructed attractors of the three methods
(a) twist method
(b) drag-plug method
(c) gradual method
Fig. 7. The LLEs of the three methods
5 Time Series Analysis Using the Idea of ISI

As seen from Fig. 2, the amplitude of the time series ranges roughly from −20 mV to 20 mV except for some points. In this work, we regard a point whose amplitude is larger than 30 mV or smaller than −30 mV as a quasi-spike point. All successive quasi-spike points with the same sign together make up a quasi-spike, and a quasi-spike evidently contains at least one quasi-spike point. We then apply the ISI analysis method to the three time series in Fig. 2 in order to distinguish them. For convenience, we classify the quasi-spikes into two kinds: those with positive quasi-spike points (Pquasi-spikes) and those with negative ones (Nquasi-spikes). With K_n denoting the position of the nth quasi-spike, the inter-quasi-spike intervals (quasi-ISIs) are given by
ISI_n = K_{n+1} − K_n.        (4)
For convenience, Pquasi-ISI denotes the quasi-ISI of Pquasi-spikes and Nquasi-ISI that of Nquasi-spikes. The graphs of the quasi-ISIs are shown in Fig. 8. Because the interval between two adjacent sampling points in Fig. 2 is constant, the quasi-ISIs can also be measured in time.
(a) Pquasi-ISI according to n
(b) Nquasi-ISI according to n
Fig. 8. Quasi-ISIs according to n
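The quasi-spike and quasi-ISI definitions above translate directly into code. The sketch below is illustrative only (not the authors' implementation); quasi_isis and signal are hypothetical names, and the 30 mV threshold follows the text:

import numpy as np

def quasi_isis(v, threshold=30.0):
    # v is the sampled voltage in mV; returns (Pquasi-ISIs, Nquasi-ISIs) in
    # sample counts (the sampling interval is constant, so this is
    # proportional to time).
    isis = {}
    for sign in (+1, -1):
        # Quasi-spike points with this sign.
        mask = (sign * v) > threshold
        # A quasi-spike starts where the mask switches from False to True;
        # runs of successive quasi-spike points count as one quasi-spike.
        starts = np.flatnonzero(mask & ~np.r_[False, mask[:-1]])
        isis[sign] = np.diff(starts)       # K_{n+1} - K_n, Eq. (4)
    return isis[+1], isis[-1]

# p_isi, n_isi = quasi_isis(signal)
# print(p_isi.mean(), n_isi.mean())        # compare with Table 1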
From Fig. 8, we can easily distinguish the twist manipulation and the drag-plug manipulation from the third one, but cannot reliably tell the difference between the former two. So the means of the Pquasi-ISIs and Nquasi-ISIs are evaluated respectively, with the results shown in Table 1.

Table 1. Means of Pquasi-ISIs and Nquasi-ISIs

                        Twist     Drag-plug   Gradual
Mean of Pquasi-ISIs     2091.5    2019.3      737.58
Mean of Nquasi-ISIs     2091.5    2503.4      737.58
From Table 1 it is easy to differentiate the former two methods, because the means of the Pquasi-ISIs and Nquasi-ISIs are different for the drag-plug method, whereas they are equal for the twist manipulation and for the gradual manipulation. Thus this analysis method is an effective way to distinguish the different acupuncture manipulations.
6 Conclusions

In this work, the authors developed a new method based on the idea of the ISI to differentiate the different acupuncture manipulations. Compared with traditional methods such as phase space reconstruction and the largest Lyapunov exponent, this
method can clearly distinguish the signals at the spinal dorsal horn evoked by the three acupuncture manipulations. Using it, the manipulations produced by different doctors can be quantified: for example, if a specialist finishes acupuncture for a patient and the means of the Pquasi-ISIs and Nquasi-ISIs are measured as 2030 and 2400, respectively, then we can conclude that this specialist used the twist and drag-plug manipulations, with the drag-plug manipulation used more frequently. Furthermore, these data can be used to instruct doctors in acupuncture. Some common ground has been discovered between the entrainment patterns under external voltage stimulation and the signal time series produced by the acupuncture manipulations, even though their models are different; for example, spikes or quasi-spikes can be found in both. There may be some relation between them, so our future work will focus on the problem of whether the acupuncture manipulations can be quantified as external electric stimuli. That would definitely lead acupuncture theory to a more advanced level.

Acknowledgments. The authors gratefully acknowledge the support of the NSFC (No. 50537030).
References
1. Jianxue, X., Yunfan, G., Wei, R., Sanjue, H., Fuzhou, W.: Propagation of periodic and chaotic action potential trains along nerve fibers. Physica D: Nonlinear Phenomena 100(1-2), 212–224 (1997)
2. Izhikevich, E.M.: Resonate-and-fire neurons. Neural Networks 14(6-7), 883–894 (2001)
3. Masuda, N., Aihara, K.: Filtered interspike interval encoding by class II neurons. Physics Letters A 311, 485–490 (2003)
4. Gedeon, T., Holzer, M., Pernarowski, M.: Attractor reconstruction from interspike intervals is incomplete. Physica D: Nonlinear Phenomena 178(3-4), 149–172 (2003)
5. Racicot, D.M., Longtin, A.: Interspike interval attractors from chaotically driven neuron models. Physica D: Nonlinear Phenomena 104(2), 184–204 (1997)
6. Jin, W.-y., Xu, J.-x., Wu, Y., Hong, L., Wei, Y.-b.: Crisis of interspike intervals in Hodgkin-Huxley model. Chaos, Solitons & Fractals 27(4), 952–958 (2006)
7. Tuckwell, H.C.: Spike trains in a stochastic Hodgkin-Huxley system. Biosystems 80(1), 25–36 (2005)
8. Horikawa, Y.: A spike train with a step change in the interspike intervals in the FitzHugh-Nagumo model. Physica D: Nonlinear Phenomena 82(4), 365–370 (1995)
9. Rasouli, G., Rasouli, M., Lenz, F.A., Verhagen, L., Borrett, D.S., Kwan, H.C.: Fractal characteristics of human parkinsonian neuronal spike trains. Neuroscience 139(3), 1153–1158 (2006)
10. Canavier, C.C., Perla, S.R., Shepard, P.D.: Scaling of prediction error does not confirm chaotic dynamics underlying irregular firing using interspike intervals from midbrain dopamine neurons. Neuroscience 129(2), 491–502 (2004)
11. Gu, H., Ren, W., Lu, Q., Wu, S., Yang, M., Chen, W.: Integer multiple spiking in neuronal pacemakers without external periodic stimulation. Physics Letters A 285(1-2), 63–68 (2001)
12. Yang, Z., Lu, Q., Gu, H., Ren, W.: Integer multiple spiking in the stochastic Chay model and its dynamical generation mechanism. Physics Letters A 299(5-6), 499–506 (2002)
13. Takens, F.: Detecting strange attractors in turbulence. Lecture Notes in Mathematics, vol. 898, pp. 336–381 (1981)
14. Sauer, T., Yorke, J.A., Casdagli, M.: Embedology. Journal of Statistical Physics 65, 579–616 (1991)
15. Xuemin, S.: Acupuncture. China Press of Traditional Chinese Medicine, Beijing (2004)
16. Ke, Q., Wang, Y., Zhao, Y.: Acupuncture abirritation and its mechanism. Sichuan Journal of Anatomy 10(4), 224–230 (2003)
17. Yin, L., Jun, H., Qizhong, M.: Advance in Research on Abstinence from Narcotic Drugs by Acupuncture. Shanghai J. Acu-mox. 18(3), 43–45 (1999)
18. Zhang, J., Jin, Z., Lu, B., Chen, S., Cai, H., Jing, X.: Responses of Spinal Dorsal-horn Neurons to Gastric Distention and Electroacupuncture of "Zusanli" Point. Acupuncture Research 26(4), 268–273 (2001)
19. Wan, Y.-H., Jian, Z., Wen, Z.-H., Wang, Y.-Y., Han, S., Duan, Y.-B., Xing, J.-L., Zhu, J.-L., Hu, S.-J.: Synaptic transmission of chaotic spike trains between primary afferent fiber and spinal dorsal horn neuron in the rat. Neuroscience 125(4), 1051–1060 (2004)
20. Cheng, S.: Chinese Acupuncture. People's Medical Publishing House (1998)
21. Takens, F.: Detecting Strange Attractors in Turbulence. Lecture Notes in Mathematics, vol. 898, pp. 366–381 (1981)
22. Packard, N.H., Crutchfield, J.P., Farmer, J.D., Shaw, R.S.: Geometry from a Time Series. Phys. Rev. Lett. 45, 712–716 (1980)
23. Fraser, A.M., Swinney, H.L.: Independent coordinates for strange attractors from mutual information. Phys. Rev. A 33, 1134–1140 (1986)
24. Liangyue, C.: Practical method for determining the minimum embedding dimension of a scalar time series. Physica 110D, 43–50 (1997)
25. Wolf, A., Swift, J.B., Swinney, H.L., Vastano, J.A.: Determining Lyapunov exponents from a time series. Physica 16D, 285–317 (1985)
26. Yong, X., Jian-Xue, X.: Phase-space reconstruction of ECoG time sequences and extraction of nonlinear characteristic quantities. Acta Phys Sinica 51(2), 205–214 (2002)
27. Wang, Z.S., Zhenya, H., Chen, J.D.Z.: Chaotic behavior of gastric migrating myoelectrical complex. IEEE Trans. on Biome. Eng. 51(8), 1401–1406 (2004)
28. Matjaz, P.: Nonlinear time series analysis of the human electrocardiogram. Eur. J. Phys. 26, 757–768 (2005)
29. Small, M., Yu, D.J., Simonotto, J., Harrison, R.G., et al.: Uncovering non-linear structure in human ECG recordings. Chaos, Solitons & Fractals 13(8), 1755–1762 (2002)
The Study on Internet-Based Face Recognition System Using PCA and MMD

Jong-Min Kim

Computer Science and Statistics Graduate School, Chosun University, Korea
[email protected]
Abstract. The purpose of this study is to propose a real-time face recognition system using multiple image sequences for network users. The algorithm used in this study optimizes the overall time required for the recognition process by reducing transmission delay and image-processing load through image compression and minification. At the same time, this study proposes a method that can improve the recognition performance of the system by exploring the correlation between image compression, image size, and the recognition capability of the face recognition system. The performance of the system and the algorithm proposed in this study were evaluated through testing.
1 Introduction

Rapidly growing information technology has fueled the development of multimedia techniques. However, demand remains high for techniques for searching multimedia data in a large-scale database efficiently and promptly. Among physical characteristics, the face image is used as one of the reliable means of identifying individuals. Face recognition has a wide range of applications, such as face-based access control, security systems and system automation based on computer vision. Face recognition can be applied to large databases but requires a large amount of calculation. There are three different approaches used for face recognition: template matching, statistical classification and neural networks [1]. Elastic template matching, LDA and PCA, based on the statistical classification approach, are widely used for face recognition [2,3]. Among these, statistical classification-based methods that require a small amount of calculation are the most commonly used. The PCA-based face recognition method identifies feature vectors using a Karhunen-Loève transform. Given the proven feasibility of PCA as a face recognition method, this study used PCA along with Kernel-based PCA [4,5] and 2D-PCA [6]. The real-time face recognition system proposed in this study is intended to be available in a network environment. Each client detects face images and forwards them to a remote server, compressing the images to reduce file size. However, the compression of facial images poses a critical risk because of the possibility of undermining image quality. This study investigated the effects of image
compression and image size on the recognition accuracy of face recognition systems based on the PCA, KPCA and 2DPCA algorithms, and arrived at an effective real-time face recognition system that can be accessed across the network.
2 Network-Based Face Recognition System

Based on the assumption that multiple variations of the face improve the recognition accuracy of a face recognition system, multiple image sequences are used. To reduce transmission delay, the images are compressed and minified in the proposed system (Fig. 1).
Fig. 1. Composition of the Proposed Face Recognition System
3 Face Recognition Algorithms

The real-time recognition accuracy was evaluated using PCA-, KPCA- and 2DPCA-based algorithms.

3.1 PCA (Principal Component Analysis)

The PCA-based face recognition algorithm calculates the basis vectors of the covariance matrix C of the images from the following equation:
C = (1/M) ∑ᵢ₌₁ᴹ (Xᵢ − m)(Xᵢ − m)ᵀ,        (1)

where Xᵢ represents the 1D vector converted from the i-th image in a sequence of images of size m × n, and m indicates the average of the total M training face images.
The eigenvectors (up to m × n of them) of the image covariance matrix C are then calculated. The top K eigenvectors, selected in order of descending eigenvalues, are defined as the basis vectors U [7]. The feature vector w of an input image x is obtained by projecting onto the basis vectors according to equation (2):

w = Uᵀ(x − m).        (2)
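As an illustration of Eqs. (1)-(2), the basis vectors U and the feature vector w can be computed via an SVD of the centered training matrix, whose right singular vectors are the eigenvectors of the covariance matrix. This minimal sketch is ours, not the paper's implementation:

import numpy as np

def pca_features(train, x, k):
    # train: M x (m*n) matrix of vectorized training faces;
    # x: one vectorized input face; k: number of basis vectors kept.
    m_vec = train.mean(axis=0)
    X = train - m_vec
    # Eigenvectors of C = (1/M) X^T X via SVD of the centered data,
    # already ordered by descending singular value.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    U = vt[:k].T                    # basis vectors, one per column
    return U.T @ (x - m_vec)        # feature vector w = U^T (x - m), Eq. (2)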
3.2 2DPCA

While for PCA the covariance matrix is computed from 1D vectors converted from the input images, for 2DPCA the image covariance matrix is computed directly from the 2D images and the average image, as in equation (3) [6]:
C = (1/M) ∑ᵢ₌₁ᴹ (Aᵢ − E(A))(Aᵢ − E(A))ᵀ.        (3)
The eigenvalues and eigenvectors of this covariance matrix are calculated, and the top k eigenvectors in order of descending eigenvalues are defined as the basis vectors U. The feature vector wᵢ of the face image A is extracted by equation (4), and the characteristics of the face, B = [w₁, …, w_k], are assembled from the wᵢ:

wᵢ = A uᵢ,  i = 1, 2, …, K.        (4)
Compared with the covariance matrix used in PCA, the covariance matrix derived from the input images in 2DPCA is smaller. This means that 2DPCA has the advantage of requiring less learning time [6].

3.3 KPCA (Kernel Principal Component Analysis)

The KPCA face recognition algorithm converts the input face image data using a nonlinear function Φ. The converted images are represented by the eigenvectors of the covariance matrix computed for the set of nonlinear mappings Φ, and the coefficients obtained during this process are used for face recognition. As for PCA, the covariance matrix can be computed efficiently by using kernel inner-product functions as the elements of the matrix [8,9]. In equation (5), the nonlinear function Φ(x) is substituted for the input image x, and F denotes the feature space in place of Rᴺ:
Φ: Rᴺ → F,  x_k → Φ(x_k).        (5)
The covariance matrix of the images in the nonlinear feature space is presented in equations (6) and (7). The nonlinear function Φ̃ in equation (6) must have zero mean to meet the normalization requirement:
C^Φ = (1/l) ∑ₖ₌₁ˡ Φ̃(x_k) Φ̃(x_k)ᵀ,        (6)

∑ₖ₌₁ˡ Φ̃(x_k) = 0.        (7)
4 Face Recognition Rate

4.1 Changes in Recognition Rates with Image Compression

Image compression is essential to shorten transmission time through the network. However, the compression approach also has a downside, as it may hinder image quality. As presented in Fig. 2, the data file size is reduced but the subjective image quality deteriorates as the quantization parameter (QP) of the compressed images increases. As a result, the recognition performance of the system would be expected to decline. It is found, however, that the recognition rate did not vary with the value of the quantization parameter at the original
(a) Original image  (b) QP = 10  (c) QP = 20  (d) QP = 30
Fig. 2. Changes in visual image with QP value
(a) Changes in data size with QP value  (b) Changes in recognition rate with QP value
Fig. 3. Effects of image compression on data size and recognition performance
(a) QP value of 5  (b) QP value of 15  (c) QP value of 30
Fig. 4. Changes to the distance between the original image and the compressed image with QP value. (The triangle indicates the image with the minimum distance, the rectangle indicates the closest image of another class, green represents the original image and blue represents a compressed image.)
image size of 92*122 pixels, as shown in Fig. 3(b). This phenomenon was also confirmed by the distance between the original image and the compressed image. Changes to this distance with the value of the quantization parameter are presented in Fig. 4: there is a positive correlation between the distance and the QP value. In other words, the distance between the original image and the compressed image (d3) increased to 59, 305 and 689 as the value of QP reached 5, 15 and 30, respectively. However, the change in the distance is meager, so the effect of compression can be neglected, meaning almost no change in recognition performance but a significant reduction in the size of the data file to be forwarded. In conclusion, transmission time can be reduced without affecting the recognition performance of the system.

4.2 Changes in Recognition Rates with Image Size

The size of the image has an impact on transmission time and on computational complexity during the recognition process. Images were passed through a filtering stage to obtain the low-low band using the wavelet transform. The image size S_R is defined in the following equation:
S_R = S_origin / 4^R.        (8)
For instance, the original image is reduced to 25% of its actual size when R equals 1. Effects of image filtering are presented in Fig. 5, and effects of image size on the time required for learning and recognizing images and on recognition performance are presented in Fig. 6. As shown in Fig. 6(a) and (b), the time required for learning and recognizing images fell drastically as the size of the image was reduced. The recognition rate also dropped while R was less than 4, but the decline stopped and the rate remained almost unchanged when R was 4 or above. In fact, it
(a) original  (b) R = 1  (c) R = 2  (d) R = 3  (e) R = 4
Fig. 5. Effects of image filtering
(a) Learning time
(b) Recognition time
(c) Recognition rate
Fig. 6. Effects of image size on time required for learning and recognizing images and recognition rate
is difficult to recognize features of the image by eye when the image size becomes small. This is due to the fact that image size reduction reduces the amount of facial information in the original images and the size of the coefficient vectors.
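The reduction of Eq. (8) can be realized by repeatedly keeping only the low-low wavelet band. A minimal sketch, assuming the PyWavelets package and a Haar wavelet (the paper does not state which wavelet was used):

import pywt  # PyWavelets, assumed available

def low_low_reduce(img, R):
    # Keep only the low-low band R times; the pixel count shrinks by a
    # factor of about 4**R, matching S_R = S_origin / 4**R in Eq. (8).
    for _ in range(R):
        img, _ = pywt.dwt2(img, 'haar')   # returns (cA, (cH, cV, cD)); keep cA
    return img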
5 Majority-Making-Decision Rule

The present study found that recognition rates remain almost unchanged under certain degrees of compression and minification of the images. Based on these findings, the study proposes a face recognition algorithm capable of improving the recognition performance of the system. The algorithm calculates a recognition rate based on the majority, composition and decision-making rules when multiple input images are used. A theoretical estimate of the recognition rate P_m can be calculated from the following equation, on the condition that more than half of the transmitted images are matched with the image models.
P_m = ∑_{k=⌊n/2⌋+1}^{n} C(n, k) p_sᵏ (1 − p_s)ⁿ⁻ᵏ,        (9)
where n is the number of images forwarded, p_s is the average probability of face recognition, C(n, k) is the number of ways in which k of the n images can be recognized, and ⌊x⌋ is the largest integer not exceeding x. For instance, when p_s is 0.94 and three images are forwarded, the value of P_m is 0.99 under the majority, composition and decision-making rules. The proposed algorithm was tested with 3, 5 and 7 images in the PCA-, KPCA- and 2DPCA-based real-time face recognition systems. According to equation (9), saturation of the estimate is reached when p_s is roughly larger than 0.7 and n equals 5; five images were therefore used in the test of the proposed system. Test results are presented in Fig. 7.
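Eq. (9) is a straightforward binomial sum; the following sketch (our illustration, with a hypothetical function name) reproduces the worked example:

from math import comb

def majority_recognition_rate(p_s, n):
    # Probability that more than half of the n transmitted images are
    # matched, given a per-image recognition probability p_s (Eq. (9)).
    return sum(comb(n, k) * p_s**k * (1 - p_s)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# majority_recognition_rate(0.94, 3) -> 0.9896..., i.e. about 0.99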
6 Experimental Results

The composition of the proposed system is presented in Fig. 1. For the test, the Chosun DB (50 classes, 12 images of size 60*120) and the Yale DB were used. The test was performed in a network environment, and the optimized value of the quantization parameter was applied. Experimental results are presented in Table 1. The performance of a real-time face recognition system is measured by the length of time required for learning and recognizing face images, the total amount of data transmitted, and the recognition rate. The KPCA-based proposed system increased the recognition rate by 14% and reduced the time required for recognizing images by 86%; the time required for learning images was also reduced when smaller images were used. The 2DPCA-based proposed system showed a recognition rate of 96.4%, compared with 90.3% for the existing 2DPCA-based system; besides, a 78% decrease was observed in learning time and a 24% decrease in recognition time for the same system. The amount of data transmitted was reduced from 19200 bytes to 3610 bytes, leading to an 81% reduction in transmission delay.

Table 1. Comparison of performance between proposed and existing systems

Algorithm                  Recognition Rate (%)   Training Time (min)   Recognition Time (sec)
PCA                        88.7                   28                    1.5
Proposed system (PCA)      92.0                   16                    1.0
2DPCA                      91.3                   27                    0.5
Proposed system (2DPCA)    96.4                   6                     0.38
KPCA                       79.0                   2.4 (hour)            36
Proposed system (KPCA)     93.5                   0.33 (hour)           5
(a) Recognition Rate  (b) Training Time  (c) Recognition Time
Fig. 7. Effects of image compression and minification on recognition rate and time required for the recognition process with five images
7 Conclusions

This study proposed a real-time face recognition system that can be accessed across the network. Testing of the proposed system demonstrated that the image filtering and image compression algorithms reduce transmission delay and the time required for learning and recognizing images without hindering the recognition accuracy of the system. This study used multiple input images in order to improve the recognition performance of the system, and the proposed real-time face recognition system proved robust on the network. Although the system was based on PCA algorithms, it can be integrated with other face recognition algorithms for real-time detection and recognition of face images.
References
1. Jain, A.K., Duin, R.W., Mao, J.: Statistical pattern recognition: a review. IEEE Trans. on Pattern Analysis and Machine Intelligence 22(1), 4–37 (2000)
2. Yambor, W.: Analysis of PCA-based and Fisher Discriminant-based Image Recognition Algorithms. Technical Report CS-00-103, Computer Science Department, Colorado State University (2000)
3. Murase, H., Nayar, S.K.: Visual Learning and Recognition of 3-D Objects from Appearance. International Journal of Computer Vision 14 (1995)
4. Zhang, Y., Liu, C.: Face recognition using kernel principal component analysis and genetic algorithms. In: Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing, pp. 337–343 (2002)
5. Yang, J., Zhang, D., Frangi, A.F., Yang, J.Y.: Two-Dimensional PCA: A New Approach to Appearance-Based Face Representation and Recognition. IEEE Trans. Pattern Analysis and Machine Intelligence 26(1) (January 2004)
6. Bourel, F., Chibelushi, C.C., Low, A.A.: Robust facial expression recognition using a state-based model of spatially localised facial dynamics. In: Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 106–111 (2002)
7. Georghiades, A.S., Belhumeur, P.N., Kriegman, D.J.: From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose. IEEE Trans. Pattern Analysis and Machine Intelligence 23(6), 643–660 (2001)
8. Viola, P., Jones, M.: Rapid Object Detection using a Boosted Cascade of Simple Features. Computer Vision and Pattern Recognition 1, 511–518 (2001)
9. Yang, H.-S., Kim, J.-M., Park, S.-K.: Three Dimensional Gesture Recognition Using Modified Matching Algorithm. In: Wang, L., Chen, K., Ong, Y.S. (eds.) ICNC 2005. LNCS, vol. 3611, pp. 224–233. Springer, Heidelberg (2005)
10. Belhumeur, P.N., Hepanha, J.P., Kriegman, D.J.: Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Trans. Pattern Analysis and Machine Intelligence 19(7), 711–720 (1997)
11. Georghiades, A.S., Belhumeur, P.N., Kriegman, D.J.: From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose. IEEE Trans. Pattern Analysis and Machine Intelligence 23(6), 643–660 (2001)
12. Kim, J.-M., Yang, H.-S.: A Study on Object Recognition Technology using PCA in Variable Illumination. In: Li, X., Zaïane, O.R., Li, Z. (eds.) ADMA 2006. LNCS (LNAI), vol. 4093, pp. 911–918. Springer, Heidelberg (2006)
Simulation of Virtual Human's Mental State in Behavior Animation

Zhen Liu

Faculty of Information Science and Technology, Ningbo University, 315211, China
[email protected]
Abstract. Human mental state is related to outer stimuli and inner cognitive appraisal, and it mainly includes emotion, motivation, personality and social norm. Modeling mental state of virtual humans is very important in many fields. Simulating virtual humans with mental state is a challenging branch of computer animation, where virtual humans are regarded as agents with sense, perception, emotion, personality, motivation, behavior and action. 3D virtual humans are constructed with sensors for perceiving external stimuli and are able to express emotions autonomously. Mental state-based animation is demonstrated in a prototype system.
1 Introduction

Life simulation is a dream that people have long pursued. The development of artificial intelligence has promoted the advancement of traditional computer animation, and the combination of the two has become closer in recent years; modeling 3D virtual humans has attracted extensive attention from many fields. Artificial life is the research field that tries to describe and simulate life by setting up virtual artificial systems with the properties of life. Artificial life has become a new method in computer animation [1][2], and the entertainment industry needs cleverer virtual humans with built-in artificial life models. Computer animation and artificial life blend into each other, and intelligent virtual life, a new research field, is born at the intersection of artificial life and computer animation. We can gain more understanding from the history of computer animation. Early computer animation included only the shape and movement of geometric models, with which it is very difficult to draw complex natural landscapes; artificial life can help to solve these problems. In general, an artificial life model is based on a bottom-up strategy. Emergence is a key concept of artificial life: complex system behavior arises from the simple local interactions of individuals. Another key concept of artificial life is adaptation, which means evolution. In the 1980s, many models of computer animation were presented, such as particle models, L-systems, kinematics and dynamics, and facial animation. In the 1990s, artificial life greatly influenced the development of computer animation; many intelligent animation characters with perception were realized on computer systems, and behavior animation and cognitive models were milestones in the development of computer animation.
Behavior animation of virtual humans is becoming more and more important in computer graphics, and people want flexible, parameterized methods to control the locomotion of virtual humans. Badler developed the Jack software for virtual human animation [3]. Jack is designed for simulating human factors in industrial engineering; the same group also built a system called Emote to parameterize virtual humans and add a personality model [4]. N.M. Thalmann suggested that virtual humans should not only look visually realistic: they must have behavior, perception, memory and some reasoning intelligence [5]; her group also presented a personality model for avatars [6]. Gratch et al. presented a domain-independent framework for modeling emotion; they argued that people have beliefs about past events, emotions about those events, and can alter those emotions by altering the beliefs [7]. Cassell et al. realized a behavior animation toolkit [8], and Pelachaud et al. presented a method to create facial expressions for avatars [9]. A model of a virtual human's mental state is presented in this paper: a believable 3D virtual human should be provided with mental variables that include emotion, personality, motivation and social norm (see Fig. 1). The social norm includes status, interaction information and interaction rules; it controls the process of a nonverbal social interaction and provides the social knowledge for the virtual human. Based on Freud's theory [10], the id is the inborn, unconscious portion of the personality where instincts reside, operating on the pleasure principle; libido is a source of psychic energy; the ego is responsible for organizing ways to act in the real world and operates on the reality principle; and the superego comprises the rules that control what a virtual human should or should not do. This research is based on behavior animation, and the goal is to set up a mental state-based animation model for 3D virtual humans.
Fig. 1. Structure of a virtual human
2 Perception of Virtual Human

In this paper, we discuss only visual perception. Synthetic vision is an important method for visual perception [5], which can accurately simulate the view of a virtual human; the method synthesizes vision on a PC. When a virtual human needs to observe the virtual environment, the demo system can render the scene in invisible
windows with no texture and get a synthetic vision image. The virtual human can decide what he (she) can see from the values in the color buffer and depth buffer. The purpose of the color buffer is to distinguish objects with different color codes; the purpose of the depth buffer is to get the spatial position of a pixel in the window. If the coordinate of a pixel is (inv_x, inv_y) and zscreen is its depth value, let VTM be the view transformation matrix, PTM the projection transformation matrix, and VM the viewport matrix. Let PW be the world coordinate corresponding to the pixel, with IPW = {inv_x, inv_y, zscreen}; PW is calculated by formula (1):

PW = IPW × (VTM × PTM × VM)⁻¹.        (1)
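Formula (1) is the usual unprojection of a depth-buffer sample. A minimal NumPy sketch, assuming the row-vector convention and 4x4 homogeneous matrices (the homogeneous coordinate and the final divide are our additions, implicit in the formula):

import numpy as np

def unproject(inv_x, inv_y, z_screen, vtm, ptm, vm):
    # Recover the world coordinate PW of a pixel from its window
    # coordinate and depth value, per formula (1).
    ipw = np.array([inv_x, inv_y, z_screen, 1.0])
    pw = ipw @ np.linalg.inv(vtm @ ptm @ vm)
    return pw[:3] / pw[3]            # homogeneous divide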
In order to simulate the perception of space, we use a static partition of the scene octree, a hierarchical variant of spatial-occupancy enumeration [5]. We partition the static part of the scene in advance and record the octree in the database module. We can use the octree to solve the path-searching problem: the empty octree nodes and the edges among them compose a graph, so path searching can be transformed into the problem of finding a shortest path from one empty node to another in the graph (see Fig. 2).
Fig. 2. A* path searching (the left is no obstacle and the right is near a house)
In a complex virtual environment in which there are many virtual humans, synthetic vision is costly. Furthermore, this method cannot obtain detailed semantic information about objects. Therefore, another efficient method for the simulation of visual perception is presented. The visual perception of a virtual human is limited to a sphere [1] with radius R and angle scope θmax. The vision sensor is at the point Oeyes (the midpoint between the two eyes), where a local left-handed coordinate system is set up with Oeyes as the origin and the X axis along the front orientation (see Fig. 3). To determine whether an object Pob is visible, the first step is to judge whether Pob is in the vision scope: if ||Pob − Oeyes|| < R and the angle between the ray and the X axis is less than θmax/2, the object Pob is in the vision scope. The second step is to detect whether other obstacles occlude Pob. We can shoot a ray OP from Oeyes to Pob; cylinders serve as the bounding boxes of obstacles. In order to check the intersection of OP with an obstacle's bounding box, we can check whether OP intersects with a circle that is the projection of the obstacle's bounding box, and
further check whether OP intersects with the obstacle's bounding box itself. In a 3D virtual environment there are many dynamic objects, on which we set up feature points (such as the geometric center); if one feature point is visible, the object is regarded as visible. In the demo system, all obstacles are buildings. Based on Gibson's theory of affordances [11], affordances are relations among space, time and action; a virtual human can perceive these affordances directly, and an affordance is an invariance of the environment. In this paper, we use Gibson's theory to guide navigation: the affordances of objects hint at navigation information. We set up navigation information in the database for special areas or objects in the 3D virtual environment. For example, when a virtual human wants to walk across a road, we set the navigation information of the zebra crossing as accessible, so that the virtual human will select the zebra crossing. We use the scene octree to simulate the virtual human's perception of static objects in the 3D virtual environment, while the locations of all dynamic objects are recorded in the memory module at each animation time step. If an object is visible, we suppose that the virtual human moves on a 2D plane; let Dovc be the detection radius and dmin the avoiding distance for the virtual human; if Dovc
Fig. 3. Visible detection and detection of object’s affordance
free area and detect buildings by the octree. When the emergent behavior is over, he returns to the road and goes to Step 5. Step 5: the virtual human moves to the next navigation area and goes to Step 2.
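The first step of the visibility test described above (the distance and angle check against the vision sphere) can be sketched as follows. This is our illustration, not the demo system's code, and the second step (the occlusion test against cylinder bounding boxes) is omitted:

import numpy as np

def in_vision_scope(p_ob, o_eyes, front, R, theta_max):
    # p_ob, o_eyes, front are NumPy arrays; 'front' is the unit X axis of
    # the local eye frame. Returns True if P_ob lies inside the vision
    # sphere of radius R and within the half-angle theta_max / 2.
    d = p_ob - o_eyes
    dist = np.linalg.norm(d)
    if dist >= R:
        return False
    cos_angle = np.dot(d, front) / dist
    return np.arccos(np.clip(cos_angle, -1.0, 1.0)) < theta_max / 2.0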
3 Mental Model of Virtual Human

For a certain virtual human, BE is a basic emotion set, BE = {be₁, …, be_N}, i ∈ [1, N], where beᵢ is a basic emotion (such as happiness) and N is the number of basic emotions. EIᵢ(t) is the intensity of beᵢ, EIᵢ(t) ∈ [0, 1], with t the time variable. beᵢ is the unit vector of beᵢ; for example, be₁ = {1, …, 0}, be_N = {0, …, 1}. Let ES be an emotion state and E the emotion vector representing ES; the projection length of E on beᵢ is EIᵢ(t). E can be represented as formula (2):

E = ∑ᵢ₌₁ᴺ EIᵢ(t) beᵢ.        (2)
Let E₁ and E₂ be two emotion vectors; the synthesis of E₁ and E₂ is represented as E₁ + E₂ in formula (3):

E₁ + E₂ = ∑ᵢ₌₁ᴺ [EIᵢ₁(t) beᵢ₁ + EIᵢ₂(t) beᵢ₂].        (3)
Let EP be the set of all emotion vectors; if every element of EP satisfies formulas (2) and (3), EP is called the emotion vector space and the beᵢ are called the basic emotion vectors. Let Oⱼ(t) be an external stimulus, with n_o the number of stimuli (j = 1, …, n_o). Θ[Oⱼᵢ(t)] is the stimulus intensity function of Oⱼ(t) for emotion beᵢ, and 0 ≤ Θ[Oⱼᵢ(t)] ≤ 1. Any virtual human has the ability to resist external stimuli; let Cᵢ(t) be the resistive intensity for emotion beᵢ. The weaker Cᵢ(t) is, the more emotional the virtual human becomes with respect to emotion beᵢ, and 0 ≤ Cᵢ(t) ≤ 1.
Let PS be a personality set with personality variables PS_k(t), PS = {PS_k(t)}; Θ[PS_k(t)] is the intensity of PS_k(t), n_ps is the number of personality variables (k = 1, …, n_ps), and 0 ≤ Θ[PS_k(t)] ≤ 1. MV is a motivation variable set with motivation variables MV_m(t), MV = {MV_m(t)}, where w is the number of motivation variables (m = 1, …, w); Θ[MV_m(t)] is the intensity of MV_m(t), 0 ≤ Θ[MV_m(t)] ≤ 1. Let LBD be a libido set with libido variables lbd_s(t), LBD = {lbd_s(t)}, where n_l is the number of libido variables (s = 1, …, n_l); Θ[lbd_s(t)] is the intensity of lbd_s(t), 0 ≤ Θ[lbd_s(t)] ≤ 1. A status is a social degree or position. In general, a virtual human may own several statuses (such as father or son); let ST(CA) be a status set for virtual human CA, ST(CA) = (st₁, …, st_NM), where NM is the number of elements of ST(CA) and stᵢ is a status (i = 1, …, NM). In general, status can control personality and motivation, and we can construct a knowledge base to simulate the relation among libido, status, personality, motivation, and the resistive intensity of emotion. Let TCᵢ(t) be the updated Cᵢ(t); if Θ[Oⱼᵢ(t)] − TCᵢ(t) > 0, the emotion beᵢ is active. When an emotion is active, its expression includes three phases: (1) Growth phase: the intensity of the emotion grows from its minimum value [EIⱼᵢ(t)]min to its maximum value [EIⱼᵢ(t)]max; [DTⱼᵢ]growth is the duration. (2) Delay phase: the intensity of the emotion stays at its maximum value [EIⱼᵢ(t)]max; [DTⱼᵢ]delay is the duration. (3) Decay phase: the intensity of the emotion decreases to its minimum value [EIⱼᵢ(t)]min; [DTⱼᵢ]decay is the duration. To simplify the three phases, for a given virtual human and an emotion beᵢ we can give default durations for a certain external stimulus Oⱼᵢ(t), letting Θ[Oⱼᵢ(t)]max = 1. The corresponding default durations of the three phases are denoted DT[Oⱼᵢ(t)]s-growth, DT[Oⱼᵢ(t)]s-delay and DT[Oⱼᵢ(t)]s-decay. We suppose the intensity of an emotion changes linearly in the growth and decay phases. The three phases are described by formulas (4)-(8):
[EIⱼᵢ(t)]min = (Θ[Oⱼᵢ(t)] − TCᵢ(t)) / (1 − TCᵢ(t)),        (4)
[EIⱼᵢ(t)]max = Θ[Oⱼᵢ(t)],        (5)
[DTⱼᵢ]growth = [EIⱼᵢ(t)]max · DT[Oⱼᵢ(t)]s-growth,        (6)
[DTⱼᵢ]delay = [EIⱼᵢ(t)]max · DT[Oⱼᵢ(t)]s-delay,        (7)
[DTⱼᵢ]decay = [EIⱼᵢ(t)]max · DT[Oⱼᵢ(t)]s-decay.        (8)
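Formulas (4)-(8) define a piecewise-linear intensity envelope for an active emotion. A minimal sketch of that envelope (our illustration; the function and argument names are hypothetical):

def emotion_intensity(t, theta, tc, dt_growth, dt_delay, dt_decay):
    # theta = stimulus intensity, tc = updated resistive intensity;
    # dt_* are the default durations DT[O_ji(t)]_s-* of the three phases.
    ei_min = (theta - tc) / (1.0 - tc)   # Eq. (4)
    ei_max = theta                       # Eq. (5)
    g, d, dec = theta * dt_growth, theta * dt_delay, theta * dt_decay  # (6)-(8)
    if t < g:                            # growth phase: linear rise
        return ei_min + (ei_max - ei_min) * t / g
    if t < g + d:                        # delay phase: hold at maximum
        return ei_max
    if t < g + d + dec:                  # decay phase: linear fall
        return ei_max - (ei_max - ei_min) * (t - g - d) / dec
    return ei_min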
A snapshot of the demo system is shown in Fig.4.
Fig. 4. A virtual human in a virtual environment
4 Conclusion and Future Work

This paper discusses some basic problems in constructing a virtual human's mental model. A visual perception model is presented, and semantic information can be added to the database for special areas to guide the navigation of the virtual human. The mental state of the virtual human includes emotion, personality, motivation and social norm, and a computational model of the mental process is presented that integrates outer stimuli, personality, motivation, emotion, and the resistive intensity of emotion.

Acknowledgements. The work described in this paper was co-supported by the Science and Technology Project of the Zhejiang Province Science Department (grant no: 2006C33046), the National Grand Fundamental Forepart Professional Research (grant no: 2005cca04400), the National High-Tech Research and Development Plan of China (grant no: 2006AA01Z303), the Natural Science Foundation of Ningbo City (grant no: 2007A610038), and the National Natural Science Foundation of China (grant no: 60672071).
References
1. Tu, X., Terzopoulos, D.: Artificial fishes: Physics, locomotion, perception, behavior. In: Proceedings of SIGGRAPH'94, pp. 43–50 (1994)
2. Funge, J., Tu, X., Terzopoulos, D.: Cognitive Modeling: Knowledge, Reasoning and Planning for Intelligent Agents. In: Proceedings of SIGGRAPH'99, pp. 29–38 (1999)
3. Badler, N., Phillips, C., Webber, B.: Simulating Humans: Computer Graphics Animation and Control, pp. 154–159. Oxford University Press, New York (1993)
4. Chi, D., Costa, M., Zhao, L., Badler, N.: The EMOTE model for effort and shape. In: Proceedings of SIGGRAPH'00, pp. 173–182 (2000)
5. Thalmann, N.M., Thalmann, D.: Artificial Life and Virtual Reality, pp. 1–10. John Wiley Press, Chichester (1994)
6. Egges, A., Kshirsagar, S., Thalmann, N.M.: Generic personality and emotion simulation for conversational agents. Computer Animation and Virtual Worlds 15, 1–13 (2004)
7. Gratch, J., Marsella, S.: A domain-independent framework for modeling emotion. Journal of Cognitive Systems Research 5(4), 269–306 (2004)
8. Cassell, J., Vilhjalmsson, H.H., Bickmore, T.: BEAT: the behavior expression animation toolkit. In: ACM SIGGRAPH, pp. 477–486 (2001)
9. Pelachaud, C., Poggi, I.: Subtleties of facial expressions in embodied agents. Journal of Visualization and Computer Animation 13, 287–300 (2002)
10. Bernstein, D.A., Stewart, A.C., Roy, E.J., Wickens, C.D.: Psychology, 4th edn., pp. 360–361. Houghton Mifflin Company, New York (1997)
11. Gibson, J.J.: The Ecological Approach to Visual Perception, pp. 1–50. Lawrence Erlbaum Associates Inc., Hillsdale, NJ (1986)
Hemodynamic Analysis of Cerebral Aneurysm and Stenosed Carotid Bifurcation Using Computational Fluid Dynamics Technique

Yi Qian¹, Tetsuji Harada², Koichi Fukui², Mitsuo Umezu², Hiroyuki Takao³, and Yuichi Murayama³

¹ Institute of Biomedical Engineering, Waseda University, #58-322, 3-4-1 Okubo, Shinjuku-ku, Tokyo, 169-8555, Japan
² Integrative Bioscience and Biomedical Engineering, Waseda University Graduate School, #58-322, 3-4-1 Okubo, Shinjuku-ku, Tokyo, 169-8555, Japan
³ The Jikei University School of Medicine, 3-25-8 Shinbashi, Minato-ku, Tokyo, 105-8461, Japan
Abstract. Cerebrovascular diseases, such as the rupture of cerebral aneurysms and cerebral infarction caused by carotid stenosis, are among the three major causes of mortality in Japan. The growth mechanisms of cerebral aneurysms and carotid stenosis are not yet clearly understood. In this research, we introduce a numerical simulation tool, the Computational Fluid Dynamics (CFD) technique, to simulate and predict the hemodynamics of blood passing through cerebral aneurysms and stenosed carotid arteries. The results for a ruptured and an unruptured cerebral aneurysm were compared: the energy losses calculated for the ruptured and unruptured aneurysms were 167 Pa and 6.3 Pa, respectively. The results also indicated that the blood flow resides longer inside the bleb of the ruptured aneurysm. From the simulation results for the stenosed carotid bifurcation, the maximum wall shear stress was observed at 70% stenosis. This result qualitatively agrees with classical treatments in carotid bifurcation therapy.
1 Introduction

Cerebral aneurysms are pathological dilations of the arterial wall that frequently occur near arterial bifurcations. Their most serious consequence is rupture and intracranial bleeding into the subarachnoid space, with an associated high mortality and morbidity rate [1-3]. In Japan, cerebrovascular diseases are among the three leading causes of death; the percentages of cerebral arterial diseases are shown in Figure 1. However, prognostic methods for subarachnoid hemorrhages (SAHs) are still not sufficiently developed, and the mechanisms of cerebral aneurysm genesis, growth and rupture are not completely understood. Although the evolution of cerebral aneurysms is affected by a variety of factors (pathological, hemodynamic and others), a better understanding of the blood flow patterns passing through the aneurysm and
a physical analysis of the progression of the aneurysm vessel wall will provide valuable references for aneurysm surgery, helping to relate the pathophysiological aspects to aneurysm progression as a function of geometry and local hemodynamics. It will critically support aneurysm surgery in understanding aneurysm growth, preparing treatment, and predicting the risk of regrowth after treatment. Blood flow in an aneurysm generally depends on its geometric configuration and relation to the parent vessel, the size of the orifice, and the volume of the aneurysm. The classical treatments of aneurysms are direct surgical clipping or endovascular coil insertion, the choice being decided by the size of the aneurysm. However, certain intracranial aneurysms with complex structures (Figure 2), fusiform, wide-necked, of giant size, or involving branch vessels, are not easy to treat. Hence, hemodynamic factors such as blood velocity, wall shear stress (WSS), and blood pressure play important roles in the pathogenesis of aneurysms and thrombosis.
Fig. 1. The percentages of cerebral aneurysm locations: middle cerebral artery 34%, internal carotid artery 32%, anterior communicating artery 14%, basilar artery 6%, other 14%
Fig. 2. Anterior cerebral circulation (ICA: internal carotid artery; MCA: middle cerebral artery)
Hemodynamic analyses of cerebral aneurysms have been developed using numerical and experimental methods [4-6]. The relationship between flow patterns and disease development, particularly the WSS, has motivated several studies in recent years. However, most studies have focused on employing commercial software or have directly introduced industrial numerical tools into the blood flow simulation of aneurysms, and most of these approaches have significant limitations in their connection with in-vivo hemodynamic factors. Therefore, the challenges for the hemodynamic analysis of aneurysms using computational numerical methods are: validation with large numbers of specific aneurysm geometries from clinical records, specification of the blood flow boundary conditions on the simulated vessel domain, and the ability to create a predictive criterion for recognizing the risk of a cerebral aneurysm before rupture. Our studies aim to develop an efficient transfer system to convert clinical image data (MRI and CT angiography) into computationally usable vessel geometries, to computationally compare blood flow patterns between a ruptured aneurysm and an unruptured aneurysm, and to analyse flow characteristics in a series of stenosed carotid arteries.
2 Pre-processing: Image Angiography Transfer Methods

The geometry creation process for the numerical simulation is shown in Figure 3. The original clinical images were selected from the clinical database of the Jikei University School of Medicine (JUSM). In the last three years, over 650 patients with unruptured aneurysms were diagnosed at JUSM; over 75% of them underwent CT scanning, and they continue to take half-yearly examinations. The cerebral aneurysm geometries used in this study were selected from two specific patients. One example was chosen from a 40-year-old female patient who was diagnosed at JUSM as having no risk at this stage; this case is named the unruptured case in this study (Figure 4b). The other angiography was selected from an aneurysm that continued to dilate and finally bled (ruptured); this case is named the ruptured case (Figure 4a). The angiography of the ruptured case was recorded just before the aneurysm ruptured. The geometries of the aneurysms and their parent vessels are similar in both cases, and the same boundary conditions were applied to the flow at the internal carotid artery (ICA) in both cases (Figure 5).
Fig. 3. A data transfer system for aneurysm geometry generation (clinical image → 3-D angiography → numerical model → mesh generation)
The medical angiography visualization software RealINTAGE® is used to transfer and extract DICOM-format clinical image angiography into a three-dimensional vessel angiography, and to export the numerical model in STL format. Finally, the STL geometry was meshed using ICEM (ANSYS®).
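As an illustration of this transfer step, the sketch below reproduces the DICOM-to-surface-mesh stage with open-source stand-ins (pydicom, scikit-image, numpy-stl) in place of the commercial RealINTAGE/ICEM pipeline; the isosurface threshold and file handling are assumptions, not the authors' settings.

```python
# Hedged sketch: stack a CT angiography DICOM series, extract the luminal
# surface with marching cubes, and export it as STL for mesh generation.
import numpy as np
import pydicom
from skimage import measure
from stl import mesh  # numpy-stl

def dicom_series_to_stl(dicom_files, iso_level, out_path):
    # Sort slices along the scan axis and stack into a 3-D intensity volume.
    slices = sorted((pydicom.dcmread(f) for f in dicom_files),
                    key=lambda s: float(s.ImagePositionPatient[2]))
    volume = np.stack([s.pixel_array for s in slices], axis=-1).astype(np.float32)
    # Marching cubes extracts the vessel surface at a chosen intensity level
    # (iso_level is an assumed, case-dependent threshold).
    verts, faces, _, _ = measure.marching_cubes(volume, level=iso_level)
    surface = mesh.Mesh(np.zeros(faces.shape[0], dtype=mesh.Mesh.dtype))
    for i, tri in enumerate(faces):
        surface.vectors[i] = verts[tri]
    surface.save(out_path)
```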
Fig. 4. Ruptured (a) and unruptured (b) aneurysm geometries

Fig. 5. Inlet flow condition at the ICA (flow rate [mL/min], up to about 500, over one cardiac cycle of roughly 1 s)
3 Numerical Modeling and Simulation Methods

The flows in this study were assumed to be incompressible, Newtonian (given a minimum WSS > 10 Pa), and laminar (Reynolds number < 1000). The vessels were modeled as rigid, with the roughness of the inner vessel surface neglected. CFX (ANSYS®), which solves the Navier–Stokes governing equations with the Finite Volume Method (FVM), was used as the main solver. All simulations were performed on a personal computer with a 3.2 GHz processor and 4 GB of RAM.
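A quick order-of-magnitude check of the laminar-flow assumption can be made from the inlet condition; the sketch below uses standard blood properties and an assumed 4 mm ICA diameter, neither of which is stated in the paper.

```python
# Estimate the Reynolds number Re = rho * U * D / mu for the ICA inlet.
import math

def reynolds_number(flow_rate_ml_min, diameter_m,
                    density=1050.0, viscosity=3.5e-3):
    q = flow_rate_ml_min * 1e-6 / 60.0        # volumetric flow [m^3/s]
    area = math.pi * (diameter_m / 2.0) ** 2  # vessel cross-section [m^2]
    velocity = q / area                       # mean velocity [m/s]
    return density * velocity * diameter_m / viscosity

# Peak inlet flow of ~500 mL/min (Fig. 5) through an assumed 4 mm ICA:
print(reynolds_number(500.0, 0.004))  # ~800, below the laminar limit of 1000
```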
4 Post-processing and Results Analysis

4.1 Basic Modeling Study and Validation

We focused on the aspect ratio (AR = depth/neck width) and the curvature of the parent vessel (CV), which are considered critical factors in aneurysm rupture. We simulated basic aneurysm models with different combinations of AR and CV and analyzed the flows in the aneurysms. As a result, the wall shear stress (WSS) on the aneurysm surface was observed to increase as AR decreased or CV increased. The results are shown in Figure 6: the flow rate inside the aneurysm rises with increasing CV and, conversely, decreases with increasing AR (1, 1.5, and 2). These results agree well with the standards used in current clinical treatment. Moreover, in order to confirm the CFD results, a series of validation tests was performed using a silicone model of the same size; the details of these tests will be published in other reports. The simulation results were validated against PIV measurements (Figures 7 and 8). Figure 7 shows the velocity vectors on the section through the centre of the aneurysm (section A); blue denotes the CFD results and red the PIV results.
Fig. 6. Inflow per unit area [L/min/mm²] against curvature (CV) for AR = 1, 1.5, and 2

Fig. 7. Validation results: velocity vectors on section A (blue: CFD; red: PIV)
To confirm the results shown in Figure 7, the x-direction velocities (vx) and y-direction velocities (vy) on section A are compared in Figure 8. Although the velocity inside the aneurysm is very small, the results in Figure 8 show good agreement between the CFD and experimental flow patterns.
Fig. 8. Velocity [m/s] along the centre line of section A against y [mm], comparing vx and vy from CFD and PIV
Fig. 9. Streamlines: (a) ruptured case; (b) unruptured case
4.2 Cerebral Aneurysm Study

We performed unsteady (pulsatile, as shown in Figure 5) flow analyses of a ruptured cerebral aneurysm (AR = 1.8) and an unruptured cerebral aneurysm (AR = 1.5). Unsteady recirculating flows were observed in the vicinity of the bleb in the ruptured case (Figure 9a); the results for the unruptured case are shown in Figure 9b. The blood flow in the ruptured aneurysm is clearly more complex than the flow pattern in the unruptured case: the flow in the ruptured aneurysm is not only separated, but a strong swirl is also created in the centre of the aneurysm. This indicates that more energy is lost in the ruptured case as blood passes through the aneurysm. Furthermore, a bleb is found in the ruptured case in Figure 9a, and high-speed recirculation and flow attachment around the bleb edge can be observed in Figure 12. In general, a bleb on an aneurysm is judged to be one of the highest risk signs in the clinical diagnosis of cerebral aneurysms [7].
Fig. 10a. Pressure distributions (unruptured): model with the aneurysm and non-aneurysm model

Fig. 10b. Pressure distributions (ruptured): model with the aneurysm and non-aneurysm model
Figure 10 shows blood streamlines starting at the ICA section. Observing the inside of the aneurysms, more flow passes through the aneurysm in the ruptured case. In order to evaluate the energy loss in the aneurysms, the aneurysms were removed on screen, and the non-aneurysm simulations were carried out under the same flow conditions as the ruptured and unruptured cases. The total pressure losses in the aneurysms
Table 1. Energy loss in the ruptured and unruptured aneurysms and their non-aneurysm models (Pa)

                                 ruptured   unruptured
aneurysm                         636.9      464.5
non-aneurysm                     469.9      458.2
energy loss in the aneurysm      167        6.3
were 167 Pa and 6.3 Pa in the ruptured and unruptured cases, respectively (Table 1). The energy losses in the two non-aneurysm situations are very close (469.9 Pa and 458.2 Pa), which is expected because, apart from the aneurysms, the two geometries are very similar. Hence, it is suggested that the total pressure loss may have an effect on rupture. In addition, the results indicate that in the ruptured case the blood takes a longer time to pass into and out of the aneurysm, and the energy loss is also higher than in the unruptured case. The lost energy may be transferred into pressure and stress loading on the pathological aneurysm surface, so that the aneurysm surface is repeatedly stretched and shrunk.
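For clarity, the energy loss attributed to each aneurysm in Table 1 is simply the total pressure loss of the model with the aneurysm minus that of the corresponding non-aneurysm model; a minimal check:

```python
# Energy loss in the aneurysm = loss (with aneurysm) - loss (non-aneurysm),
# using the values in Pa from Table 1.
cases = {"ruptured": (636.9, 469.9), "unruptured": (464.5, 458.2)}
for name, (with_aneurysm, without_aneurysm) in cases.items():
    print(f"{name}: {with_aneurysm - without_aneurysm:.1f} Pa")
# ruptured: 167.0 Pa; unruptured: 6.3 Pa
```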
Fig. 11. Blood flow in the aneurysm (unruptured)

Fig. 12. Blood flow in the aneurysm (ruptured); the bleb is labelled
Figure 11 shows the flow pattern in the section of the unruptured aneurysm. One vortex is observed at the top of the aneurysm, and the main flow from the parent vessel is resisted near the aneurysm neck. In contrast, three strong vortices can be observed in Figure 12 (the ruptured case): the main flow passes through the aneurysm neck and moves directly to the top of the aneurysm, where a stagnation region forms. The flow separates around the stagnation points or lines and then turns into the bleb, where a vortex is visible. According to clinical reports, many ruptured aneurysms ruptured at the top of the bleb.
4.3 Carotid Stenosis Study

Applying the CFD methodology developed for the cerebral aneurysm simulations, we also simulated basic carotid bifurcation models with different stenosis rates (0%, 25%, 50%, 70%, 90%), as shown in Figure 13. In the 50% and 70% models, flow separation was observed downstream of the stenosis. Wall shear stress (WSS) was observed to increase with the stenosis rate, and the maximum WSS of 25 Pa was observed for the 70% stenosis rate (Figures 14 and 15). This is 3.9 times higher than for the normal carotid bifurcation (0% model). These results agree qualitatively with the standards currently used in carotid bifurcation treatment.
Fig. 13. Wall shear stress in the stenosed carotid bifurcation models (0%, 25%, 50%, 70%, and 90% stenosis)
Fig. 14. Velocity [m/s] (maximum and average) against % stenosis

Fig. 15. Wall shear stress [Pa] against % stenosis
5 Discussion

This study first developed an efficient process for modeling the blood flow around cerebral aneurysms and the carotid bifurcation. Using this system, medical 3-D angiography image data (DICOM format) can be transferred to geometry formats; the results can be used for mesh generation and are also available for manufacturing models for experiments.
We found that there are different flow patterns in a ruptured aneurysm and an unruptured aneurysm. The energy loss in the ruptured case is about 30 times higher than in the unruptured case, and the flow inside the ruptured case appears more complex. Furthermore, a small bleb was observed at the top of the ruptured aneurysm, and separation and reattachment were found near the bleb. Although these simulations are not sufficient for a conclusion about their role in predicting aneurysm rupture, they provide guidance for our further study. In general, the aspect ratio (AR) is used as a standard for the critical rupture decision in medical treatment: it has been reported that nearly 80% of cerebral aneurysms with AR > 1.6 ruptured, while about 90% of unruptured aneurysms have AR < 1.6. Although the ruptured case (AR = 1.8) and the unruptured case (AR = 1.5) simulated in this study agree with these statistics, there are in practice many clinical examples of aneurysms that ruptured even at a small size (or small AR) [6]. The simulation system we developed was also applied to the carotid bifurcation: the maximum WSS of 25 Pa was observed at the 70% stenosis rate, which agrees with the standards currently used in carotid bifurcation treatment. Our work on the hemodynamic analysis and visualization of cerebral arteries is carried out in close cooperation between clinical practice and the engineering laboratory. Even though there are some theoretical and computational limitations, by drawing on our long-term experience with cardiovascular support systems and on experienced neurosurgeons, the CFD techniques constructed in this study will be able to provide valuable hemodynamic parameters for a clinical prediction of aneurysm rupture.

Acknowledgements. The authors express their gratitude to Mr. Takanobu Yagi and the other members of the PIV measurement team in Umezu's laboratory, Waseda University, for generously providing the PIV measurement results used for the validation.
References

1. Foutrakis, G.N., et al.: Saccular aneurysm formation in curved and bifurcating arteries. AJNR Am. J. Neuroradiol. 20, 1309–1317 (1999)
2. Linn, F.H., et al.: Incidence of subarachnoid hemorrhage: role of region, year, and rate of computed tomography: a meta-analysis. Stroke 27, 625–629 (1996)
3. Winn, H.R., et al.: Prevalence of asymptomatic incidental aneurysms: review of 4568 arteriograms. J. Neurosurg. 96, 43–49 (2002)
4. Valencia, A.: Flow dynamics in models of intracranial terminal aneurysms. MCM 1(3), 221–231 (2004)
5. Todaka, T., et al.: Analysis of mean transit time of contrast medium in ruptured and unruptured arteriovenous malformations: a digital subtraction angiographic study. Stroke 34, 2410–2414 (2003)
6. Ujie, H., et al.: Effects of size and shape (aspect ratio) on the hemodynamics of saccular aneurysms: a possible index for surgical treatment of intracranial aneurysms. J. Neurosurg. 45(1), 119–130 (1999)
7. Tateshima, S., Murayama, Y., Villablanca, J.P., Morino, T., Nomura, K., Tanishita, K., Vinuela, F.: In vitro measurement of fluid-induced wall shear stress in unruptured cerebral aneurysms harboring blebs. Stroke 34(1), 187–192 (2003)
Active/Inactive Emotional Switching for Thinking Chain Extraction by Type Matching from RAS

JeongYon Shim

Division of General Studies, Computer Science, Kangnam University, San 6-2, Kugal-ri, Kihung-up, YongIn Si, KyeongKi Do, Korea, Tel.: +82 31 2803 736
[email protected]

Abstract. During the memorizing process, knowledge is memorized in close relation to the emotional state, and the emotional factor affects the decision-making process as well as memorization. Accordingly, in this paper a Reticular Activating System is designed and a knowledge management strategy that takes emotional switching into account is proposed. We applied this system to a virtual memory and tested the results.
1 Introduction
According to brain studies, it is known that the memorizing nervous system is closely correlated with the emotional nervous system, and that each affects the other; this is why we can easily memorize the things we like. At a time when technology is moving from machine-oriented to human-oriented, it is very important to consider emotional factors in designing intelligent systems. On the other hand, everything in the world has its own properties and tends to select the information matching its own type. This selective function is essential for a living thing's survival, because no living thing can process all external information. Because the surrounding environment is dynamic, complex, and unpredictable, a living thing must select the necessary information and reconstruct a virtual world appropriate for a small living thing to survive in. As the information world becomes ever more complex and dynamic, the demand for smarter intelligent systems grows. A smarter intelligent system means a human-friendly system familiar to the person. To build such a system it is essential to adopt human properties and intelligent processing, and many studies of human-like intelligent systems have been carried out over the past several decades. As one of these studies, the author designed a Reticular Activating System with a selecting measure in a previous paper, where a selective functional strategy was introduced [1]. Developing the previous system further, we here design an intelligent knowledge-based system that includes the emotional factor. The emotional concept is inserted into the selection, memory structuring, and knowledge retrieval steps. In particular, Active/Inactive emotional
switching is designed, together with various extraction methods based on this knowledge network frame, type matching, and emotional switching. We developed this advanced model, applied the system to a virtual memory, and tested it with sample data.
2 Advanced Reticular Activating System

2.1 The Structure of the Advanced Reticular Activating System
As described in the previous paper, the Reticular Activating System consists of three parts: a Knowledge Acquisition part, a Selection part, and a Memory part. The new elements compared with the previous model are the emotional factor and the type matching concept.
Fig. 1. Advanced Reticular Activating System
First, the Knowledge Acquisition part has multiple modular NNs (neural networks) and performs the learning process with training data according to the categories, using the BP (Back Propagation) algorithm. The output nodes of each modular NN are connected to nodes in the Associative layer, which is a logical network connected by associative relations. Its learning mechanism is described in detail in [1]. Second, the Reticular Activating layer has a knowledge net consisting of nodes and their associative relations. The nodes in the knowledge net are connected vertically to the nodes of the Associative layer, and an importance value is assigned to the connection weight of each vertical relation. The Selection module performs the selection process on these associative and vertical relation values using criteria given by the Meta Knowledge. Third, Storing to Memory consists of two parts: knowledge reconfiguration and storing the values for the NNs. In reconfiguration, the selected nodes and relations are reconfigured and stored in memory; the knowledge net is built by attaching nodes around a common centering node. After reconfiguration, the
centering node is connected to an index that is used in the search process. In the case of polysemy, the common node is connected to multiple knowledge nets through Active/Inactive emotional switching. The other part of memory stores the values for the NNs: after the learning process of a modular NN finishes, the system stores its category, parameters, and weight matrix. These stored values are used for perception, inference, and knowledge retrieval.
3 The Structure of the Knowledge Cell in the Knowledge Network

3.1 Knowledge Cell
A knowledge cell is the atomic element composing the knowledge network in memory. It consists of a Knowledge ID, a Self Type, an Emotional factor, and the Entropy of the cell, as shown in the following figure.
Fig. 2. Knowledge cells and relations in knowledge network
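As a concrete illustration of this structure, the sketch below models a knowledge cell and an associative relation as simple record types; the field names follow the description above, and the value ranges are assumptions.

```python
# Minimal sketch of a knowledge cell and a typed associative relation.
from dataclasses import dataclass

@dataclass
class KnowledgeCell:
    knowledge_id: str        # e.g. "K1"
    self_type: str           # one of the five types: "M", "F", "E", "K", "S"
    emotional_factor: float  # emotional factor E (assumed range [-1, 1])
    entropy: float           # entropy of the cell

@dataclass
class Relation:
    source: str              # knowledge ID of the source cell
    target: str              # knowledge ID of the related cell
    strength: float          # associative strength S
```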
3.2 Type Matching Relation
We define the self type as a particular property of an object. An object has one of five types: M, F, E, K, and S. Between two types there exists either an attracting relation or a rejecting relation. If a pair of types is in an attracting relation, the two types are associated and their relational strength increases; on the contrary, if it is in a rejecting relation, an expelling force works between them and their strength decreases, as shown in Tables 1 and 2.
3.3 Type Matching Selection Mechanism
The Type Matching Selection System is designed as shown in the following figure. When the ACTIVE/INACTIVE EMOTIONAL SWITCHING switch is turned on, the activating degree of the selected type is set. According to the ACTIVE/INACTIVE EMOTIONAL SWITCHING signal, the Type Selection Module selects the knowledge matching
Table 1. Type Matching Rule: Attracting Relation

M ⊕ F,  F ⊕ E,  E ⊕ K,  K ⊕ S,  S ⊕ M

Table 2. Type Matching Rule: Rejecting Relation

M–E,  E–S,  S–F,  F–K,  K–M
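A compact encoding of these rules may help; in the sketch below the attracting pairs are oriented as in Sect. 3.3 (in A ⊕ B, A is the affecting type and B the affected one), and the orientation of the rejecting pairs is an assumption, since the original relation symbol was lost in extraction.

```python
# Type matching rules from Tables 1 and 2.
ATTRACTING = [("M", "F"), ("F", "E"), ("E", "K"), ("K", "S"), ("S", "M")]
REJECTING = [("M", "E"), ("E", "S"), ("S", "F"), ("F", "K"), ("K", "M")]

def affecting_types(b):
    """Types A with an attracting rule A (+) B, i.e. affecting B."""
    return [a for a, t in ATTRACTING if t == b]

def affected_types(b):
    """Types C with an attracting rule B (+) C, i.e. affected by B."""
    return [c for t, c in ATTRACTING if t == b]

def type_relation(a, b):
    """+1 for an attracting pair, -1 for a rejecting pair, 0 otherwise."""
    if (a, b) in ATTRACTING or (b, a) in ATTRACTING:
        return 1
    if (a, b) in REJECTING or (b, a) in REJECTING:
        return -1
    return 0

print(affecting_types("F"), affected_types("F"))  # ['M'] ['E'], as in Sect. 4
```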
Fig. 3. Basic structure of Knowledge net
that type from the Master Knowledge net in the Reticular layer. The Knowledge Reconfiguration module then reconstructs a new knowledge net from the knowledge selected in the previous step. Active/Inactive switching controls the activation status signal: the ACTIVE/INACTIVE EMOTIONAL SWITCHING has the two statuses P and N, where the P signal represents the 'Active' status and the N signal the 'Inactive' status. If the switching is set to status 'P', the 'affecting relation' works more strongly than the 'being affected relation'; conversely, in status 'N' the 'being affected relation' is stronger than the 'affecting relation'. For example, suppose the type of an object is B and type B is related to two type matching rules, A ⊕ B and B ⊕ C. In these rules, A has an 'affecting relation' to B, and C has a 'being affected relation' from B. That is, on the signal 'P', a B-type object accepts C-type knowledge more strongly than A-type knowledge; on the signal 'N', it accepts A-type knowledge more strongly than C-type knowledge.
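Continuing the sketch above (and reusing its affecting_types/affected_types helpers), the P/N preference of the worked example can be written as a simple ordering; representing "stronger acceptance" as list order is an illustrative assumption.

```python
# Order candidate types by preference under the P/N switching signal.
def preferred_types(obj_type, signal):
    if signal == "P":  # Active: prefer the types the object affects
        return affected_types(obj_type) + affecting_types(obj_type)
    return affecting_types(obj_type) + affected_types(obj_type)  # "N"

print(preferred_types("F", "P"))  # ['E', 'M']: E-type knowledge preferred
```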
3.4 Thinking Chain Extraction
The graphic representation of the knowledge net is translated into the associative knowledge list of Table 3. In the thinking chain extraction mechanism, starting from the keyword node, all related knowledge chains are extracted according to the associative strength and the emotional factor.

Table 3. Associative knowledge list

K-node   Type   M     E     Relation-S   K-node
Ki       Ti     Mi    Ei    Sij          Kj
Kj       Tj     Mj    Ej    Sjk          Kk
...      ...    ...   ...   ...          ...
Algorithm 1: Thinking chain extraction

STEP 1: Search for the keyword.
STEP 2: Look up the Attracting Rule table; find the affecting type Hp and the being-affected type HN.
STEP 3: Look up the Rejecting Rule table; find the affecting type Rp and the being-affected type RN.
STEP 4: Extract the related nodes one by one from the knowledge list and put each extracted knowledge cell into the thinking chain list.
STEP 5: If Active emotional switching, then activate the emotional factor.
STEP 6: If Inactive emotional switching, then ignore the emotional factor.
STEP 7: Output the thinking chain list.
STEP 8: Stop.
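Below is a hedged sketch of Algorithm 1, reusing the KnowledgeCell and type_relation definitions from the earlier sketches; the numeric scoring, the threshold, and the greedy choice of the strongest link are illustrative assumptions, since the paper does not specify them.

```python
# Walk the associative knowledge list from a keyword node, weighting each
# candidate link by type matching and, when the emotional switch is active,
# by the emotional factor of the target cell.
def extract_thinking_chain(keyword, cells, relations, active_switch,
                           threshold=0.3):
    chain, current = [keyword], keyword
    while True:
        candidates = []
        for target_id, strength in relations.get(current, []):
            cell = cells[target_id]
            score = strength * type_relation(cells[current].self_type,
                                             cell.self_type)
            if active_switch:            # Active: emotional factor applies
                score *= 1.0 + cell.emotional_factor
            if score > threshold and target_id not in chain:
                candidates.append((score, target_id))
        if not candidates:               # no acceptable continuation: stop
            return chain
        current = max(candidates)[1]     # follow the strongest link
        chain.append(current)
```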
4 Experiments
The ACTIVE/INACTIVE switching selection mechanism is applied to a virtual memory. Table 4 shows the translated associative knowledge list representing the master knowledge net. Supposing the type of the object is F, the extracted thinking chain is as shown in Figure 4. A new net is configured from the extracted type matching rules, which are M ⊕ F and F ⊕ E. Figure 4 also shows the test result of the ACTIVE/INACTIVE masking: the associative knowledge list of the knowledge net reconfigured according to the
ACTIVE/INACTIVE switching signal. For the object type F, a piece of type matching knowledge was selected and a new knowledge net was reconfigured. As a result, it was found that a new knowledge net controlled by ACTIVE/INACTIVE type matching selection was successfully built. This new knowledge net, containing type information, can be used for efficient data retrieval.

Table 4. Associative Knowledge List of the Master Knowledge Net

Ki    T   M     E     Rij   Kj
K1    F   0.0   0.2   1.0   K2
K1    F   0.0   0.2   0.0   K8
K2    M   0.0   0.7   0.9   K3
K2    M   0.0   0.7   0.5   K6
K3    M   0.0   -0.1  0.7   K4
K3    M   0.0   -0.1  1.0   K5
K4    E   0.0   0.7   0.0   Null
K5    E   0.0   -0.3  0.3   K7
K6    S   0.0   0.9   0.4   K7
K7    M   0.0   0.1   0.6   Null
K8    K   0.0   0.5   0.1   K9
K8    K   0.0   0.5   0.1   K10
K8    K   0.0   0.5   1.0   K11
K9    S   0.0   0.1   0.7   Null
K10   K   0.0   0.3   0.7   K12
K11   E   0.0   0.5   0.4   K12
K12   M   0.0   0.1   0.4   Null
Fig. 4. Thinking chain extraction
5 Conclusions
In this paper, we propose a Reticular Activating System with the functions of selective reaction, learning, and inference. The system consists of knowledge acquisition, selection, storing, and retrieving parts. The Reticular Activating layer is connected to the Meta Knowledge at the high level of the system and takes part
in data selection. The type definition and the ACTIVE/INACTIVE type matching selection mechanism in the Reticular Activating System were specially designed. We applied this system to a virtual memory and tested the ACTIVE/INACTIVE type matching selection mechanism. As a result of the testing, we found that a new knowledge net was successfully built by ACTIVE/INACTIVE type matching selection. It is expected that the Reticular Activating System and the concept of its ACTIVE/INACTIVE type matching selection can contribute to implementing a flexible associative memory and an efficient retrieval mechanism.
References

1. Shim, J.-Y.: Knowledge Retrieval Using Bayesian Associative Relation in the Three Dimensional Modular System. In: Yu, J.X., Lin, X., Lu, H., Zhang, Y. (eds.) APWeb 2004. LNCS, vol. 3007, pp. 630–635. Springer, Heidelberg (2004)
2. Goldstein, E.B.: Sensation and Perception. Brooks/Cole
3. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Francisco (1988)
4. Fausett, L.: Fundamentals of Neural Networks. Prentice Hall, Englewood Cliffs (1994)
5. Haykin, S.: Neural Networks. Prentice Hall, Englewood Cliffs
6. Shim, J.-Y., Hwang, C.-S.: Data Extraction from Associative Matrix based on Selective Learning System. In: IJCNN'99, Washington D.C., vol. 4, pp. 2550–2553 (1999)
7. Anderson, J.R.: Learning and Memory. Prentice Hall, Englewood Cliffs
8. Shim, J.-Y.: Automatic Knowledge Configuration by Reticular Activating System. LNCS (LNAI). Springer, Heidelberg (2005)
Pattern Recognition for Brain-Computer Interfaces by Combining Support Vector Machine with Adaptive Genetic Algorithm

Banghua Yang, Shiwei Ma, and Zhihua Li

Shanghai Key Laboratory of Power Station Automation Technology, Department of Automation, Shanghai University, Shanghai 200072, China
[email protected], [email protected], [email protected]
Abstract. Aiming at the recognition problem of EEG signals in brain-computer interfaces (BCIs), we present a pattern recognition method that combines an adaptive genetic algorithm (GA) with the support vector machine (SVM). It integrates three key techniques: (1) the feature selection and the model parameters of the SVM are optimized synchronously, constituting a hybrid optimization; (2) the aim of the hybrid optimization is to improve the classification performance of the SVM; and (3) the hybrid optimization is solved using the adaptive GA. The method is used to classify three types of EEG signals produced during motor imagery. It yields 72% classification accuracy, about 8% higher than that obtained with individual optimization of the feature selection and the SVM parameters.
1 Introduction

A brain-computer interface (BCI) is an alternative communication and control channel that does not depend on the brain's normal output pathway of peripheral nerves and muscles [1]. A BCI system can help severely disabled people to communicate with computers or control electronic devices through their thoughts. Most BCIs use EEG signals to detect distinguishable brain states, which are then transformed into external actions through the recognition of the EEG signals. Over the past years, much evidence has demonstrated the possibility of recognizing a few mental tasks from EEG signals [2-4]. However, how to improve the recognition performance of EEG signals remains a key problem [5]. The recognition procedure for EEG signals includes three steps: feature extraction, feature selection, and classification. This paper mainly concerns the feature selection and the classification. Feature selection selects an optimal feature subset from all candidate features, which is an optimization problem; it can improve the generalization performance of the classifier, reduce its complexity, and speed up its training. In classification, the parameters of a classifier affect its classification performance, and selecting them is also an optimization problem. In previous methods used in BCIs, these two optimization problems are either not considered or are performed independently
[6-10]. However, the feature selection and the classification depend on each other: optimizing neither, or only one, of them makes it difficult to ensure that both problems reach optimal solutions simultaneously. We therefore explore a novel method that optimizes the feature selection and the classifier parameters simultaneously. The support vector machine (SVM) is a relatively new classification technique that has been shown to perform strongly in a number of real-world problems, including BCIs [5]; we use the SVM as the classifier. The genetic algorithm (GA) is a global and probabilistic search algorithm based on the mechanics of natural selection and population genetics, and it can maintain a good balance between search width and search depth [11]. We therefore use an adaptive GA to optimize the Feature Selection and the SVM parameters simultaneously; the resulting method is called GA-FS-SVM.
2 Data

Six healthy subjects (sub1–sub6) participated in the experiment. They were seated in a shielded room with dim lighting; ambient sounds were not painstakingly controlled, with a view to practical application. A 32-channel elastic electrode cap was used to record the EEG. The data were recorded at a sampling rate of 100 Hz with an ESI-128 system. Each subject repeated the experiment for two sessions of 150 trials each. The subjects were asked to imagine performing one of three motor imagery tasks (playing basketball with the left hand, playing basketball with the right hand, or braking with the right foot) in a self-paced mode during each trial. Each trial lasted 5.75–6.25 s (mean 6 s) and consisted of three phases: 1) a 0.75–1.25 s (random) resting phase; 2) a 1 s preparation phase; and 3) a 4 s motor imagery phase, during which the subjects performed the corresponding motor imagery task according to the direction of an arrow (a left or right arrow indicates imagining left- or right-hand movement, respectively; a down arrow means the right foot). The data from the last 4 s of each trial were used for analysis. The data acquisition module is shown in Fig. 1.
3 Method

3.1 The Feature Extraction

Fig. 2 depicts the diagram of a simple BCI system. The proposed GA-FS-SVM concerns the feature selection and the classification. For the feature extraction we adopt the spectral power, which is commonly used in BCIs. The most relevant frequency information during motor imagery lies in the Mu (8–12 Hz) and Beta (18–26 Hz) rhythms on the scalp just above the motor cortex [12]. The mean powers within these two bands are calculated as features, so two feature dimensions are obtained per EEG channel. Considering the practicality of BCI systems, we use six electrodes (C3, C4, P3, P4, CZ, and PZ; see Fig. 3), which are considered important EEG channels. We thus obtain a 12-dimensional feature vector F = {f1, f2, ..., f12}, in which f1–f6 are the mean powers within the 8–12 Hz band
Fig. 1. The module of the data acquisition
Fig. 2. The diagram of a simple BCI system
Fig. 3. The six electrodes used (C3, CZ, C4, P3, PZ, P4)
of the six channels, respectively, in the channel order above, and f7–f12 are the mean powers within the 18–26 Hz band.
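A minimal sketch of this band-power feature extraction, using Welch's method from SciPy; the window length and the channel ordering of the input array are assumptions.

```python
# Mean spectral power in the Mu (8-12 Hz) and Beta (18-26 Hz) bands for the
# six channels C3, C4, P3, P4, CZ, PZ, giving the 12-dimensional vector F.
import numpy as np
from scipy.signal import welch

FS = 100  # sampling rate [Hz], as stated in Sect. 2

def band_power_features(trial):
    """trial: array of shape (6, n_samples), channels in the order above."""
    freqs, psd = welch(trial, fs=FS, nperseg=128, axis=-1)
    mu = (freqs >= 8) & (freqs <= 12)
    beta = (freqs >= 18) & (freqs <= 26)
    f1_6 = psd[:, mu].mean(axis=1)     # f1..f6: mean Mu power per channel
    f7_12 = psd[:, beta].mean(axis=1)  # f7..f12: mean Beta power per channel
    return np.concatenate([f1_6, f7_12])
```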
3.2 The Adaptive GA Theory

The GA is a global and probabilistic search algorithm based on the mechanics of natural selection and population genetics, and it can maintain a good balance between search width and search depth. The GA starts its search from a set of initial solutions in a population. An individual (also called a chromosome) represents a possible solution to the problem and consists of many genes, each representing a feature or a parameter. In the feature selection, a binary gene represents a feature: a gene bit "1" denotes that the corresponding feature is selected and a gene bit "0" that it is eliminated. The method of optimizing parameters is similar, the only difference being that a floating-point gene represents a parameter. The GA involves several operators, such as the selection, crossover, and mutation operators; the crossover operator is applied with probability Pc and the mutation operator with probability Pm. A fixed crossover probability Pc and mutation probability Pm may result in premature and local convergence, so we adopt an adaptive GA, defined by the following formulas:
P_c = k_1 (f_max − f′) / (f_max − f̄)    (1)

P_m = k_2 (f_max − f″) / (f_max − f̄)    (2)
where k_1 and k_2 are constants with k_1, k_2 ≤ 1.0 that should be adjusted according to the given problem; f_max and f̄ are the maximum and average fitness of the population, respectively; f′ is the larger of the fitness values of the two individuals selected for crossover; and f″ is the fitness of the individual selected for mutation. A detailed description of the adaptive GA can be found in [13].
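A minimal sketch of Eqs. (1)–(2); the epsilon guard against a converged population (f_max ≈ f̄) and the clamping to 1.0 are implementation assumptions, while k1 = 0.8 and k2 = 0.4 follow Sect. 3.5.

```python
# Adaptive crossover/mutation probabilities of Eqs. (1)-(2).
def adaptive_probs(f_max, f_avg, f_cross, f_mut, k1=0.8, k2=0.4, eps=1e-12):
    spread = max(f_max - f_avg, eps)        # guard: converged population
    p_c = k1 * (f_max - f_cross) / spread   # Eq. (1), f_cross = larger parent fitness
    p_m = k2 * (f_max - f_mut) / spread     # Eq. (2), f_mut = mutated individual's fitness
    return min(p_c, 1.0), min(p_m, 1.0)     # probabilities capped at 1
```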
3.3 The Basic SVM Theory

The SVM is a powerful and relatively new classification method based on statistical learning theory. It has many remarkable characteristics, such as good generalization performance, the absence of local minima, and a sparse representation of the solution. The pattern recognition problem may be stated as follows: given a data set L with input features x_i, classification outputs y_i, and m samples,

L = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}    (3)
The SVM finds an optimal separating hyperplane by maximizing the margin between classes. The algorithm consists of solving the following optimization problem:

min_{w,ξ}  (1/2)(w · w) + C ∑_{i=1}^{l} ξ_i    (4)

subject to ξ_i ≥ 0 and y_i (w φ(x_i) + b) ≥ 1 − ξ_i. The parameter ξ_i is called a slack variable and ensures that the problem has a solution in case the data are not linearly separable. The parameter C is a tradeoff variable, w is an adjustable weight vector, and φ(x) is a nonlinear function for feature mapping. The decision function is
f(x) = ∑_{i=1}^{m} w φ(x_i) + b    (5)

It can be described further by the dot product,

f(x) = ∑_{i=1}^{m} α_i y_i (φ(x_i) · φ(x)) + b    (6)

The dot product can be performed by a kernel function K(x, y), so the decision function can be written as

f(x) = ∑_{i=1}^{l} α_i y_i K(x_i, x) + b    (7)
In solving the SVM, finding good kernel function parameters and the parameter C is an important part of model selection: the classification performance of the SVM depends strongly on the parameter values.

3.4 The GA-FS-SVM Method
The feature vector is F = {f_1, f_2, ..., f_12}. We encode it as a chromosome S = {s_1, s_2, ..., s_12}, with s_i ∈ {0, 1}, i = 1, 2, ..., 12. Before classification with the SVM, some SVM parameters must be given. The most common kernel functions are the polynomial function and the radial basis function; we select the polynomial function (Gamma · u · v + Coeff)^Degree, where u, v are input vectors and Gamma, Coeff, Degree are parameters of the kernel. The training model of the SVM can thus be constructed as M = {Gamma, Coeff, Degree, C}, with Gamma, Coeff, Degree, C ≥ 0, encoded as a chromosome C = {c_1, c_2, c_3, c_4}, where c_i ∈ R, i = 1, 2, 3, 4.
The GA-FS-SVM method optimizes the feature selection and the SVM parameters simultaneously; its structural diagram is shown in Fig. 4. The hybrid optimization can be regarded as optimizing H = {F, M}, with the chromosome H encoded as G = {s_1, s_2, ..., s_12, c_1, c_2, c_3, c_4}. A specified chromosome thus determines a feature subset and an SVM model simultaneously. We evaluate the performance (fitness) of a chromosome by the average classification accuracy of the SVM, calculated as follows: (1) for a specified chromosome, we randomly select half of all trials as training samples; (2) this selection is repeated ten times, yielding ten classification accuracy values; (3) the fitness is the average of these ten values. When the adaptive GA converges, the optimal chromosome, i.e., the optimal feature subset and SVM parameters, is obtained, and the optimized results are used to classify unknown samples.
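As a hedged sketch of this fitness evaluation, the code below decodes a chromosome G = (s1..s12, c1..c4) and scores it by repeated half-split accuracy; scikit-learn's SVC stands in for the SVM implementation, which the paper does not name.

```python
# Decode a GA-FS-SVM chromosome and compute its fitness as the average
# classification accuracy over ten random half-splits of the trials.
import numpy as np
from sklearn.svm import SVC

def fitness(chrom, X, y, rng, n_repeats=10):
    """chrom: 12 binary feature genes followed by (Gamma, Coeff, Degree, C)."""
    mask = np.asarray(chrom[:12], dtype=bool)
    if not mask.any():                 # an empty feature subset is useless
        return 0.0
    gamma, coef0, degree, C = chrom[12:]
    gamma, C = max(gamma, 1e-6), max(C, 1e-6)  # SVC requires gamma, C > 0
    accs = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        half = len(y) // 2
        tr, te = idx[:half], idx[half:]
        clf = SVC(kernel="poly", gamma=gamma, coef0=coef0,
                  degree=int(round(degree)), C=C)
        clf.fit(X[tr][:, mask], y[tr])
        accs.append(clf.score(X[te][:, mask], y[te]))
    return float(np.mean(accs))
```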
Fig. 4. The structural diagram of the GA-FS-SVM: the chromosome G encodes the feature subset F and the SVM parameters M, which together form H; the SVM evaluates each candidate, yielding the optimized feature subset and SVM parameters
3.5 Parameters of the Adaptive GA
(1) Parameter initialization: we select the number of generations t = 100 and the population size p = 100. The ranges of Gamma, Coeff, Degree, and C are [0, 2], [0, 5], [0, 1], and [0, 500], respectively, chosen from our experience. (2) Selection: we adopt fitness-proportional selection with an elitism strategy; the chance of an individual of the parent generation reproducing into the next generation is proportional to its fitness value, and the fittest individual is carried over directly. (3) Crossover: we use a single-point crossover mechanism with a probability of p_c = 0.8 to create new chromosomes in each pair; the crossover probability p_c is obtained according to formula (1). (4) Mutation: we adopt a multi-uniform mutation operator combined with a multi-Gaussian mutation operator, each with a mutation probability of 0.05; the mutation probability p_m of each operator is obtained according to formula (2). In formulas (1) and (2), k_1 = 0.8 and k_2 = 0.4, set by experience and adjustment. The other parameters follow the common suggestions in [11].
4 Results and Analysis

4.1 Results

In order to verify the performance of the GA-FS-SVM, we tested the following three strategies: (1) GA-FS-SVM, as described in Section 3; (2) GA-FS, where the adaptive GA optimizes only the feature selection; for the classification we randomly selected five groups of SVM parameters: C1{0.1, 1, 0.1, 300}, C2{0.2, 0.1, 0.1, 200}, C3{1, 0.1, 0.1, 300}, C4{1.5, 0.1, 0.1, 250}, C5{1.2, 3, 0.1, 400}; and (3) GA-SVM, where the adaptive GA optimizes only the SVM parameters; for the feature subset we randomly selected five groups: F1 {f1, f2, f5}, F2 {f2, f5, f8, f9, f11}, F3 {f3, f5, f7, f9}, F4 {f5, f6, f7, f9, f10}, F5 {f1–f12}.
Pattern Recognition for BCIs by Combining SVM with Adaptive GA.
313
It should be noted that initial features, GA parameters and the calculation of the fitness used in all above methods are the same. As an example, Fig.5 shows the classification accuracy of training samples of subject 1 (sub1) versus generation t with different strategies. Classification results of testing samples of different subjects with different strategies are shown in Tab.1. It also should be noted that results of the GAFS-SVM are obtained by running the program ten times and then averaging the ten values. We calculate the mean value and the standard deviation of classification accuracies. It should be noted that these values are obtained by calculating C1~C5 in the GA-FS, F1~F5 in the GA-SVM. In addition, we perform the one sample t test to the GA-FS-SVM with other strategies. The mean value, the standard deviation, and the t value are shown in Tab.2. The comparison of mean values of the classification accuracy among different strategies is plotted in Fig.6. 4.2 Analysis
Classification accuracy
From Tab.1 we can see that different strategies can result in different results for any one subject. In addition, different parameters in one strategy can also result in different classification accuracies, which show that the feature selection and SVM parameters all affect the classification performance. The Fig.6 shows that the GA-FS-SVM obtains the best result among all strategies because it optimizes the feature selection and the classification synchronously and so it obtains the optimal feature subset and SVM parameters synchronously. Results of t test in Tab.2 show that the classification accuracy obtained by the GA-FS-SVM is significantly higher than other methods. The GA-FS only optimizes the feature selection and the GA-SVM only optimizes SVM parameters, which means that they lack optimal SVM parameters or the optimal feature subset. So obtained results by them are inferior to the one obtained by the GA-FS-SVM. The GA-FS-SVM obtains an average classification accuracy (mean value of six subjects) 72.0%, which is higher about 8% than the one (64.2%, by averaging six subjects and the GA-FS, the GASVM) obtained by the GA-FS and the GA-SVM. 0.75 0.72 0.69 0.66 0.63
GA-FS-SVM GA-FS (C2) GA-SVM (F2)
0.60 0.57 0
20
40
60
80
100
Generation t Fig. 5. The classification accuracy versus generation t
Table 1. Classification results of different subjects with different strategies

Classification accuracy (%)

name    GA-FS-SVM   GA-FS: C1   C2     C3     C4     C5     GA-SVM: F1   F2     F3     F4     F5
sub1    70.6        67.2        65.4   67.5   68.6   66.5   64.7         62.9   63.6   62.8   61.7
sub2    72.4        65.6        63.8   64.2   67.5   64.8   66.1         64.8   65.1   64.1   62.8
sub3    68.5        61.8        59.7   60.1   61.6   59.8   62.9         64.1   63.1   63.7   65.1
sub4    74.1        64.3        62.9   61.7   63.6   60.5   68.5         69.1   66.8   65.7   66.2
sub5    68.9        63.5        60.7   61.5   59.8   60.5   63.7         65.8   61.9   62.7   65.3
sub6    75.8        65.8        64.7   62.9   63.1   64.8   69.1         68.7   66.8   67.4   69.1
Table 2. The mean value, the standard deviation, and the t value

                                   sub1     sub2     sub3     sub4     sub5     sub6
Mean value (%)        GA-FS        67.0     65.2     60.6     62.6     61.2     64.3
                      GA-SVM       63.1     64.6     63.8     67.3     63.9     68.2
Standard deviation    GA-FS        1.19     1.46     1.02     1.52     1.42     1.23
(%)                   GA-SVM       1.11     1.23     0.88     1.47     1.66     1.06
t(9), p < 0.01                     -7.501   -18.12   -10.50   -10.24   -9.91    -12.86
Fig. 6. The comparison of the mean classification accuracy (%) of the GA-FS-SVM, GA-FS, and GA-SVM across subjects 1–6
5 Conclusions

(1) Both the feature selection and the SVM parameters play an important role in the classification: different feature subsets and different SVM parameters lead to different classification results. (2) In BCIs, the GA-FS-SVM optimizes the feature selection and the SVM parameters synchronously, picking the most promising feature subset and an excellent training model for classification, and thus avoids the disadvantage of optimizing only one of them. Due to the limited amount of data and subjects, the classification accuracy needs further investigation. Based on the very promising results obtained here, we are investigating the possibility of developing the GA-FS-SVM further; in this paper, we have aimed to show the potential of the hybrid optimization method.

Acknowledgements. The research is partly supported by the Shanghai Leading Academic Discipline Project (T0103).
References

1. Wolpaw, J.R., Birbaumer, N., Heetderks, W.J.: Brain computer interface technology: a review of the first international meeting. IEEE Trans. Rehab. Eng. 8, 64–73 (2000)
2. Millán, J.d.R., Mouriño, J., Franzé, M., Cincotti, F.: A local neural classifier for the recognition of EEG patterns associated to mental tasks. IEEE Trans. Neural Networks 13, 678–686 (2002)
3. Millán, J.d.R.: Adaptive brain interfaces. Commun. ACM 46, 74–80 (2003)
4. Wolpaw, J.R., McFarland, D.J., Neat, G.W., Forneris, C.A.: An EEG-based brain-computer interface for cursor control. Electroenceph. Clin. Neurophysiol. 78, 252–259 (1991)
5. Schröder, M., Bogdan, M., Rosenstiel, W., Hinterberger, T. (eds.): Automated EEG Feature Selection for Brain Computer Interfaces. In: Proc. 1st Int. IEEE EMBS Conf. Neural Engineering, Capri Island, Italy, pp. 626–629 (2003)
6. Garrett, D., Peterson, D.A., Anderson, C.W., Thaut, M.H.: Comparison of Linear, Nonlinear, and Feature Selection Methods for EEG Signal Classification. IEEE Trans. Neural Syst. Rehab. Eng. 11, 141–144 (2003)
7. Graimann, B., Huggins, J.E., Levine, S.P., Pfurtscheller, G.: Detection of ERP and ERD/ERS patterns in single ECoG channels. In: Proc. 1st Int. IEEE EMBS Conf. Neural Engineering, Capri Island, Italy, pp. 614–617 (2003)
8. Yom-Tov, E., Inbar, G.F.: Feature selection for the classification of movements from single movement-related potentials. IEEE Trans. Neural Syst. Rehab. Eng. 10, 170–177 (2002)
9. Kaper, M., Meinicke, P., Grossekathoefer, U., Lingner, T., Ritter, H.: BCI Competition 2003—Data Set IIb: Support Vector Machines for the P300 Speller Paradigm. IEEE Trans. Biomed. Eng. 51, 1073–1076 (2004)
10. Lal, T.N., Schröder, M., Hinterberger, T., Weston, J., Bogdan, M., Birbaumer, N., Schölkopf, B.: Support vector channel selection in BCI. IEEE Trans. Biomed. Eng. 51, 1003–1010 (2004)
11. Guangnan, X., Ruiwei, C.: Genetic Algorithm and Engineering Optimization. Tsinghua, Beijing, China (2004)
12. Brett, D.M., Justin, W., Sebastian Seung, H.: BCI Competition 2003—Data Set Ia: combining gamma-band power with slow cortical potentials to improve single-trial classification of electroencephalographic signals. IEEE Trans. Biomed. Eng. 51, 1052–1056 (2004)
13. Guozheng, Y., Ting, W., Banghua, Y.: Automated feature selection based on an adaptive genetic algorithm for brain-computer interfaces. LNCS, pp. 575–582 (2006)
Improved Locally Linear Embedding by Cognitive Geometry

Guihua Wen 1, Lijun Jiang 1, and Jun Wen 2

1 South China University of Technology, Guangzhou 510641, China
[email protected]
2 Hubei Institute for Nationalities, Enshi 445000, China
Abstract. Locally linear embedding depends heavily on whether the neighborhood graph represents the underlying geometric structure of the data manifold. Inspired by cognitive relativity, this paper proposes a relative transformation that builds a relative space from the original data space. In the relative space, noise and outliers move further away from the normal points, while nearby points become relatively closer. Accordingly, we determine the neighborhood in the relative space for Hessian locally linear embedding, while the embedding itself is still performed in the original space. Experiments conducted on both synthetic and real data sets validate the approach.
1 Introduction
As classic linear approaches can be reliably applied only when the data manifold is linear, two new families of approaches have recently been developed to deal with nonlinear data manifolds. One is ISOMAP, which preserves the manifold geometry at all scales and is better able to handle nonlinear manifolds [2]; it has many variants, such as the incremental extension of ISOMAP [8], which will not be investigated here. The other is locally linear embedding (LLE), which is based on local approximation of the geometry of the manifold [1]. LLE has many variants, such as Laplacian eigenmaps [4] and Hessian eigenmaps (HLLE) [5], incremental LLE [7], supervised LLE [10,14], robust locally linear embedding [11], LLE integrated with classic PCA or LDA, LLE integrated with SOM, etc. HLLE bears substantial resemblance to LLE and LE but performs considerably better on highly twisted data. In the asymptotic limit of infinite sampling, similar to ISOMAP, HLLE can recover manifolds that are locally isometric to subsets of a lower-dimensional Euclidean space; for ISOMAP these subsets must be open and convex, whereas for HLLE they need only be open and connected [5]. HLLE is also more computationally efficient. However, the performance of HLLE depends heavily on how well the constructed neighborhood graph represents the underlying data manifold. Current approaches to determining the neighborhood for HLLE are easily threatened by noise and outliers, which can critically distort the determined neighborhood and in turn lead to drastically incorrect low-dimensional embeddings. Currently, four kinds of work focus on optimizing the neighborhood. One focuses
on building a connected neighborhood graph to guarantee that information on the relative positions of the connected components is not lost [12,13]. The second focuses on discovering neighborhoods of arbitrary shape using various measures, such as the geodesic distance and path algebra [21], special distances for handwriting recognition [2], and distances for labeled data in supervised manifold learning [10,20]. Third, the optimal neighborhood size can be determined automatically from the residual variance and reconstruction error [15,9]; for unevenly distributed manifolds, an adaptive neighborhood size can also be determined [19]. Finally, the neighborhood can be optimized by deleting noise points or outliers from the input data set [17]; however, outliers are sometimes useful for data analysis, such as outlier mining, so they cannot simply be removed. This paper exploits a new way to optimize the neighborhood by simulating cognitive laws. Inspired by cognitive relativity, it proposes a relative transformation to build the relative space from the original data space, and then determines the neighborhood relationships for HLLE in the relative space, while the embedding is still performed in the original space. The improved HLLE is more stable on sparsely sampled or noise-contaminated data sets. The approach is simple, general, and easy to implement; it also has a clear physical meaning and does not add any parameter.
Fig. 1. Human perception on images is relative
2 Computational Cognitive Geometry
It is a common experience that perception is relative, as illustrated in Fig. 1. When we observe circle x, it looks bigger than its real size because it is compared with the smaller circles around it. In contrast, circle y looks smaller than its real size because of the larger circles around it. Consequently, when we observe x and y simultaneously, x is generally regarded as bigger than y, although in fact they are the same size. This cognitive characteristic is very helpful for distinguishing an object from the objects around it, and it can be modeled geometrically so as to process data more efficiently; we call this approach computational cognitive geometry. For example, we can define a transformation on the original space that builds a new space whose dimensions are composed of all points in the original space.
The newly created space is called the relative space and is generated through the relative transformation

Γ_X : X → Y ⊂ R^{|X|},  y_i = Γ_X(x_i) = (d_{i1}, d_{i2}, ..., d_{i|X|}) ∈ Y,  d_{ij} = ||x_i − x_j||, j = 1, ..., |X|,

where |X| is the number of elements in the data set X, the point x_i in the original space is mapped to the point y_i ∈ R^{|X|} in the relative space, and ||·|| is the distance norm. y_i is also denoted x_i^r.

Theorem 1: ∀ x_i, x_j ∈ X: d(x_i, x_j) ≤ d(x_i^r, x_j^r).

Proof: let x_i^r = (d(x_i, x_1), d(x_i, x_2), ..., d(x_i, x_n)). Then

d(x_i^r, x_j^r)² = ∑_{k=1}^{n} (d(x_i, x_k) − d(x_j, x_k))²
              = ∑_{k=1, k≠j}^{n} (d(x_i, x_k) − d(x_j, x_k))² + (d(x_i, x_j) − d(x_j, x_j))²
              = ∑_{k=1, k≠j}^{n} (d(x_i, x_k) − d(x_j, x_k))² + d(x_i, x_j)²
              ≥ d(x_i, x_j)².

This theorem indicates that the relative transformation is a distance-enlarging transformation, which is beneficial for observing the detailed topological structure of the data.

Theorem 2: ∃ X and x_i, x_j, x_k ∈ X such that d(x_i, x_j) = d(x_i, x_k) while d(x_i^r, x_j^r) ≠ d(x_i^r, x_k^r).

This can be illustrated by an example: it can be observed from Fig. 2 that d(x_3, x_1) = d(x_3, x_4) in the original space, while d(y_3, y_1) < d(y_3, y_4) in the relative space. Accordingly, the relative transformation is nonlinear: data that cannot be distinguished in the original space can be distinguished in the relative space.
Fig. 2. Noise and outliers are restrained in the relative space: (a) the original space (points x1–x4); (b) the relative space (points y1–y4)
The relative transformation is simple but effective in dealing with outliers and noise, as shown in Fig. 2. In the original space, the point x4 can be regarded as noise or an outlier, since it is far from the other three points; yet d(x3, x1) = d(x3, x4) in the original space, meaning that x4 has the same chance as x1 of being taken as a neighbor of x3, which contradicts our intuition. In the relative space, d(y3, y1) < d(y3, y4): the outlier or noise point becomes further away from the normal points. This makes it easy to restrain the influence of noise and data sparsity on dimensionality reduction. Furthermore, in the relative space the distances among points vary
nonlinearly: possibly it brings closer the points belonging to the same surface of the manifold, while pushing further apart points located on different surfaces, which is especially useful for sparse data sets. It also follows that, for a small-scale d-dimensional data set X ⊂ R^d with |X| < d, building the relative space Y ⊂ R^{|X|} implements a natural dimensionality reduction, which is very useful for small-scale but high-dimensional data sets. Finally, the approach has a simple mathematical basis and allows a compact mathematical description of arbitrarily shaped neighborhoods in the original space.
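A minimal sketch of the relative transformation Γ_X: each point is mapped to its vector of distances to all points of the data set, and neighborhoods are then determined in that space; the use of the Euclidean norm is an assumption consistent with the definition above.

```python
# Relative transformation: map each x_i to y_i = (d_i1, ..., d_i|X|),
# then find k-nearest-neighbor indices in the relative space.
import numpy as np

def relative_transform(X):
    """X: (n, d) array; returns the (n, n) matrix whose rows are the y_i."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)   # y_i[j] = ||x_i - x_j||

def neighbors_in_relative_space(X, k):
    Y = relative_transform(X)
    D = relative_transform(Y)              # pairwise distances d(y_i, y_j)
    # For each point, take the k nearest others (skip the point itself).
    return np.argsort(D, axis=1)[:, 1:k + 1]
```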
3 Improved Hessian Locally Linear Embedding
HLLE assumes that the neighborhood of each point is linear, an assumption not easily satisfied in the presence of noise and outliers. Critical outliers on a noisy data manifold produce short-circuit edges that connect two submanifolds directly, often causing topological instability. For example, a single data point that
Fig. 3. Neighborhood graphs constructed in the original space (top line: (a) k=5, (b) k=10, (c) k=15) and in the relative space (bottom line: (d) k=5, (e) k=10, (f) k=15)
lies between two closely spaced surfaces can connect them if the neighborhood size used to construct the neighborhood graph is relatively large. In such a case, the manifold structure is totally collapsed by a single outlier [17]; reducing the number of short-circuit edges is therefore very important for the embedding. This can be achieved in the relative space, where neighborhoods of arbitrary shape in the original space can be discovered. The resulting neighborhood graph is clearly better than the one built in the original space, as shown in Fig. 3, where a manifold of 400 points is sampled from the Swiss-roll surface [2] and contaminated with random Gaussian noise of mean 0 and variance 0.4. In Fig. 3, for the same neighborhood size, the neighborhood graphs on the top line, constructed in the original space, have more short-circuit edges than those on the bottom line, formed in the relative space. It can also be observed that the larger k is, the more short-circuit edges occur. It may seem that a smaller k would reduce the number of short-circuit edges; however, it produces larger reconstruction errors, which in turn lead to drastically incorrect embeddings. Therefore, k should take
Therefore k should take the value as large as possible while short-circuit edges do not occur. Since the neighborhood relationship is determined in the relative space, k can take a larger value; we apply this to determine the neighborhood for HLLE. The improved HLLE, denoted R-HLLE, is given as follows for completeness.

Algorithm: R-HLLE(X, k, d)
/* X = {x_i} ⊂ R^n is the high-dimensional data set, k is the neighborhood size, d is the dimension of the embedding space, min(k, n) > d. The output is Y = {y_i}, a collection of |X| points in R^d. */
1. Transform X into the relative space, denoted X^r, by y_i = Γ_X(x_i ∈ X) ∈ X^r, and determine the neighbors of each point y_i ∈ X^r using the k-NN approach. The neighbor index set is denoted N_i^r(y_i) = {j | y_j is a neighbor of y_i}.
2. For any point x_i ∈ X, its k nearest neighbors are determined as N_i = {x_j | j ∈ N_i^r(y_i)}, which forms a k × n matrix M^i whose rows consist of the re-centered points x_j − x̄_i, where x̄_i = average{x_k : x_k ∈ N_i}.
3. Obtain tangent coordinates. Perform a singular value decomposition of M^i, obtaining matrices U, D, and V; V is k × min(k, n). The first d columns of V give the tangent coordinates of the points in N_i.
4. Develop the Hessian estimator. Build the infrastructure for least-squares estimation of the Hessian: in essence, a matrix H^i with the property that if f is a smooth function f: M → R and the vector v^i collects the entries of f at the points of the neighborhood N_i, then the product H^i v^i gives a d(d+1)/2 vector whose entries approximate the entries of the Hessian matrix ∂²f/∂v_i ∂v_j.
5. Develop the quadratic form. Build a symmetric matrix H having, in coordinate pair (i, j), the entry H_{i,j} = Σ_l Σ_r ((H^l)_{r,i} (H^l)_{r,j}). Here H^l is the d(d+1)/2 × k matrix associated with estimating the Hessian over neighborhood N_l, where rows r correspond to specific entries of the Hessian matrix and columns correspond to specific points of the neighborhood.
6. Find the approximate null space. Perform an eigenanalysis of H and identify the (d+1)-dimensional subspace corresponding to the (d+1) smallest eigenvalues. There will be an eigenvalue 0 associated with the subspace of constant functions; the next d eigenvalues correspond to eigenvectors spanning a d-dimensional space in which the embedding coordinates are to be found.
7. Find a basis for the null space. Select a basis for this space whose restriction to a specific fixed neighborhood N_0 (chosen arbitrarily from those used in the algorithm) is orthonormal; its basis vectors {y_i} are the embedding coordinates. Let V be the N × d matrix of the nonconstant eigenvectors associated with the (d+1) smallest eigenvalues, and let V_{l,r} denote the l-th entry of the r-th eigenvector of H. Define the matrix (R)_{r,s} = Σ_{j∈N_0} V_{j,r} V_{j,s}. The desired N × d matrix of embedding coordinates is obtained from Y = V · R^{−1/2}.
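The following Python sketch illustrates steps 1 and 2 of R-HLLE as described above; it is a minimal reading of the algorithm, not the authors' implementation, and the function names are ours:

```python
import numpy as np

def rhlle_neighborhoods(X, k):
    """Steps 1-2 of R-HLLE (a sketch): pick the k nearest neighbors of each
    point in the relative space, then form the re-centered neighborhood
    matrices M_i in the original space."""
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]
    Y = np.sqrt((diffs ** 2).sum(axis=2))          # step 1: relative space
    neighbor_idx, M = [], []
    for i in range(n):
        dy = np.linalg.norm(Y - Y[i], axis=1)      # distances among relative vectors
        idx = np.argsort(dy)[1:k + 1]              # k nearest, excluding i itself
        neighbor_idx.append(idx)
        Ni = X[idx]                                # step 2: neighbors in original space
        M.append(Ni - Ni.mean(axis=0))             # re-centered rows x_j - x_bar_i
    return neighbor_idx, M
```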
R-HLLE determines the neighborhood of each point in the relative space in the first step and modifies the second step accordingly, which increases the time complexity by O(|X|²). The remaining steps are the same as in HLLE and are still performed in the original space. The method only requires the solution of N separate k × k eigenproblems, so it can handle much larger-scale data analysis problems.
4 Experimental Results
Several experiments were conducted to compare the performance of R-HLLE directly with that of HLLE, LLE and ISOMAP. In the experiments, LLE (k=12), HLLE (k=12) and ISOMAP (k=7) take the same parameter values that HLLE used in its original experiments. To avoid choosing parameters that favor only our approach, R-HLLE also takes k=12, the same as HLLE and LLE.

4.1 Swiss Roll Surface
The Swiss roll surface is widely used to compare nonlinear dimensionality reduction approaches [2,1,3,5,8,12]. It is like a rolled-up sheet of paper and is thus exactly isometric to Euclidean space. Here we adopt the sampling procedure that HLLE used [5]: instead of sampling parameters in a full rectangle, it samples from a rectangle with a rectangular strip punched out of the center. The resulting Swiss roll is then missing the corresponding strip and thus is not convex. Using this model, we take many random samples from the Swiss roll surface and conduct experiments on (1) well-sampled data without noise, (2) well-sampled data with noise, and (3) sparsely sampled data.

Experiment 1. Well-sampled data sets without noise. We take many random samples of 800 points from the Swiss roll surface [5] and then apply LLE, HLLE, ISOMAP and R-HLLE to embed each sample. It can be observed from these experiments that nonconvexity can have a dramatic effect on the resulting embeddings. Although the data manifold is still locally isometric to Euclidean space, in most cases the effect of the missing sampling region is, for LLE, to make the resulting embedding functions asymmetric and nonlinear with respect to the original parametrization. For ISOMAP, the nonconvexity causes a strong dilation of the missing region, warping the rest of the embedding. On most samples HLLE, like R-HLLE, performs the embedding correctly. However, as shown in Fig. 4, HLLE fails completely on some other samples where R-HLLE still embeds the data correctly into two-dimensional space.

Experiment 2. To compare the topological stability of the four approaches on noisy data sets, we take random samples of 800 points from the Swiss roll surface and add random Gaussian noise with mean 0 and variance 0.4. From these experiments it can be observed that nonconvexity and noise can have a dramatic effect on the resulting embeddings.
Fig. 4. Embedding results of the four approaches (Original Data, ISOMAP, Regular LLE, HLLE, R-HLLE) on data sets without noise
In most cases, the results of LLE and HLLE are completely confused, whereas for ISOMAP the nonconvexity causes a strong dilation of the missing region, warping the rest of the embedding. R-HLLE, however, often achieves good embedding results; one example is illustrated in Fig. 5, where R-HLLE performs best.
Fig. 5. Embedding results of the four approaches (Original Data, ISOMAP, Regular LLE, HLLE, R-HLLE) on noisy data sets
Experiment 3. Generally, in sparsely sampled data sets the Euclidean distance between points within a neighborhood becomes larger compared to the distance between different folds of the manifold, which easily exposes all four approaches to the short-circuit problem. To compare the robustness of the four approaches on samples with sparse sampling density, we take random samples of 400 points from the Swiss roll surface. It can be observed that nonconvexity and data sparsity can have a dramatic effect on the resulting embeddings; in some cases none of the four approaches obtains ideal results, but by comparison R-HLLE behaves best. One example is shown in Fig. 6, where LLE and HLLE produce completely confused embeddings while R-HLLE performs better.
Fig. 6. Embedding results of the four approaches (Original Data, ISOMAP, Regular LLE, HLLE, R-HLLE) on sparse data sets
4.2 Iris Data Set
We take the Iris data as a real-world example; it consists of 150 records [13,12]. This data set has two main clusters: the class Iris Setosa is separate from the other two classes, which are not well separated from each other. As ISOMAP fails to build a connected neighborhood graph in the original space on this data set when k < 25, k is set to 45 in this experiment. It can be observed from Fig. 7 that R-HLLE outperforms HLLE and LLE, as it has less inter-class and intra-class overlap; it performs almost the same as ISOMAP.
Fig. 7. Embedding results of the four approaches (HLLE, R-HLLE, Regular LLE, ISOMAP) on the Iris data (classes: Setosa, Versicolour, Virginica)
5 Conclusion and Future Work
We have presented an approach that builds the relative space from the original space by simulating cognitive laws, and then determines the neighborhood in the relative space for Hessian locally linear embedding. The improved HLLE deals more effectively with sparsely sampled or noisy data sets than HLLE, LLE and ISOMAP. The approach is simple, general and easy to implement; it has a clear physical meaning and introduces no additional parameters. The proposed idea and approach are likely to be even more useful in combination with other methods in data analysis and statistical learning. In the future we will investigate it further and explore its use in these areas.
References
1. Roweis, S.T., Saul, L.K.: Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science 290, 2323–2326 (2000)
2. Tenenbaum, J.B., de Silva, V., Langford, J.C.: A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 290, 2319–2323 (2000)
3. Balasubramanian, M., Schwartz, E.L.: The ISOMAP Algorithm and Topological Stability. Science 295, 7 (2002)
4. Belkin, M., Niyogi, P.: Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computing 15, 1373–1396 (2003)
5. Donoho, D.L., Grimes, C.: Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proc. Natl. Acad. Sci. 100, 5591–5596 (2003)
6. Silva, V.D., Tenenbaum, J.B.: Global versus local methods in nonlinear dimensionality reduction. Neural Information Processing Systems 15, 705–712 (2003)
7. Kouropteva, O., Okun, O., Pietikainen, M.: Incremental locally linear embedding. Pattern Recognition 38, 1764–1767 (2005)
8. Law, M.H.C., Jain, A.K.: Incremental nonlinear dimensionality reduction by manifold learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 377–391 (2006)
9. Samko, O., Marshall, A.D., Rosin, P.L.: Selection of the optimal parameter value for the Isomap algorithm. Pattern Recognition Letters 27(9), 968–979 (2006)
10. de Ridder, D., Kouropteva, O., Okun, O., et al.: Supervised locally linear embedding. In: Kaynak, O., Alpaydın, E., Oja, E., Xu, L. (eds.) ICANN 2003 and ICONIP 2003. LNCS, vol. 2714, pp. 333–341. Springer, Heidelberg (2003)
11. Chang, H., Yeung, D.-Y.: Robust locally linear embedding. Pattern Recognition 39, 1053–1065 (2006)
12. Yang, L.: Building k-Connected Neighborhood Graphs for Isometric Data Embedding. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(5), 827–831 (2006)
13. Yang, L.: Building Connected Neighborhood Graphs for Locally Linear Embedding. In: The 18th International Conference on Pattern Recognition, pp. 1680–1683 (2006)
14. de Ridder, D., Loog, M., Reinders, M.J.T.: Local Fisher embedding. In: Proceedings of the 17th International Conference on Pattern Recognition, pp. 295–298 (2004)
15. Saxena, A., Gupta, A., Mukerjee, A.: Non-linear dimensionality reduction by locally linear ISOMAPs. In: Pal, N.R., Kasabov, N., Mudi, R.K., Pal, S., Parui, S.K. (eds.) ICONIP 2004. LNCS, vol. 3316, pp. 1038–1043. Springer, Heidelberg (2004)
16. Seung, H.S., Lee, D.D.: The Manifold Ways of Perception. Science 290, 2268–2269 (2000)
17. Choi, H., Choi, S.: Robust kernel ISOMAP. Pattern Recognition 40, 853–862 (2007)
18. Li, D., Liu, C., Du, Y., Han, X.: Artificial Intelligence with Uncertainty. Journal of Software 15, 1583–1594 (2004)
19. Wen, G., Jiang, L., Wen, J., Shadbolt, N.R.: Performing Locally Linear Embedding with Adaptive Neighborhood Size on Manifold. In: Yang, Q., Webb, G. (eds.) PRICAI 2006. LNCS (LNAI), vol. 4099, pp. 985–989. Springer, Heidelberg (2006)
20. Wen, G., Jiang, L., Wen, J., Shadbolt, N.R.: Clustering-based Nonlinear Dimensionality Reduction on Manifold. In: Yang, Q., Webb, G. (eds.) PRICAI 2006. LNCS (LNAI), vol. 4099, pp. 444–453. Springer, Heidelberg (2006)
21. Wen, G., Jiang, L., Shadbolt, N.R.: Using Graph Algebra to Optimize Neighborhood for Isometric Mapping. In: 20th International Joint Conference on Artificial Intelligence (IJCAI-07), India, pp. 2398–2403 (2007)
Predicting the Free Calcium Oxide Content on the Basis of Rough Sets, Neural Networks and Data Fusion

Yunxing Shu1,2, Shiwei Yun2, and Bo Ge1,2

1 School of Mechatronic Engineering, Wuhan University of Technology, Wuhan 430070, Hubei, China
2 Luoyang Institute of Science and Technology, Luoyang 471003, Henan, China
[email protected]
Abstract. This study first created a model to predict the content of free calcium oxide (fCaO) in the calcined clinker of a rotary kiln by adopting the technologies of rough sets, neural networks and data fusion. The model was then used to predict the quality of the calcined clinker in the rotary kiln, and satisfactory simulation results were obtained, indicating that the model is valid and attains the goal of increasing training speed and precision. In addition, it addresses many problems in cement production, such as large inertia, time delays, time variation, strong nonlinearity, numerous parameters, strong coupling, and the difficulty of creating systematic models.
1 Introduction
At present, the new dry-process manufacturing technique featuring off-kiln decomposition (the predecomposition kiln) represents the state of the art in cement manufacturing techniques and technologies, and it will lead the development of cement techniques for a considerable time to come. Cement quality is mainly determined by clinker quality; to control the quality of cement is to control the quality of clinker. The parameters describing clinker quality mainly include the content of free calcium oxide, the litre weight of the clinker, the setting time, and the density of the cement at its various phases, among which the content of free calcium oxide (fCaO) is one of the most important parameters influencing clinker quality. Cement production is a complex thermal reaction process characterized by large inertia, time delays, time variation and strong nonlinearity; moreover, it involves many complex parameters with strong couplings between them, and it is hard to build systematic models for it. Research on artificial neural networks (ANN) [1] has been under way for over 40 years, and applying neural networks to process monitoring systems has become a very active research field. However, neural networks have the following limitations [2]: (1) they place high requirements on the quantity and quality of the learning samples; (2) the structures and parameters of the models generalize poorly; and (3) the models are hard to optimize.
In order to overcome these shortcomings of neural networks, we can partition the data so that different data are processed by their own neural networks; in this way high-dimensional data spaces are decomposed into lower-dimensional ones and complex networks are changed into simple ones. The structure of the neural network is thus simplified and the speed of network training is increased. The output results of the various subnetworks then undergo further information fusion by resorting to evidence theory [3-5], and finally the processing results are obtained. As a brand-new mathematical tool for processing imprecise and incomplete data, rough set theory [6] has been widely applied in fields such as machine learning and data mining. Rough sets can effectively simplify samples, eliminate redundant information, and reduce the knowledge-base space. Used as the preprocessing link of the neural networks, rough sets perform reduction preprocessing of the data to eliminate noise and redundancy in the samples, thus not only enhancing the prediction precision of the neural networks but also reducing the learning load; the combination of the two therefore substantially reduces the learning time of the neural networks. On the basis of rough set theory integrated with neural networks, reference [1] proposed a method to intelligently diagnose engine failures; used in combination, the two obtain better diagnosis results than either one alone. Reference [7] used rough sets to preprocess classified bank data, compressing the data in terms of attribute quantities and values, eliminating noise and redundancy, and increasing the prediction precision of the model from 84% to 96%. References [8-9] combined rough sets with neural networks and eliminated redundant data while retaining important information so as to increase the training speed. In this study we built a rough neural network. First, we used rough sets to perform attribute reduction and extract relevant rules; then, according to the content of free calcium oxide (fCaO), we identified possible output subspaces and conducted quantitative approximation in these subspaces through the neural network; finally, we fused the data from the various subspaces, thus attaining our goal of increasing the training speed and the prediction precision.
2 Rough Set Data Analysis
The main thrust of rough set theory is to derive problem resolutions or classification rules through attribute reduction while keeping the classification ability of the information system invariant. However, rough set modeling is a structured, non-numerical information processing method, which is suited to processing discrete and qualitative data and is limited in its ability to process continuous data.

2.1 Basic Concepts of Rough Sets
Suppose we have an information system K = (U, A, V, f), in which U = {x_1, x_2, …, x_n} is the universe (a finite set of objects), A is the set of attributes
(features, variables), V is the domain of all attributes, and f: U × A → V is an information function. If the attribute set A = C ∪ D with C ∩ D = ∅, then C is called the conditional attribute set and D the decision attribute set. A knowledge representation system that has conditional and decision attributes is called a decision table.

Definition 1. Suppose K = (U, A, V, f) is an information system and ∀P ⊆ A. Define the equivalence relation on P:

IND(P) = {(x, y) ∈ U × U | ∀p ∈ P, f(x, p) = f(y, p)}    (1)

When no confusion is possible, the equivalence relation IND(P) can also be designated simply as P [6].

Definition 2. Suppose P, Q ⊆ A. If IND(Q) ⊆ IND(P), then we say P depends on Q, designated Q ⇒ P [6]. The dependence relation expresses the following rule: suppose Q = {q_1, q_2, …, q_n} and P = {p_1, p_2, …, p_k}; then every t = {t_1, t_2, …, t_n} with t_i ∈ V_{q_i} uniquely determines the attribute value set s = {s_1, s_2, …, s_k} with s_i ∈ V_{p_i}, namely:

(∀x ∈ U)[(f(x, q_1) = t_1, …, f(x, q_n) = t_n) ⇒ (f(x, p_1) = s_1, …, f(x, p_k) = s_k)]    (2)
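As a small illustration of Definition 1, the sketch below partitions a toy universe into equivalence classes of IND(P); the table contents and attribute names are hypothetical:

```python
from collections import defaultdict

def indiscernibility(table, P):
    """Partition the universe into equivalence classes of IND(P):
    objects are equivalent when they agree on every attribute in P.
    `table` maps object -> {attribute: value}."""
    classes = defaultdict(set)
    for x, row in table.items():
        key = tuple(row[p] for p in P)
        classes[key].add(x)
    return list(classes.values())

U = {
    "x1": {"A1": "low", "A2": "high", "d": 0},
    "x2": {"A1": "low", "A2": "high", "d": 0},
    "x3": {"A1": "mid", "A2": "high", "d": 1},
}
print(indiscernibility(U, ["A1", "A2"]))  # [{'x1', 'x2'}, {'x3'}]
```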
2.2 The Rule’s Degree of Matching and Relevance Grade
Suppose we extract m rules Q ⇒ P from the original data, in which the i-th rule R^i can be represented as:

R^i: if f(x, q_1) = t_1^i, …, f(x, q_n) = t_n^i then f(x, p_1) = s_1^i, …, f(x, p_k) = s_k^i    (3)

in which t_j^i ∈ V_{q_j} and s_j^i ∈ V_{p_j}. For a group of inputs input = {I_1, I_2, …, I_n}, define the function

g_i(k) = 1 if I_k = t_k^i, and g_i(k) = 0 otherwise, for k = 1, 2, …, n and i = 1, 2, …, m    (4)

Define the degree of matching between the input and the i-th rule as

C_i = (1/n) Σ_{k=1}^{n} g_i(k),  i = 1, 2, …, m    (5)
This quantity demonstrates the degree of matching between the input pattern and the i-th rule.
Because the rules are extracted from training samples, which may contain noise, a large quantity of rules will usually be generated. In order to eliminate this influence, we define the support of a rule. The support of the i-th rule is defined as

s_i = |S_i| / |U|    (6)
in which S_i = {x | f(x, q_1) = t_1^i, …, f(x, q_n) = t_n^i} denotes the set of samples that satisfy the conditions of the i-th rule. The support of a rule indicates the proportion of the samples satisfying the rule's conditions to the total number of samples. Consequently, a threshold value for the support can be set so as to delete rules generated from individual noisy sample instances or rules whose supports are very low. We use the arithmetic product of the support of a rule and its degree of matching to define the relevance grade μ_i of the input with respect to the i-th rule, namely
μ_i = C_i · s_i    (7)
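The following sketch (ours, not the authors' code) computes the degree of matching, the support, and the relevance grade of equations (5)-(7) for symbolic rule conditions:

```python
def matching_degree(inputs, rule_conditions):
    """C_i of Eq. (5): the fraction of input components matching the rule."""
    hits = sum(1 for I, t in zip(inputs, rule_conditions) if I == t)
    return hits / len(inputs)

def support(rule_conditions, samples):
    """s_i of Eq. (6): proportion of samples satisfying the rule conditions."""
    matched = sum(1 for s in samples
                  if all(sv == t for sv, t in zip(s, rule_conditions)))
    return matched / len(samples)

def relevance(inputs, rule_conditions, samples):
    """mu_i = C_i * s_i of Eq. (7)."""
    return matching_degree(inputs, rule_conditions) * support(rule_conditions, samples)
```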
3 Modeling Method

3.1 Modeling Process of the Rough Neural Network
The modeling process is divided into three steps.
(1) Discretization of the data. Discretization refers to the process of converting the values of continuous attributes into a finite set of discrete values. The input and output data of any actual system are mostly continuous. Continuous sampling of data attributes is generally limited to a certain range, but theoretically the possible attribute values within this range are limitless, rendering support statistics meaningless. Therefore, before conducting rough set data analysis, we must discretize the data.
(2) Rough set data analysis. This step includes:
(A) Attribute reduction, i.e., eliminating the redundant attributes. After attribute reduction, some unnecessary attributes of the decision table can be eliminated while the table still meets the requirements of congruity and rule compatibility. Compared with the initial decision table, the adaptability of the rules at this point has increased substantially.
(B) Value reduction and rule extraction, i.e., eliminating the unnecessary attribute values from the rules to obtain the simplest rules. Although the adaptability of the rules in the decision table increases substantially after attribute reduction, a lot of redundant information still remains in each rule.
Value reduction further eliminates the redundant conditional attributes of each sample from the decision table on the basis of attribute reduction. After this treatment, the rules remaining in the decision table no longer contain redundant conditional attributes; that is, the number of conditional attributes of each rule has been reduced as much as possible.
(C) Calculation of the support of each rule. Set threshold values, delete rules whose supports are low, and eliminate the influence of noise.
(3) Training of the artificial neural subnetworks. In line with the discretization results obtained in step (1), classify the training samples into k categories according to the decision attributes, namely {D_1, D_2, …, D_k}. Suppose X_i ⊆ U are the objects contained in the i-th category; these objects constitute the inputs and outputs of the k subnetworks. We use these data to train the k subnetworks, in which the input-output relation of the i-th subnetwork is as follows:
out_i = f_i(input),  i = 1, 2, …, k    (8)

in which input represents the input data and out_i represents the output of the i-th subnetwork.
3.2 Use Rough Neural Networks and Data Fusion to Make Predictions and Decisions
We have now created a neural network model based on rough sets. The process of using this model to make predictions and decisions can be divided into three steps: (1) discretize the prediction samples according to the discretized intervals obtained in the modeling process; (2) judge the input using the rules extracted from the rough data analysis, and calculate the corresponding relevance grades {μ_1, μ_2, …, μ_m}; (3) fuse the data and compute the output value:

out = (Σ_{i=1}^{m} μ_i · f_i(input)) / (Σ_{i=1}^{m} μ_i)    (9)
4 Simulation Study
In order to verify the validity of the model proposed in this study, we selected 7 predictive parameters from the parameters of the new-type dry-process cement production technique [10] as the inputs, together with one output: the system's raw feed amount A1, the coal feed amount of the rotary kiln A2, the coal feed amount of the decomposing furnace A3, the kiln's rotational speed A4, the lime saturation coefficient A5 of the raw feed, the silicon percentage A6 of the raw feed, the aluminum percentage A7 of the raw feed, and the actually measured content of fCaO in the clinker.
All these input parameters are main factors that may cause fluctuation of the calcination quality of the clinker in the rotary kiln. It should be pointed out that other parameters, such as the combustion heat value of the coal burned in the rotary kiln, the excess air coefficient in the rotary kiln, the temperature of the gases and materials at the rear of the kiln, the temperature of the clinker coming out of the kiln, and the temperature of the secondary air, also play important roles in the process of cement clinker calcination. Considering the conditions of data collection, the total quantity of data and other influencing factors, we selected only the above-mentioned 7 technological parameters.
(1) Discretization of the data. In actual production, if the content of fCaO is lower than 1.5% the cement is qualified, and if it is higher than 1.5% it is not. From our datasheet we can see that the decision table has 7 conditional attributes and 1 decision attribute. First we discretized the decision attribute, and then we used the software Rosetta, provided jointly by the Norwegian University of Science and Technology and the Institute of Mathematics, Warsaw University, Poland, to discretize the conditional attributes; the results are shown in Table 1. From Table 1 we can see that the two attributes A5 (the lime saturation coefficient of the raw feed) and A6 (the silicon percentage of the raw feed) can be reduced, so the number of attributes is reduced from 7 to 5.

Table 1. Discretized intervals of the various continuous attributes

Attribute   Discretized intervals
A1          [∗, 179.2), [179.2, 185.8), [185.8, ∗)
A2          [∗, 5.5), [5.5, ∗)
A3          [∗, 9.1), [9.1, ∗)
A4          [∗, 3.0), [3.0, 3.1), [3.1, ∗)
A7          [∗, 1.44), [1.44, ∗)
d           [∗, 1.5), [1.5, ∗)
(2) Reduction of attributes and extraction of rules. We used Rosetta to analyze the data and obtained the minimal relative reduct after attribute reduction was completed; 28 rules were then extracted, and finally, according to equation (6), we computed the supports of these rules.
(3) Training of the subnetworks. In step (1), according to the content of fCaO, we classified the decisions into two categories so that the system could carry out training in two separate subnetworks, each subnetwork being one BP network with 5 input nodes, 5 hidden-layer nodes and 1 output node. For the output results of each subnetwork, we used the relevance grades of the input samples with respect to the various rules as weight values to fuse the subnetwork outputs.
In addition, we used the 50 testing data presented in reference [10] for testing and comparison; the simulation results are compared in Table 2, and Table 3 compares the main performance indexes.

Table 2. Comparison of the simulation results

Sample No.                    1        2        3        4        5
Neural network error          0.0071   0.5172   0.1935   0.6207  -0.2145
Rough neural network error   -0.0306  -0.1201  -0.5228  -0.7962   0.5688

Sample No.                    6        7        8        9       10
Neural network error          0.1434  -0.0723  -0.7871   0.1143   0.8595
Rough neural network error    0.5261   0.0763  -0.2704   0.5290   0.3636

Sample No.                   11       12       13       14       15
Neural network error          0.9041   0.6184  -0.7420  -0.0363  -0.6670
Rough neural network error   -0.2920  -0.6615  -0.0233   0.6100   0.2215

Sample No.                   16       17       18       19       20
Neural network error          0.5115   0.4760  -0.4471   0.8740  -0.8895
Rough neural network error   -0.2163  -0.4727   0.4373  -1.0229   0.5981

Sample No.                   21       22       23       24       25
Neural network error          0.2037   0.6665  -0.1368   0.1118  -1.2990
Rough neural network error    0.7198  -0.7882  -0.3934   0.4936   0.5765

Sample No.                   26       27       28       29       30
Neural network error         -0.3504   0.0953  -0.2979  -0.0339  -0.2050
Rough neural network error    0.3744  -0.1108   0.2375  -0.4499  -0.3893

Sample No.                   31       32       33       34       35
Neural network error         -0.2535   0.7998  -0.4425   0.4838   0.4538
Rough neural network error   -0.1081  -0.4366   0.4778   0.2013   1.1072

Sample No.                   36       37       38       39       40
Neural network error          0.2362   0.0725   1.0172   0.1359   0.1706
Rough neural network error    0.0155  -0.0036  -0.4792   0.2489   0.4994

Sample No.                   41       42       43       44       45
Neural network error         -0.1006   0.0213  -0.0387  -0.7384  -0.5625
Rough neural network error    0.0281  -0.2102   0.1944   0.3091  -0.6132

Sample No.                   46       47       48       49       50
Neural network error          0.6564  -0.0899   0.5898   0.4435   0.4613
Rough neural network error    0.2083  -0.0481  -0.5547  -0.0475  -0.2388
Table 3. Comparison of the main performance indexes

                       Max. absolute error   Mean absolute error   Mean square error   Frequency of training
Neural network         1.2990                0.4172                0.5213              198 times in total
Rough neural network   1.1072                0.3785                0.4574              fewer than 20 times in total for the two subnetworks
From the data in Tables 2 and 3 we can see that, as far as training time and errors are concerned, the results obtained from our proposed model based on rough neural networks and data fusion are superior to those obtained from the model using only neural networks, as reported in reference [10].
5 Conclusions
In this study we first used the proposed model, based on rough sets, neural networks and data fusion, to predict the fCaO content of the calcined clinker in the predecomposition kiln of a new-type dry-process cement production line with a daily output of 2,500 tons of clinker; we then compared the prediction with the actual data and carried out a simulated prediction of the clinker fCaO, thus attaining the goal of guiding cement production and predicting clinker quality. The rough neural network model built in this study integrates the advantages of rough sets in knowledge acquisition with those of neural networks in value approximation. It can be claimed that we have solved a series of problems caused by vast neural network structures and slow training speeds due to the increase in technological parameters. We take the recognition results of the neural networks as mutually independent pieces of evidence, thereby increasing the training speed and prediction precision.

Acknowledgments. The authors acknowledge the support from the National Science Foundation of Henan Province of China (No. 0511053100).
References
1. Chen, T., Sun, J.: Aeroengine Gas Path Fault Diagnosis Using Rough Sets and Neural Networks. Journal of Aerospace Power 1, 207–211 (2006)
2. Wang, H., Zhang, X., Yu, J.: Fault Diagnosis Based on Support Vector Machine. Journal of East China University of Science and Technology 2, 179–182 (2004)
3. Liao, R., Liao, Y., Yang, L., et al.: Study on Synthetic Diagnosis Method of Transformer Fault Using Multi-neural Network and Evidence Theory. Proceedings of the CSEE 3, 119–124 (2006)
4. Li, Y., Jiang, J., Yang, F.: NN-Based D-S Evidence Theory Applied to Multisensor Target Identification. Chinese Journal of Scientific Instrument 6, 652–655 (2001)
5. Chen, T., Sun, J., Hao, Y.: Neural Network and Dempster-Shafer Theory Based Fault Diagnosis for Aeroengine Gas Path. Acta Aeronautica et Astronautica Sinica 6, 1014–1017 (2006)
6. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data, pp. 9–30. Kluwer Academic Publishers, Dordrecht (1991)
7. Hashemi, R.R., Le Blanc, L.A., Rucks, C.T., et al.: A hybrid intelligent system for predicting bank holding structures. European Journal of Operational Research, 390–402 (1998)
8. Li, M., Zhang, H.: Research on the method of neural network modeling based on rough sets theory. Acta Automatica Sinica 1, 27–33 (2002)
9. He, M., Feng, B., Ma, Z., et al.: Approach to Construct a Rough Neural Network Based on Rough Set. Journal of Xi'an Jiaotong University 12, 1240–1242 (2004)
10. Yang, L.: Computer Simulation for the Calcination Process of the Precalciner. Wuhan University of Technology (2004)
Classification of Single Trial EEG Based on Cloud Model for Brain-Computer Interfaces

Shaobin Li and Chenxi Shao

Department of Computer Science, USTC, Hefei, 230027, China
[email protected],
[email protected]
Abstract. A novel feature extraction method based on the cloud model for EEG classification is proposed in this paper. The cloud model can transform numerical data into a qualitative concept described by a group of characteristics, providing a new way for concept induction in machine learning. Classification of single-trial EEG recorded in a 'self-paced key typing' experiment is performed through cloud-model-based feature extraction together with a linear regression method for classifying the feature vectors. Classification accuracy of up to 90% on the test data set was obtained. The results show that, compared with other methods, the feature extraction and translation method for EEG classification in this paper is simple and effective.
1 Introduction
The creation of an interface between brain and machine is not a new concept, but it has received a resurgence of interest in recent years. Terms such as 'brain-machine interface', 'brain-computer interface', 'brain control' and 'neural prosthetics' appear widely in the literature, with 'brain-computer interface' being the generally accepted term. A brain-computer interface (BCI) is a communication system that does not depend on the brain's normal output pathways of peripheral nerves and muscles [1]. It enables human subjects to control a computer or other devices by means of their brain wave signals. Although still at an exploratory stage, the implications of recent research results are phenomenally exciting, and various experiments have demonstrated the feasibility of controlling computers or external devices using brain wave signals. Electroencephalogram (EEG) signals can be obtained easily with comparatively inexpensive equipment. Furthermore, using EEG signals recorded from the scalp for BCIs is non-invasive: there is no need for surgery and almost no invasion of the subject. Due to these facts, EEG-based BCI is the best choice for the majority of the BCI research community. Like any communication and control system, an EEG-based BCI has an input, an output, and a translation algorithm that converts the former to the latter. The central element of each BCI is the translation algorithm, which converts electrophysiological input from the user into output that controls external devices [1].
A translation algorithm is a series of computational steps that extracts features from EEG signals, assigns different classes to EEG patterns and generates the actual device control commands. However, the classification of EEG signals (especially single-trial EEG) is a challenge for signal processing and machine learning. A large number of cognitive tasks in BCI systems have features that can be observed by EEG, but only after averaging over multiple events. Single-trial classification (i.e., discrimination of a single such event, as opposed to the activity obtained after averaging over multiple repetitions of the same task) is desired in many BCIs; one must therefore be able to detect cognitive activity from a single event of a task, without the benefit of averaging. Here we present a feature extraction method based on the cloud model for EEG classification. The cloud model is a model of the transition between a qualitative concept and its quantitative description; it can transform numerical data into a qualitative concept described by a group of characteristics and provides a new way for concept induction in machine learning. We applied the cloud model to an EEG data set recorded in the 'self-paced key typing' experiment for the extraction of task-related features, achieving high classification accuracy with a simple linear classification method. The rest of this paper is organized as follows. First, the present state of the art of feature extraction and translation methods in BCI technology is summarized, and the cloud model theory used in the paper is explained. Then the experimental procedure and results are given. At the end of the paper, conclusions are drawn and further work is discussed.
2 State of the Art
Common features extracted in EEG-based BCIs are time-domain features (such as slow cortical potentials and P300 potentials) and frequency-domain features (such as the μ, θ and β rhythms occurring in specific areas of the cortex). The extraction methods include spatial and temporal filters, the scalp electrode types and locations, geometric subspaces, and other signal processing methods used to detect and measure the features. Translation methods involve k-nearest neighbors; linear, quadratic and nonlinear discriminant functions; and various models such as logistic regression, Kalman filters and hidden Markov models [2]. The EEG data set acquired by Blankertz et al. [3] of the Berlin BCI group, which is used in this study, consists of a self-paced key typing session with no feedback, producing the ERP of a motor stimulus. They processed this data set using Fast Fourier Transform (FFT) techniques for preprocessing and feature selection and used a variety of techniques for classification, obtaining very high accuracies of up to 96.9%. Other groups have also used these data for their classification work: Kelly et al. [4] applied an autoregressive approach to the classification problem, achieving accuracies of up to 70.7%; Garrett et al. [5] used genetic algorithms on these data to determine the subset of features useful for classification, obtaining 76% accuracy; and Yong et al. [6] used wavelet decomposition for feature extraction and support vector machines for classification, achieving 91% accuracy.
At present, most feature extraction and translation methods in BCI technology rely on typical signal processing and machine learning techniques. Improvements in BCI technology require the exploration of unconventional methods, such as nonlinear systems analysis and uncertain reasoning.
3 Cloud Model Theory
The cloud model presented in [7] is a model of the uncertain transition between a qualitative concept and its quantitative description (numerical representation). It can transform numerical data into a qualitative concept described by a group of characteristics, and it provides a new way for concept induction in machine learning.

3.1 Cloud Model
Definition 1. Suppose that U is a quantitative universe of discourse represented by accurate numerical values, and C is a qualitative concept valued on U. If a quantitative value x ∈ U is a random sample of the qualitative concept C, the certainty degree μ(x) ∈ [0, 1] of x with respect to C is a random number with a stable tendency: U → [0, 1], with x → μ(x) for every x ∈ U. The distribution of x on the universe of discourse U is called a cloud model, and each x is called a cloud drop [8].
The model describes the numerical character of a qualitative concept through three digital characteristics: the expected value Ex, the entropy En and the hyper-entropy He, combining fuzziness and randomness in the qualitative transformation. Ex is the expected value of the cloud drops in the universe of discourse, i.e., the most representative value of the qualitative concept; it is the most typical sample of the concept. En is the uncertainty measurement of the qualitative concept: it represents not only the fuzziness of the concept but also its randomness and their relations. On the one hand, the entropy En reflects the dispersion of the cloud drops that can stand for the concept; on the other hand, it reflects the range of cloud-drop values that can be accepted in the universe of discourse. That is, En represents the granularity of the concept; generally, the larger En is, the more abstract the concept. He is the uncertainty degree of the entropy En, i.e., the entropy of En.
Definition 2. The mapping process from a qualitative concept to its quantitative representation, i.e., a specific realization from the three digital characteristics of the cloud (Ex, En, He) to a set of cloud drops, is called the Forward Cloud Generator (CG) (see Fig. 1(a)). The Backward Cloud Generator (CG⁻¹) is the transition model from the quantitative representation to the qualitative concept: it can transform numerical data into a qualitative concept described by the characteristics (Ex, En, He) (see Fig. 1(b)).
338
S. Li and C. Shao
Ex
Ex En
CG
drop(x) drop(x)
CG−1
En He
He
(a) Forward Cloud Generator
(b) Backward Cloud Generator
Fig. 1. Forward and Backward Cloud Model Generator
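As an illustration of Definition 2, the sketch below implements a standard forward normal cloud generator; the sampling scheme (En' drawn around En with dispersion He, then drops drawn around Ex) is the common construction and is assumed here rather than quoted from [7]:

```python
import numpy as np

def forward_cloud(Ex, En, He, n):
    """Forward cloud generator (CG): draw n cloud drops from the concept
    (Ex, En, He). For each drop, first draw En' ~ N(En, He^2), then
    x ~ N(Ex, En'^2), with certainty degree mu = exp(-(x-Ex)^2 / (2 En'^2))."""
    En_prime = np.random.normal(En, He, n)
    x = np.random.normal(Ex, np.abs(En_prime))
    mu = np.exp(-(x - Ex) ** 2 / (2 * En_prime ** 2))
    return x, mu

drops, certainty = forward_cloud(Ex=0.0, En=1.0, He=0.1, n=1000)
```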
3.2 Backward Cloud Generator

The backward cloud model provides a concept induction method by transforming numerical data into the characteristics (Ex, En, He) describing a qualitative concept. For a set of sample points x_i (i = 1, 2, …, n), the CG⁻¹ algorithm can be described as follows.
Input: a set of sample points x_i, i = 1, 2, …, n.
Output: the digital characteristics (Ex, En, He) representing the qualitative concept.
Steps:
(1) Ex = X̄ = (1/n) Σ_{i=1}^{n} x_i
(2) En = √(π/2) × (1/n) Σ_{i=1}^{n} |x_i − X̄|
(3) S² = (1/(n−1)) Σ_{i=1}^{n} (x_i − X̄)²
(4) He = √(S² − En²)
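A direct transcription of the CG⁻¹ steps above into Python (a sketch; the clipping of S² − En² at zero is our guard against negative values in small samples):

```python
import numpy as np

def backward_cloud(x):
    """Backward cloud generator (CG^-1): estimate (Ex, En, He)
    from a set of sample points x_1..x_n."""
    x = np.asarray(x, dtype=float)
    Ex = x.mean()                                    # step (1)
    En = np.sqrt(np.pi / 2) * np.abs(x - Ex).mean()  # step (2)
    S2 = x.var(ddof=1)                               # step (3)
    He = np.sqrt(max(S2 - En ** 2, 0.0))             # step (4), clipped at 0
    return Ex, En, He
```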
4 Method

4.1 Data Acquisition
The EEG data set (from [3]) was recorded from a normal subject during a no-feedback session. The subject sat in a normal chair, relaxed arms resting on the table, fingers in the standard typing position at the computer keyboard. The task was to press the corresponding keys with the index and little fingers (index fingers at 'f' and 'j', little fingers at 'a' and ';') in a self-chosen order and timing ('self-paced key typing'). The execution of the typing was voluntary and without explicit external sensory input. The experiment consisted of 3 sessions of 6 minutes each, preceded and followed by 60-second relaxing phases. All sessions were conducted on the same day with breaks of some minutes in between. A total of 516 keystrokes was performed at an average speed of one key every 2.1 seconds; 3 events were rejected due to heavy measurement artifacts. Brain activity was measured with 27 Ag/AgCl electrodes at positions of the extended international 10-20 system, 21 mounted over the motor and somatosensory cortex, 5 frontal and one occipital, referenced to the nasion and sampled at 1000 Hz using a band-pass filter from 0.05 to 200 Hz.
The supplied data consist of 27 EEG channels in 516 single trials. Windows 1500 ms long were cut out of the continuous raw signals, each ending 120 ms before the respective keystroke, since from that point on there is EMG activity in a significant number of trials, which produces serious artifacts in the data. The signals were down-sampled to 100 Hz by picking every 10th data point. 100 trials equally spaced over the whole experiment were defined as the test set, leaving 413 labeled trials for training; the 3 rejected trials were labeled 'nan' (not a number).

4.2 Background
Physiologically meaningful signal features can be extracted from various frequency bands of the recorded EEG. The self-paced key typing experiment is based on the Lateralized Readiness Potential (LRP), also called the Bereitschaftspotential (BP). The LRP is a special case of the Slow Cortical Potentials (SCP) and appears during movement preparation. These are slow negative EEG shifts that develop over the activated motor cortex prior to the actual movement onset, for a duration of approximately one second, and are assumed to reflect mainly the growing neuronal activation of a large ensemble of pyramidal cells. Previous studies [9,10] have shown that in most subjects the spatial scalp distribution of the averaged BP correlates consistently with the moving hand, with the focus of brain activity located contralaterally to the performing hand [11]. The LRP is a time-locked response to the movement event, and the θ (0.5-3.5 Hz) frequency band is associated with movement preparation [12].

4.3 Feature Extraction Based on Cloud Model
The signals used in the experiment have a sampling frequency of 100 Hz (having been subsampled by taking every 10th data point from the data initially acquired at 1000 Hz). Only two channels, C3 and C4, out of the full available set of 27 are used, as they were found to be the most representative positions in the topology of the cortex for the left and right hemispherical primary motor cortex, correlating consistently with the performing (right and left) hand. Windows of 1280 time points (from -1400 ms to -120 ms) are cut from the EEG samples, providing epochs of 128 data points; a sample window is shown in Fig. 2(a). To emphasize the late signal content, the data points of the sample window are multiplied by the one-sided cosine function (shown in Fig. 2(b))

w[i] = (1/2)(1 − cos(iπ/128)),  i = 0, 1, …, 127.

According to [11], a Fast Fourier Transform (FFT) filtering technique is then applied to the windowed signal: we discard the baseline DC and all coefficients above 4 Hz, so that only the θ rhythm carrying the slow wave potential remains in the signal spectrum. Transforming the low-pass frequency spectrum back into the time domain with the inverse FFT (IFFT) generates the smoothed signal. The last 200 ms of the smoothed signal (see Fig. 3) are subsampled again at 20 Hz, and then the features (Ex, En, He) are extracted according to the CG⁻¹ algorithm described in Section 3.2.
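The whole per-channel pipeline can be sketched as follows; the parameter names and the exact handling of the spectrum edges are our assumptions, and CG⁻¹ is the backward cloud generator of Section 3.2:

```python
import numpy as np

def backward_cloud(x):
    # CG^-1 from Section 3.2 (see the earlier sketch)
    x = np.asarray(x, dtype=float)
    Ex = x.mean()
    En = np.sqrt(np.pi / 2) * np.abs(x - Ex).mean()
    He = np.sqrt(max(x.var(ddof=1) - En ** 2, 0.0))
    return Ex, En, He

def cloud_features(epoch, fs=100, cutoff_hz=4, tail_ms=200, sub_fs=20):
    """One channel's features: one-sided cosine window, FFT low-pass below
    cutoff_hz with DC discarded, inverse FFT, subsampling of the last
    tail_ms at sub_fs, then CG^-1."""
    n = len(epoch)                                    # 128 points in the paper
    w = 0.5 * (1 - np.cos(np.arange(n) * np.pi / n))  # one-sided cosine window
    spec = np.fft.rfft(epoch * w)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec[(freqs == 0) | (freqs > cutoff_hz)] = 0      # keep only the slow band
    smooth = np.fft.irfft(spec, n)
    tail = smooth[-int(tail_ms / 1000 * fs):]         # last 200 ms
    tail = tail[:: fs // sub_fs]                      # subsample to 20 Hz
    return backward_cloud(tail)                       # (Ex, En, He)
```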
Fig. 2. Raw and windowed signal: (a) raw signal; (b) windowed signal and window function (potential [μV] vs. time [ms])

Fig. 3. Smoothed signal
The features (Ex, En, He) for channels C3 and C4 (six features in total, three per channel) extracted on the basis of the cloud model are well suited for classification; Fig. 4 shows that the features Ex, En and He of the EEG signals for channels C3 and C4 can, on the whole, be discriminated linearly.

4.4 Classification via Linear Regression
As shown above, the left and right finger movements can in principle be discriminated with the features extracted on the basis of the cloud model. Here, least-squares linear regression is employed for the classification of the EEG signals. The problem can be described as follows: to predict the labels of the test data set, we used a simple linear classifier, a parameterized mapping f(x; w) from input vectors in R^d to labels {−1, 1}, to solve the binary classification problem

y = f(x; w) = sign(w_0 + w_1 x_1 + ⋯ + w_d x_d)
Fig. 4. Features (Ex, En, He) for channels C3 and C4: (a) Ex; (b) En; (c) He, each plotted for left- and right-hand samples
where x ∈ R^d is an input vector, y ∈ {−1, 1}, and w = [w_0, w_1, ⋯, w_d] are the parameters we need to set. Given a training set of instance-label pairs (x_i, y_i), i = 1, ⋯, n, with x_i ∈ R^d and y_i ∈ {−1, 1}, and no prior knowledge of the probability distribution of the data, a typical objective is to minimize the empirical risk function

J_n(w) = (1/n) Σ_{i=1}^{n} (y_i − f(x_i; w))²

in terms of the squared error on the n training samples. The solution for the parameters w, expressed in matrix notation, is

W = (XᵀX)⁻¹ XᵀY

For the classification into left- and right-hand movements, the training set used to train the linear regression function has 413 trials, and the classifier is tested on a set of 100 trials.
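A minimal sketch of this classifier; the bias handling and the use of the pseudo-inverse are our choices, not specified in the text:

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares solution W = (X^T X)^(-1) X^T Y with a bias column,
    as in the text; pinv is used for numerical safety."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend bias term w0
    return np.linalg.pinv(Xb.T @ Xb) @ Xb.T @ y

def predict(W, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.sign(Xb @ W)

# X: 413 x 6 training features (Ex, En, He for C3 and C4); y in {-1, +1}
```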
4.5 Results
In the experiment we apply the method presented in the preceding sections to the EEG data set and obtain an accuracy of up to 90% on the test data set. Compared with other methods [3,4,5,6], the feature extraction and translation method for single-trial EEG classification given here is simple and novel, and its classification accuracy on the test trials ranks high. These results show that the feature extraction method based on the cloud model for EEG classification is effective.
5 Conclusions and Future Work
In this paper we propose a new feature extraction method based on the cloud model for EEG classification. The cloud model is a model of the transition between a qualitative concept and its quantitative description: it can transform numerical data into a qualitative concept described by a group of characteristics, and it provides a new way for concept induction in machine learning. Hence it is well suited for feature extraction in some applications. Classification of single-trial EEG recorded in a self-paced key typing experiment was performed through this feature extraction method together with a simple linear regression method for classifying the feature vectors. The results show that, compared with other methods, the feature extraction and translation method in this paper is simple and effective. Further work is under way to refine the method, using more data sets and exploring feature selection methods to refine the feature extraction process, so that the feature extraction and translation methods become fit for online EEG analysis.
References
1. Wolpaw, J.R., Birbaumer, N., Heetderks, W.J., McFarland, D.J., Peckham, P.H., Schalk, G., Donchin, E., Quatrano, L.A., Robinson, C.J., Vaughan, T.M.: Brain-Computer Interface Technology: A Review of the First International Meeting. IEEE Transactions on Rehabilitation Engineering 8(2), 164–173 (2000)
2. Taxonomy of Feature Extraction and Translation Methods for BCI (2005), http://www.cs.colostate.edu/egg/taxonomy.html
3. Blankertz, B., Curio, G., Müller, K.-R.: Classifying Single Trial EEG: Towards Brain Computer Interfacing. In: Advances in Neural Information Processing Systems (NIPS 01), pp. 157–164. MIT Press, Cambridge, MA (2002)
4. Kelly, S., Burke, D., de Chazal, P., Reilly, R.: Parametric Models and Spectral Analysis for Classification in Brain-Computer Interfaces. In: Proceedings of the 14th International Conference on Digital Signal Processing, pp. 307–310 (2002)
5. Garrett, D., Peterson, D.A., Anderson, C.W., Thaut, M.H.: Comparison of Linear, Nonlinear, and Feature Selection Methods for EEG Signal Classification. IEEE Transactions on Neural Systems and Rehabilitation Engineering 11(2), 141–144 (2003)
6. Yong, A., Silvestre, G.C.M., Hurley, N.J.: Single-trial EEG Classification for Brain-Computer Interface using Wavelet Decomposition. In: Proceedings of the 13th European Signal Processing Conference (to appear)
7. Deyi, L., Haijun, M., Xuemei, S.: Membership Clouds and Membership Cloud Generators. Journal of Computer Research and Development 42(8), 32–41 (1995)
8. Deyi, L., Yi, D.: Artificial Intelligence with Uncertainty. National Defense Industry Press, Beijing (2005)
9. Lang, W., Zilch, O., Koska, C., Lindinger, G., Deecke, L.: Negative Cortical DC Shifts Preceding and Accompanying Simple and Complex Sequential Movements. Experiments in Brain Research 74, 99–104 (1989)
10. Cui, R.Q., Huter, D., Lang, W., Deecke, L.: Neuroimage of Voluntary Movement: Topography of the Bereitschaftspotential, a 64-channel DC Current Source Density Study. NeuroImage 9, 124–134 (1999)
11. Krepki, R., Curio, G., Blankertz, B., Müller, K.-R.: Berlin Brain-Computer Interface: The HCI Communication Channel for Discovery. International Journal of Human-Computer Studies 65, 460–477 (2007)
12. Krepki, R.: Brain-Computer Interfaces: Design and Implementation of an Online BCI System for the Control in Gaming Applications and Virtual Limbs. PhD thesis, Technische Universität Berlin, Fakultät IV – Elektrotechnik und Informatik (2004)
The Modified Self-organizing Fuzzy Neural Network Model for Adaptability Evaluation

Zuohua Miao1, Hong Xu1, and Xianhua Wang2

1 School of Resource & Environment Engineering, Wuhan University of Science and Technology, 430081 Wuhan, China
{Zuohua Miao,Hong Xu,whmzh}@hotmail.com
2 Faculty of Engineering, China University of Geosciences, 430062 Wuhan, China
Abstract. The authors propose a novel approach, based on neural network and fuzzy logic technologies, for evolving the architecture of a multi-layer neural network. The model is a feed-forward network with a five-layer architecture built around dynamic inference of fuzzy rules, in which the consequent sub-models are implemented by recurrent neural networks with internal feedback paths and dynamic neuron synapses. An optimal learning scheme guided by the embedded error data is applied for training the LFDFNN models. Experimental results demonstrate that the new model has superior performance.
1 Introduction
The empirical exponential method and fuzzy synthetic evaluation are traditional adaptability evaluation models that use knowledge-based symbolic logic reasoning, while the artificial neural network model is a sample-learning method based on self-learning. At present both types of model are widely used in various fields, but each faces problems: the first type is strongly affected by human subjectivity and lacks self-learning capability, while the inference process of the second type is much more complex and does not embed expert knowledge. In this paper the authors present a modified approach for adaptability evaluation based on neural network and fuzzy logic theory. The new model allows the incorporation of heuristics and deep knowledge so as to exploit the best characteristics of each.
2 Model and Algorithm of FNN
The model is a feed-forward network with five layers and consists of a number of simple processing units, as shown in Fig. 1. Layer 1 is the input layer and Layer 2 implements the fuzzy membership functions; units in Layer 2 may be compound units so as to implement a desired membership function. The network structure of the fuzzy membership function is shown in Fig. 2.
Fig. 1. Modified fuzzy neural network
Fig. 2. Fuzzy membership function structure
The fuzzy membership function in the model is given by:

$$S_{ij}^{(1)} = -\frac{(x_i - m_{ij})^2}{\delta_{ij}^2}, \qquad y_{ij}^{(1)} = \exp\bigl(S_{ij}^{(1)}\bigr) \tag{1}$$

where $S_{ij}^{(1)}$ is the membership-function input of the nerve cell for the $i$-th factor and $j$-th level, $y_{ij}^{(1)}$ is the output of the nerve cell, $m_{ij}$ is the center of the membership function, and $\delta_{ij}^2$ is its squared width. In order to update the membership functions, we need to find the change in the parameters that define them, i.e., the mean values and standard deviations. Again using gradient descent, we have:

$$\Delta m_{ij} = -\beta\,\frac{\partial E}{\partial m_{ij}} \tag{2}$$

where
$$\frac{\partial E}{\partial m_{ij}} = \frac{\partial E}{\partial O_j}\,\frac{\partial O_j}{\partial m_{ij}}, \qquad \frac{\partial E}{\partial O_j} = -\sum_{i=1}^{n}\delta_i\, w_{ij} \tag{3}$$
Layer 1. The number of units in this layer is equal to the number of input features. Units in this layer correspond to the input features and simply transmit the input vector to the next layer.
Layer 2. This layer implements the membership functions. We use five linguistic terms (very-low, low, medium, high, very-high) for each input feature, so the number of units in Layer 2 is five times the number of units in Layer 1.
Layer 3. The membership values are normalized in this layer by the following equation:
$$\mu_{ij}^{\varepsilon} = \frac{\mu_{ij}}{\sum_{j=1}^{M}\mu_{ij}} \tag{4}$$
Layer 4. In this layer the normalized membership value is multiplied by $S_{ij}$:

$$\psi_{ij} = \mu_{ij}^{\varepsilon}(x_i)\cdot S_{ij} \tag{5}$$
where $\psi_{ij}$ corresponds to the input variable of the MFNN network.
Layer 5. The output is inferred by the weighted-average method:

$$f_i(x_i) = \sum_{j=1}^{M}\psi_{ij} = \sum_{j=1}^{M}\mu_{ij}^{\varepsilon}\cdot S_{ij} \tag{6}$$
Here $\alpha_i$ denotes the $i$-th target output and $O_i$ the corresponding actual output, $M$ the number of input–output data pairs, $\varsigma$ the learning rate and $W_{ij}$ the weights, $\mu_{ij}$ the membership values and $\mu_{ij}^{\varepsilon}$ the normalized values, $\psi_{ij}$ the output at each level, $f_i(x_i)$ the mapping rule, $S_{ij}$ the consequent term, and $N$ the normalization procedure.
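For concreteness, the following is a minimal sketch of the Layer 1–5 forward pass defined by Eqs. (1) and (4)–(6); all names and array shapes are our own illustration, and the consequent term is taken as the linear form $S_{ij} = W_{ij}^{\#} + W_{ij}x_i$ of Eq. (9) below.

```python
import numpy as np

# Minimal sketch of the Layer 1-5 forward pass (Eqs. 1, 4, 5, 6).
# Shapes are illustrative: x has n input factors, each with M fuzzy levels,
# so m, delta, w_bias (W#) and w (W) are all (n, M) arrays.
def mfnn_forward(x, m, delta, w_bias, w):
    mu = np.exp(-((x[:, None] - m) ** 2) / delta ** 2)   # Layer 2: Eq. (1)
    mu_norm = mu / mu.sum(axis=1, keepdims=True)         # Layer 3: Eq. (4)
    s = w_bias + w * x[:, None]                          # S_ij as in Eq. (9)
    psi = mu_norm * s                                    # Layer 4: Eq. (5)
    return psi.sum(axis=1)                               # Layer 5: Eq. (6)
```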
2.1 Fuzzy Inference and Learning Algorithm
The membership functions are initially determined from the minimum and maximum values of the input features. In the model, the input variables are fuzzified by the membership functions, and the fuzzy inference part produces the fuzzy rule set. The authors employ a fuzzy
multi-layer and multi-factor decision-making model as the fuzzy inference rule set. The learning algorithm is described below.
Step 1: Present a continuous-valued input vector $X = (x_1, x_2, \ldots, x_n)^T$ to Layer 1 and obtain the output vector $O = (o_1, o_2, \ldots, o_n)^T$ at Layer 5. To obtain $O$, calculations are done layer by layer from Layer 1 to Layer 5.
Step 2: Calculate the change in weights. The output vector $O$ is compared with the desired output (target) vector $D$, and the mean squared error is propagated backward.
Step 3: Update weights and membership functions. The update procedure is implemented in two phases: during the first phase the weights are updated while the membership functions are held constant; during the second phase the membership functions are updated while the updated weights are kept unchanged.
Step 4: Obtain the mean squared error at Layer 3 by the following equation:
$$E = \frac{1}{2}\sum_{i=1}^{m}(O_i - D_i)^2 \tag{7}$$

If the error is greater than some minimum value $\varepsilon_{\min}$, repeat Steps 2 through 4.
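The alternating two-phase update of Step 3 can be sketched as follows (reusing mfnn_forward from the sketch above; numerical gradients stand in for the analytic derivatives of Eqs. (2)–(3) and (10), so this illustrates the scheme rather than the authors' implementation):

```python
import numpy as np

def mse(output, target):
    return 0.5 * np.sum((output - target) ** 2)          # Eq. (7)

def numeric_grad(f, p, h=1e-6):
    # central-difference gradient of the scalar f() w.r.t. array p
    g = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        p[idx] += h; fp = f()
        p[idx] -= 2 * h; fm = f()
        p[idx] += h
        g[idx] = (fp - fm) / (2 * h)
    return g

def train_epoch(x, d, m, delta, w_bias, w, lr_w=0.01, lr_mf=0.0001):
    err = lambda: mse(mfnn_forward(x, m, delta, w_bias, w), d)
    for p in (w_bias, w):                 # phase 1: weights only
        p -= lr_w * numeric_grad(err, p)
    for p in (m, delta):                  # phase 2: membership functions only
        p -= lr_mf * numeric_grad(err, p)
    return err()
```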
The defuzzification method used here is based on the center of gravity, with a firing parameter, and is described as follows:

$$y_i = \frac{\sum_{k=1}^{K}(\mu_{i,k})^{g}\, Y_k}{\sum_{k=1}^{K}(\mu_{i,k})^{g}} \tag{8}$$

where $Y_k$ is the kernel, or modal value, of the membership function for class $k$,
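A minimal sketch of this defuzzification rule (Eq. 8), with hypothetical names:

```python
import numpy as np

# Centre-of-gravity defuzzification with firing power g (Eq. 8):
# mu holds the K rule activations, kernels the modal values Y_k.
def defuzzify(mu, kernels, g=1.0):
    fired = mu ** g                       # "firing the rule"
    return np.dot(fired, kernels) / fired.sum()
```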
identified during training, and $g \in [0, \infty)$ is a power parameter responsible for "firing the rule", which can be properly adjusted to obtain a more efficient defuzzification; this parameter is also optimized during training.
2.2 The Weights of the Model
The weights connecting the elements in the different layers are updated during learning and fuzzy inference. The weight update follows a widely known scheme and is given by the following equations:
$$W_{ij}^{\#}(\text{updated}) = W_{ij}^{\#}(\text{old}) + \Delta W^{\#}, \qquad W_{ij}(\text{updated}) = W_{ij}(\text{old}) + \Delta W$$

$$S_{ij} = W_{ij}^{\#} + W_{ij}\cdot x_i \tag{9}$$
These are calculated as follows:

$$\Delta W = -\varsigma\, W_{ij}\,\frac{\partial A_i}{\partial W_{ij}} = -\varsigma\, W_{ij}\cdot\frac{\partial A_i}{\partial a_i}\cdot\frac{\partial a_i}{\partial f_i(x_i)}\cdot\frac{\partial f_i(x_i)}{\partial Uy_{ij}}\cdot\frac{\partial Uy_{ij}}{\partial a_{ij}}\cdot\frac{\partial a_{ij}}{\partial W_{ij}} \tag{10}$$
3 Training Experiment of the Model
The authors describe a training experiment that applies the novel model to the land adaptability evaluation data of one area. Table 1 gives the standard evaluation guideline. The training of the new model is summarized by the following steps.

Table 1. The standard evaluation guideline

Factor                 Weight   Grade 1   Grade 2
Soil Texture           0.2      >0.75     0.75~0.55
Irrigation Condition   0.3      >0.75     0.75~0.5
Tilth Thickness        0.1      >0.8      0.8~0.6
Landform Slope         0.1      0~0.2     0.2~0.32
Nitrogen Ration        0.04     >0.75     0.75~0.5
Phosphorus Ration      0.03     >0.83     0.83~0.67
Kalium Ration          0.03     >0.8      0.8~0.56
Organic Matter         0.2      >0.75     0.75~0.55
3.1 Training of the Fuzzy Membership Functions
To validate the availability of the fuzzy membership functions, the authors add error data to the standard evaluation guideline, as shown in Table 2.

Table 2. The evaluation guideline with error data for membership function training (parenthesized entries contain the added error data)

Factor                 Weight   Grade 1    Grade 2      Grade 3      Grade 4
Organic Matter         0.2      >0.75      0.75~0.5     (0.5~0.375)  (0.375~0)
Soil Texture           0.2      >0.75      0.75~0.55    0.55~0.45    0.45~0
Irrigation Condition   0.3      >0.75      0.75~0.5     0.5~0.25     0.25~0
Tilth Thickness        0.1      (>0.72)    (0.72~0.6)   (0.6~0.32)   (0.32~0)
Landform Slope         0.1      0~0.2      0.2~0.32     0.32~0.6     >0.6
Nitrogen Ration        0.04     >0.75      0.75~0.5     0.5~0.25     0.25~0
Phosphorus Ration      0.03     >0.83      0.83~0.67    0.67~0.33    0.33~0
Kalium Ration          0.03     >0.8       (0.8~0.4)    (0.4~0.24)   (0.24~0)
During fuzzy membership function training, only the parameters of the fuzzy membership functions are rectified; the inference layer only transfers the error and its parameters are not modified. Experiments show that the model oscillates when the learning rate of the membership functions is set extremely large. When the learning rate is set very small, for example 0.001~0.0001, the model converges well. The parameters of the fuzzy membership training are shown in Table 3.
Table 3. The parameters of membership function training

Learning parameters: membership function learning rate 0.0001, momentum coefficient 0.05 (the fuzzy inference learning rate and momentum are not applied at this stage).
Convergence conditions: minimum mean square error 1.0×10⁻⁶; maximal training number 1000; minimum error variation rate 0.
The fuzzy membership functions were trained on the sample data; the resulting error is $1.8\times10^{-6}$. The training experiment shows that the model has good interpolation and generalization ability. The grade boundaries of the evaluation guideline with error data after training are shown in Table 4.

Table 4. The trained results of the fuzzy membership functions
Factor           State             Grade 1   Grade 2     Grade 3     Grade 4
Organic Matter   Before training   >0.75     0.75~0.5    0.5~0.375   0.375~0
                 After training    >0.75     0.75~0.52   0.5~0.28    0.28~0
                 Expected value    >0.75     0.75~0.5    0.5~0.25    0.25~0
Soil Texture     Before training   >0.72     0.72~0.6    0.6~0.32    0.32~0
                 After training    >0.8      0.8~0.59    0.59~0.36   0.36~0
                 Expected value    >0.8      0.8~0.6     0.6~0.4     0.4~0
Kalium Ration    Before training   >0.8      0.8~0.4     0.56~0.24   0.24~0
                 After training    >0.79     0.79~0.45   0.45~0.25   0.25~0
                 Expected value    >0.75     0.75~0.5    0.5~0.375   0.375~0
3.2 Training of the Fuzzy Inference Layer
The training of the inference layer in the MFNN model validates the effect of the inference algorithm. Based on the data of the standard evaluation guideline, error data are added to the Organic Matter, Soil Texture and Kalium Ration factors. The model parameters, the training process, and the weights before and after training are shown in Table 5, Fig. 3 and Table 6.
Table 5. The parameters of fuzzy inference layer training

Learning parameters: fuzzy inference learning rate 0.01, momentum coefficient 0.08 (the membership function parameters are held fixed at this stage).
Convergence conditions: minimum mean square error 1.0×10⁻⁶; maximal training number 1000; minimum error variation rate 0.

Fig. 3. The descent of the mean square error during training of the fuzzy inference layer

Table 6. The weights before and after training of the fuzzy inference layer

Factor                 Weight before   Weight after   Expected value
Organic Matter         0.1             0.197909       0.2
Soil Texture           0.1             0.199077       0.2
Irrigation Condition   0.3             0.300801       0.3
Tilth Thickness        0.2             0.100999       0.1
Landform Slope         0.1             0.100164       0.1
Nitrogen Ration        0.08            0.041144       0.04
Phosphorus Ration      0.06            0.030445       0.03
Kalium Ration          0.06            0.029712       0.03
3.3 Training of the Entire Model
As the preceding training experiments show, the fuzzy membership functions and the fuzzy inference layer each achieve prominent performance. In this section, the authors train the entire model, with errors present in both the membership function parameters and the factor weights, to validate the convergence and availability of the model. In the training process, error data are added to the weights and classification boundaries of the Organic Matter, Tilth Thickness and Kalium Ration factors. Table 7 shows the new guideline with error data.
Table 7. The evaluation guideline with error data for entire-model training (parenthesized entries contain the added error data)

Factor                 Weight   Grade 1   Grade 2      Grade 3       Grade 4
Organic Matter         (0.1)    >0.75     0.75~0.5     (0.5~0.375)   (0.375~0)
Soil Texture           (0.1)    >0.75     0.75~0.55    0.55~0.45     0.45~0
Irrigation Condition   0.3      >0.75     0.75~0.5     0.5~0.25      0.25~0
Tilth Thickness        (0.2)    (>0.72)   (0.72~0.6)   (0.6~0.32)    (0.32~0)
Landform Slope         0.1      0~0.2     0.2~0.32     0.32~0.6      >0.6
Nitrogen Ration        (0.08)   >0.75     0.75~0.5     0.5~0.25      0.25~0
Phosphorus Ration      (0.06)   >0.83     0.83~0.67    0.67~0.33     0.33~0
Kalium Ration          (0.06)   >0.8      (0.8~0.4)    (0.4~0.24)    (0.24~0)
After the evaluation guideline with error data shown in Table 7 is input and the original state of the model is constructed, training of the entire network commences. The training parameters are shown in Table 8. The mean square error reaches 0.98×10⁻⁸ after 5331 training iterations. Fig. 4 describes the descent of the mean square error during the training process; the descent speed gradually decreases. In order to obtain fast convergence in practical implementations, an appropriate minimum mean square error should be set. The error of the trained MFNN model is $2.8\times10^{-8}$, which demonstrates effective generalization capability.
Fig. 4. The descent of the mean square error during training of the entire model

Table 8. The parameters of entire-model training

Learning parameters: membership function learning rate 0.0001, momentum coefficient 0.05; fuzzy inference learning rate 0.01, momentum coefficient 0.08.
Convergence conditions: minimum mean square error 1.0×10⁻⁸; maximal training number 6000; minimum error variation rate 0.
4 Experiments and Conclusion
Experimental results obtained by applying the novel model are shown in Fig. 5. In Fig. 5(a), the map of land adaptability evaluation produced by the traditional method has many samples located in the wrong grade areas; the errors are located at the grade boundaries, and contiguous grade areas contain sporadic samples. These errors indicate that the transcendental experience and knowledge are incomplete and inaccurate. In Fig. 5(b), the map of land adaptability evaluation produced by the MFNN method remedies these errors, and the evaluation result retains high consistency with the sample data.
Fig. 5. Maps of land adaptability evaluation by (a) the traditional model and (b) the MFNN method
In this paper, the authors proposed a novel approach for evolving the architecture of a multi-layer neural network based on neural network and fuzzy logic technologies. To validate the convergence and availability of the model, training experiments of the fuzzy membership functions, the fuzzy inference layer and the entire model were performed using guidelines with embedded error data. The final experimental results reveal that the novel model exhibits superior performance and precision.
Predicting Functional Protein-Protein Interactions Based on Computational Methods Luwen Zhang and Wu Zhang School of Computer Science and Technology, Shanghai University, Shanghai, 200072, China
[email protected]
Abstract. Protein-protein interactions play a crucial role in cellular processes. In recent years, although new experimental techniques have been developed to discover protein-protein interaction networks, their accuracy and coverage have proven to be limited. Computational approaches have therefore become essential, both to assist in the design and validation of experimental studies and for the prediction of interacting proteins. This paper presents a survey of the major computational methods for detecting protein-protein interactions and elaborates their key contributions by introducing, in turn, the experimental methods, the typical computational methods, and the improvements over them.
1 Introduction
Theoretically, protein-protein interactions can be divided along the following hierarchical classification [1]:
1. direct physical interactions between the proteins
2. indirect physical interactions (i.e., the proteins are contained in the same protein complex)
3. the proteins are part of a single metabolic pathway
4. the proteins take part in the same cellular process
5. pairs of proteins of which at least one is hypothetical
6. proteins with known functions between which no functional interactions are known
Many experimental methods have been developed that facilitate large-scale protein-protein interaction analysis, though the number of revealed and undoubted interactions is still limited compared to the available protein sequences of different organisms [2-4]. Among these methods, affinity chromatography, two-hybrid assay, co-purification, co-immunoprecipitation, and cross-linking are used to purify proteins that are physically associated with one another [5]. However, experimental techniques have unavoidable limitations for detecting protein-protein interactions. In parallel, several approaches [5-8] have been developed for the computational prediction of protein–protein interactions in an attempt to assist the experimental methods. A novel approach is to infer protein-protein interactions based on
domain-domain interactions [11, 12]. Another approach focuses on finding and analyzing subsequences affecting protein–protein interactions in raw protein sequences [9]. Techniques analyzing the physicochemical properties or tertiary structure of proteins have also been applied [10]. The remainder of this paper is organized as follows. After presenting the reformative methods, the computational methods are described from the data-handling viewpoint in terms of retrieving and accessing metadata, clustering data, classifying data and prediction, respectively. A brief assessment is then given of the prediction techniques, and some comparative experimental results are listed. Finally, we conclude and present further research directions.
2 Methods of Protein-Protein Interaction Prediction
2.1 Experimental Methods on Protein-Protein Interactions
Given their biological importance, the development of methods to detect and characterize protein–protein interactions is a major theme of functional genomics and proteomics. At present, two main types of experimental methods are used to discover protein-protein interactions: the yeast two-hybrid screen (Y2H) [13], and the combination of large-scale tandem affinity purification with mass spectrometry (MS) to detect and characterize multi-protein complexes [14]. First applied to yeast, these methods revealed the dense network of interactions linking proteins in the cell, but they have some limitations. For instance, some proteins are "auto-activators", meaning they can activate the reporter gene without an interaction partner; such reported interactions are non-specific and their reliability is probably low. For the protein purification process, Yu et al. pointed out that different experimental conditions may affect the accuracy of detecting interacting proteins, and loosely connected proteins may get lost [15]. As discussed above, these limitations of experimental methods have prompted keen interest in the development of computational methods for inferring protein–protein interactions. In what follows, we describe some typical and improved computational methods on PPI in detail.
2.2 Typical Computational Methods on Protein-Protein Interaction Prediction
Major algorithms such as the Association Method [16] and Maximum Likelihood Estimation [5] have been used extensively to detect protein-protein interactions, even though these techniques do not show obvious competitive advantages in either prediction quality or computational efficiency. Nevertheless, they provide the crucial basic algorithms for the other, improved methods.
1. Association Method (AM)
The association-based method of Sprinzak et al. demonstrates that characteristic sequence-signatures can be used as identifiers to predict interacting proteins, and defines the interaction probability of a domain pair as follows:
$$\Pr(d_m, d_n) = \frac{I_{mn}}{N_{mn}} \tag{1}$$
Here $I_{mn}$ is the number of interacting protein pairs that contain the domain pair $(d_m, d_n)$, and $N_{mn}$ is the total number of protein pairs that contain the domain pair $(d_m, d_n)$. To extend the Association Method, a simple method, ASNM (association numerical method) [17], was proposed to infer the strengths of protein-protein interactions. From the computational point of view, maximizing the number of correctly classified examples of protein-protein interactions is MAX SNP-hard, and ASNM outperformed other existing similar methods in terms of prediction accuracy.
$$\Pr(d_m, d_n) = \frac{\sum_{\{P_{ij}\,\mid\,D_{mn}\in P_{ij}\}}\rho_{ij}}{N_{mn}} \tag{2}$$

where $\rho_{ij} = o_{ij}/Z$ defines an interaction ratio for the protein pair $P_i$ and $P_j$: $o_{ij}$ is the number of experimentally confirmed interactions between $P_i$ and $P_j$, and $Z$ is the total number of experiments.
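As an illustration, Eq. (1) can be estimated directly from a labelled interaction data set; all names here are our own (not from the paper):

```python
from collections import Counter
from itertools import product

# Association Method (Eq. 1): Pr(dm, dn) = Imn / Nmn, estimated from protein
# pairs labelled as interacting (1) or not (0); domains[p] is p's domain set.
def association_probabilities(pairs, labels, domains):
    n_total, n_inter = Counter(), Counter()
    for (p, q), label in zip(pairs, labels):
        keys = {tuple(sorted((dp, dq)))
                for dp, dq in product(domains[p], domains[q])}
        for key in keys:        # count each protein pair once per domain pair
            n_total[key] += 1
            n_inter[key] += label
    return {k: n_inter[k] / n_total[k] for k in n_total}
```

Replacing the 0/1 label by the ratio $\rho_{ij} = o_{ij}/Z$ of Eq. (2) turns the same loop into the ASNM estimate.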
2. Maximum Likelihood Estimation (MLE)
Assuming that domain-domain interactions are independent and that two proteins interact if and only if at least one pair of domains within them interacts, the maximum likelihood estimation (MLE) method has been applied to estimate the probabilities of interaction between every pair of domains. Deng et al. measured the accuracy of the predictions at the protein level and showed robustness in analyzing incomplete data sets. The probability of a potential interaction between a protein pair $(P_i, P_j)$ is

$$\Pr(P_{ij} = 1) = 1 - \prod_{(d_m, d_n)\in(P_i, P_j)}(1 - \lambda_{mn}) \tag{3}$$

where $\lambda_{mn}$ is the probability of interaction between $d_m$ and $d_n$. This method sets $P_{ij} = 1$ if proteins $P_i$ and $P_j$ interact with each other, and $P_{ij} = 0$ otherwise.
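Given domain-level probabilities $\lambda_{mn}$, Eq. (3) translates directly into code (a sketch with hypothetical names; estimating $\lambda_{mn}$ itself is the MLE/EM step of [5]):

```python
# Eq. (3): probability that proteins Pi and Pj interact, given domain-level
# interaction probabilities lam[(dm, dn)].
def interaction_probability(domains_i, domains_j, lam):
    p_none = 1.0
    for dm in domains_i:
        for dn in domains_j:
            key = tuple(sorted((dm, dn)))
            p_none *= 1.0 - lam.get(key, 0.0)
    return 1.0 - p_none
```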
2.3 Improved Protein-Protein Interaction Prediction Methods
Generally, two types of protein-protein relations are taken into consideration, though there exist seven classes in the detailed hierarchical classification: one is physical protein–protein association and the other is functional links [16], mostly detected by comparative genomics methods that make various biological assumptions. Using different data sources, the major methods of the first type are based on different assumptions, according to which they can be roughly grouped into three categories: mapping interactions between two different organisms [18], inferring protein-protein interactions based on domain-domain interactions, and utilizing the
physicochemical properties of amino acids to predict interactions. As for the functional links between proteins, the methods include gene neighbor [19], gene fusion [7, 20], and phylogenetic profile [21]. From the research point of view, many researchers first focused on sequence and genome analysis and obtained significant results [22]. However, prediction approaches based on sequence and genome analysis do not provide fully reliable answers regarding the presence or absence of putative interactions. In these cases, looking at the structural details of the putative interactions using an experimentally determined or even a predicted structure can help. This leads to another type of interaction prediction methods, listed in Fig. 1.
Fig. 1. The computational approaches to protein-protein interactions
According to previous research, most prediction methods are concerned with protein data integrity and data maintenance, both of which utilize the data to construct the PPI prediction network. Therefore, in this paper we introduce the improved prediction methods in terms of the handling patterns of the protein data.
1. Retrieving and Accessing the Available Metadata
To expedite the progress of functional bioinformatics, it is important to develop scalable learning methods to process large amounts of biomedical data efficiently. A promising approach for making such huge amounts of information manageable and easily accessible is to integrate the heterogeneous distributed metadata, including protein-protein interaction databases and the biomedical literature. The Data Grid has emerged as a significant approach that enables coordinated sharing of heterogeneous distributed storage resources and digital entities, based on local and global policies, across administrative domains in a virtual enterprise [23].
BioGRID is a typical freely accessible database of physical and genetic interactions [24]. It offers an internally hyper-linked web interface that allows rapid search and retrieval of interaction data. Full or user-defined datasets are freely downloadable as tab-delimited text files and PSI-MI XML. Our research group has also developed a data grid oriented to biological data, which provides a data platform that seamlessly and easily integrates heterogeneous distributed biological data resources (SWISS-PROT, http://www.ebi.ac.uk/swissprot/; PIR, Protein Identification Resource, http://pir.georgetown.edu/; HPRD, Human Protein Reference Database, http://www.hprd.org/index_html; etc.) and allows access by authorized users via a number of protocols and interfaces including ODBC, JDBC and SOAP. On this BD-Grid all metadata are classified into related catalogs, so users can find the data of interest with ease. We have also launched a friendly web portal, through which users can browse data without regard to what the data is, where it resides, or how to access it, edit their own SQL to search for the required data, and integrate their own data resources into the platform. Data retrieval has been widely used to mine interaction information from the scientific literature [25]. In recent research, Mamitsuka [26] used a stochastic model that combines protein-protein interaction data with existing knowledge of proteins, utilizing the protein class as a latent variable in the stochastic model.
2. Clustering the Data
Clustering is a process by which a data set is grouped into clusters so that similarity among group members is greater than similarity across groups (clusters). A biomedical literature data mining system, SPIE-DM (Scalable and Portable Information Extraction and Data Mining), has been designed to extract and mine protein-protein interaction networks from the biomedical literature [27]. SPIE-DM consists of two phases: a scalable and portable extraction method (SPIE) and a novel clustering method, SFCluster. Whereas the former was developed to extract protein-protein interactions, which form a scale-free network graph, from the biomedical literature, the latter was used to mine the protein-protein interaction network. MSSC (the maximum-specificity set cover procedure) shows a new way of integrating protein interaction and domain data, ultimately allowing previously unknown protein interactions to be predicted. Compared to MLE, MSSC is an attractive alternative of comparable quality but faster execution time. Meanwhile, over the last couple of years, domain-based protein interaction prediction methods have been extensively studied [5, 12]; their problems can be summarized as low prediction accuracy and the lack of an exact means to discern which protein pair is more likely to interact than others among multiple protein pairs [28]. Chen et al. [29] introduce a domain-based random forest of decision trees to infer protein interactions. The proposed method is capable of exploring all possible domain interactions and making predictions based on domain data sets. Han et al. [30] focus on a possibility ranking method for multiple protein pairs. An interacting-probability equation for a given protein pair is developed, and the rankings of multiple protein pairs are decided by their interacting probabilities.
3. Classifying Data
Classification maps data into predetermined groups or classes. Support vector machines (SVM) are frequently used to predict protein-protein interaction sites because SVMs are considered a powerful technique for making binary decisions. Dohkan et al. [31] present a new interacting-protein prediction method that associates domains and other protein features using an SVM, and report the results of investigating the effect of those protein features on prediction accuracy. Cross-validation tests revealed that this SVM obtains more accurate results than previously reported predictions. One of the proposed classification methods [11] trains on a learning set of protein interactions to classify interactions in a phylogenetically neighboring organism. Using this method, a 10-fold cross-validation with the Helicobacter pylori interaction map estimated a sensitivity and accuracy of 70-80%. However, one shortcoming of this type of approach lies in the fact that there are no highly reliable and complete, or near complete, protein interaction maps for any organism. The closest to a complete map would be for yeast, but this organism is evolutionarily distant from human, a complex multicellular organism, thus limiting the number of orthologs that can be identified.
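The following is a schematic SVM pipeline in the spirit of Dohkan et al. [31]; the feature matrix (e.g., domain-composition indicators per protein pair) and labels are random placeholders, not real data:

```python
import numpy as np
from sklearn.svm import SVC

X = np.random.rand(200, 50)          # placeholder domain-pair features
y = np.random.randint(0, 2, 200)     # 1 = interacting, 0 = not
clf = SVC(kernel="rbf", C=1.0).fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```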
4. Prediction
Prediction is similar to classification, except that the records are classified to predict future behavior or an estimated value. Some studies have been made on the assumption that interacting proteins whose transcripts are co-expressed are more likely to be credible [32]. However, recent research shows that interactions in genome-wide datasets have only a weak relationship with gene expression, owing to different degradation rates [33]. These methods need genome-scale protein-protein interaction datasets to assess the reliability of each related protein pair. Moreover, it is ambiguous how to define the cutoff value separating true positives from false positives. Hence, a new model to assess the reliability of individual protein interaction pairs is required. Min Su Lee et al. [34] developed a new reliability assessment system for protein-protein interaction datasets that is capable of distinguishing really interacting protein pairs in a noisy dataset. The system uses a neural network algorithm based on three characteristics of interacting proteins. First, interacting proteins share a similar functional category. Second, interacting proteins must be located in close proximity, at least transiently. Third, an interacting protein pair is tightly linked with other proteins in the protein interaction network. Mamitsuka [35] proposes a new probabilistic model for protein-protein interactions that considers the latent knowledge of proteins. An efficient learning algorithm based on the EM algorithm is further presented for this model; it successfully integrates a discrete co-occurrence data set and a table of binary values.
3 Analysis and Assessment of the PPI Prediction Methods
As many prediction methods are developed, a practical question follows naturally: how to assess a method efficiently and effectively. The general approach is to
estimate the prediction accuracy. But it is difficult to estimate prediction accuracy at the domain level, because few domain-domain interactions are known whereas a large volume of protein-protein interaction data exists. This restriction makes it necessary to use inferred domain-domain interactions to predict protein-protein interactions and to assess prediction accuracy at the protein level. The accuracy of a prediction is measured by specificity and sensitivity. Specificity (SP) refers to the ratio of the number of matched interactions between the predicted set and the observed set over the total number of predicted interactions. Sensitivity (SN) refers to the ratio of the number of matched interactions over the total number of observed interactions. Here, the protein-protein interaction data sets of Uetz [36] and Ito [37] are used to predict domain–domain interactions (DDI) in yeast proteins. The protein-domain information is obtained from the protein-domain family database PFAM (http://PFAM.wustl.edu).

Table 1. Number of proteins, domains, and PPI in the Uetz, the Ito, the combined, and the overlap data sets
Data set    Proteins   Domains   PPI
Uetz        1337       1643      1445
Ito         3277       3685      4475
Combined    3729       4131      5719
Overlap     855        1179      201
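The SP/SN measures defined above reduce to a few lines of code (a sketch over sets of unordered interaction pairs; names are our own):

```python
# Specificity (SP) and sensitivity (SN) over predicted vs. observed
# interaction pairs, each pair given as an unordered protein pair.
def sp_sn(predicted, observed):
    pred = {frozenset(p) for p in predicted}
    obs = {frozenset(p) for p in observed}
    matched = len(pred & obs)
    return matched / len(pred), matched / len(obs)
```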
Table 2. Test results of AM, MLE and MSSC

Method   SP               SN        Speed
AM       55.5%            55.0%     6 hours
MLE      42.6%            77.6%     73 seconds
MSSC     similar to MLE   > AM      —
From the comparative results, one limitation of AM is that the method relies on the accuracy of the observed data: the observed interactions are treated as real interactions. Moreover, it computes domain-domain interactions locally, meaning that it ignores other domain-domain interaction information between the protein pairs and thus does not make full use of all available information. Another limitation is that AM ignores experimental errors. MLE was then developed as a global approach incorporating all proteins and domains, as well as experimental errors. However, this approach ignores the following biological facts. First, regarding the independence of domain-domain interactions: whether two domains interact may depend on other domains in the same protein or on other environmental conditions. Second, using domain-domain interactions to infer protein-protein interactions is based on the assumption that some subunits with special structure are essential for protein-protein interactions.
Actually, these subunits may differ from the PFAM domains obtained through multiple alignments. Furthermore, PFAM A and PFAM B contain domains at different levels but are used at the same level.
4 Conclusion and Future Works
Cellular operation can only be comprehended by considering the individual properties of proteins and genes in the context of their complex relationships. It is therefore unsurprising that the study of these interactions and complexes is establishing itself as a main task in the biological research field. The experimental methods designed to detect protein–protein interactions, however, show high rates of errors, in terms of both false negatives and false positives. One known reason is that every single experimental method is biased toward certain kinds of proteins and interactions. Computational methods, ranging from data retrieval-based methods to data integration-based approaches, have also been introduced to tackle the problem of inferring protein interaction networks. Unfortunately, the shortcomings of experimental techniques affect both the further development and the fair evaluation of computational prediction methods (prediction of physical protein-protein interactions). In addition, an interaction map of high quality and nearly global coverage has not yet been provided by those approaches. As future work, although more and more biological metadata appear and data integration platforms develop, an assessment system to validate the metadata is still lacking; thus, a web portal should be launched in the next stage. In this portal, some available biological metadata will be offered: researchers can first obtain the required metadata, then do research based on them, and finally give feedback on the validity of the used metadata. Thus, the main aim of this portal is to score the available biological metadata based on users' feedback. On the other hand, it is likely that a combination of experimental and computational approaches will be most fruitful. For instance, data validated by experimental prediction methods may allow the extraction of rules useful in the context of machine learning for prediction purposes. While some progress has been made, the field of interacting-protein detection is still in its infancy, and much work will be required to bring the prediction of protein-protein interactions to a robust and reliable state.
References 1. Huynen, M., Snel, B., Lathe III, W., Bork, P.: Predicting Protein Function by Genomic Context: Quantitative Evaluation and Qualitative Inferences. Genome Res. 10, 1204–1210 (2000) 2. Tucker, C.L., Gera, J.F., Uetz, P.: Towards an Understanding of Complex Protein Networks. Trends Cell Biol. 11, 102–106 (2001) 3. Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., Bork, P.: Comparative Assessment of Large-scale Data Sets of Protein-Protein Interactions. Nature 417, 399–403 (2002)
4. Bader, G.D., Hogue, C.W.V.: Analyzing Yeast Protein-Protein Interactions Data Obtained From Different Sources. Nature Biotechnology 20, 991–997 (2002) 5. Deng, M., Mehta, S., Sun, F., Chen, T.: Inferring Domain–Domain Interactions From Protein–Protein Interactions. Genome Res. 12, 1540–1548 (2002) 6. Enright, A.J., Iliopoulos, I., Kyrpides, N.C., Ouzounis, C.A.: Protein Interactions Maps for Complete Genomes Based on Gene Fusion Events. Nature 402, 86–90 (1999) 7. Marcotte, E.M., Pellegrini, M., Ng, H.L., Rice, D.W., Yeates, D.O., Eisenberg, D.: Detecting Protein Function and Protein-Protein Interactions From Genome Sequences. Science 285, 751–753 (1999) 8. Ng, S., Zhang, Z., Tan, S., Lin, K.: InterDom: A Database of Putative Interacting Protein Domains for Validating Predicated Protein Interactions and Complexes. Nucleic Acids Res. 31, 251–254 (2003) 9. Enright, A.J., Ouzounis, C.A.: Protein-Protein Interactions: A Molecular Cloning Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor New York (2002) 10. Bock, J.R., Gough, D.A.: Prediction of Protein- Protein Interactions from Primary Structure. Bioinformatics 17, 455–460 (2001) 11. Wojcik, J., Schachter, V.: Protein-Protein Interactions Map Inference Using Interacting Domain Profile Pairs. Bioinformatics 17, S296–S305 (2001) 12. Han, D., Kim, H., Seo, J., Jang, W.: Domain Combination Based Probabilistic Framework for Protein- Protein Interactions Prediction. Genome Informatics 14, 250–259 (2003) 13. Fields, S., Song, O.: A novel genetic system to detect protein-protein interactions. Nature 340, 245–246 (1989) 14. Yates III, J.R.: Mass Spectrometry: From Genomics to Proteomics. Trends Genet. 16, 5–8 (2000) 15. Yu, J., Fotouhi, F.: Computational Approaches for Predicting Protein–Protein Interactions: A Survey. J. Med. Sys. 30, 39–44 (2006) 16. Sprinzak, E., Margalit, H.: Correlated Sequence-Signature as Markers of Protein-Protein Interactions. J. Molecular Biology 311, 681–692 (2001) 17. Hayashida, M., Ueda, N., Akutsu, T.: A Simple Method for Inferring Strengths of ProteinProtein Interactions. Genome Informatics 15, 56–68 (2004) 18. Matthews, L.R., Vaglio, P., Reboul, J., Ge, H., Davis, B.P., Garrels, J., Vincent, S., Vidal, M.: Identification of Potential Interactions Networks Using Sequence-based Searches for Conserved Protein–Protein Interactions or ”Interologs”. Genome Res. 11, 2120–2126 (2001) 19. Dandekar, T., Snel, B., Huynen, M., Bork, P.: Conservation of gene order: A fingerprint of Proteins that Physically Interact. Science 23, 324–328 (1998) 20. Enright, A.J., Iliopoulos, I., Kyrpides, N.C., Ouzounis, C.A.: Protein Interactions Maps for Complete Genomes Based on Gene Fusion Events. Nature 402, 86–90 (1999) 21. Pellegrini, M., et al.: Assigning Protein Functions by Comparative Genome Analysis: Protein Phylogenetic Profiles. Proc. Natl. Acad. Sci. 96, 4285–4288 (1999) 22. Szilágyi, A., Grimm, V., Arakaki, A.K., Skolnick, J.: Prediction of Physical ProteinProtein Interactions. Phys. Biol. 2, S1–S16 (2005) 23. Moore, R., Rajasekar, A., Wan, M.: Data Grids, Digital Libraries, and Persistent Archives: An Integrated Approach to Sharing, Publishing, and Archiving Data. Proceedings of the IEEE 93(3), 578–588 (2005) 24. BioGRID: http://www.thebiogrid.org 25. Hu, X., Song, I.Y., et al.: Extracting and mining protein-protein interactions network from biomedical literature. Computational Intelligence in Bioinformatics and Computational Biology, 244–251 (2004)
26. Nafar, Z., Golshani, A.: Data Mining Methods for Protein-Protein Interactions, Electrical and Computer Engineering. In: Canadian Conference on, pp. 991–994 (May 2006) 27. Huang, C., Morcos, F., Kanaan, S.P., Wuchty, S., Chen, D.Z., Izaguirre, J.A.: Predicting Protein-Protein Interactions from Protein Domains Using a Set Cover Approach, Computational Biology and Bioinformatics. IEEE/ACM Transactions on 4(1), 78–87 (2007) 28. Pearl, F.M.G., et al.: Assigning Genomic Sequences to CATH. Nucleic Acids Res. 28, 277–282 (2000) 29. Chen, X., Liu, M.: Prediction of Protein-Protein Interactions Using Random Decision Forest Framework. Bioinformatics 21, 4394–4400 (2005) 30. Han, D., Kim, H., Jang, W., Lee, S.: Domain Combination Based Protein-Protein Interactions Possibility Ranking Method. In: Proceedings of the Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE’04) 31. Dohkan, S., Koike, A., Takagi, T.: Prediction of Protein-Protein Interactions Using Support Vector Machines. In: Proceedings of the Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE’04) 32. Kemmeren, P., van Berkum, N.L., Vilo, J., Bijma, T., et al.: Protein Interactions Verification and Functional Annotation by Integrated Analysis of Genome-Scale Data. Mol. Cell. 9, 1133–1143 (2002) 33. Gygi, S., Rochon, Y., Franza, B.R., Aebersold, R.: Correlation between Protein and MRNA Abundance in Yeast. MCB 19, 1720–1730 (1999) 34. Lee, M., Park, S., Kim, M.: A Protein Interactions Verification System Based on a Neural Network Algorithm. In: Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference Workshops (CSBW’05) 35. Mamitsuka, H.: Essential Latent Knowledge for Protein-Protein Interactions: Analysis By an Unsupervised Learning Approach, Computational Biology and Bioinformatics. IEEE/ACM Transactions on 2(2), 119–130 (2005) 36. Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight, J.R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., et al.: A Comprehensive Analysis of Protein– Protein Interactions in Saccharomyces Cerevisiae. Nature 403, 623–627 (2000) 37. Ito, T., Tashiro, K., Muta, S., Ozawa, R., Chiba, T., Nishizawa, M., Yamamoto, K., Kuhara, S., Sakaki, Y.: Toward a Protein-Protein Interactions Map of the Budding Yeast: A Comprehensive System to Examine Two-Hybrid Interactions in All Possible Combinations Between the Yeast Proteins. Proc. Natl. Acad. Sci. 97, 1143–1147 (2000)
The Chaos Model Analysis Based on Time-Varying Fractal Dimension
Jianrong Hou1, Dan Huang1, and Hui Zhao2
1 School of Management, Shanghai Jiaotong University, Shanghai, China, 200052
[email protected]
2 Software Engineering Institute, East China Normal University, Shanghai, China, 200062
[email protected]
Abstract. An evaluation formula for the time-varying Hurst index is established by means of wavelets, and an algorithm for the time-varying index is presented and applied to extract the characteristics of atrial fibrillation. The diagnosis of the atrial fibrillation curve can be performed at a given resolution level. The results show that the time-varying fractal dimension rises when atrial fibrillation begins and falls when it ends. The onset and end of atrial fibrillation can thus be successfully detected by means of the change of the time-varying fractal dimension. The results also indicate that the complexity of heart rate variability (HRV) decreases at the beginning of atrial fibrillation.
1 Introduction
The chaotic characteristics of heart rate variability are of great importance in the diagnosis of atrial fibrillation from HRV [1-4]. Atrial fibrillation/flutter is a heart rhythm disorder. It usually involves a rapid heart rate, in which the upper heart chambers are stimulated to contract in a very disorganized and abnormal manner. Atrial fibrillation data series show non-linear and fractal characteristics in the process of their spatio-temporal dynamic evolution. When the fractal dimension of atrial fibrillation is unknown, the process of querying the similarity of diagnosis curves is affected to a certain degree. Because of the long-range power-law correlation characteristic of HRV, the HRV signal is considered to be an approximately 1/f fractal signal [5-6]. Local dynamic characteristic extraction is nevertheless a crucial factor in the diagnosis of atrial fibrillation. A deterministic fractal dimension value derived from a scale relation cannot completely depict the spatio-temporal dynamics of an evolving process; it only reflects the self-similar construction rule of a static structure. The dimension is usually assumed constant when the similarity of natural phenomena is studied. Actually, the evolution of natural phenomena in the one-dimensional time world may often lead to changing similarity, so the assumption of an unchangeable dimension in the aforementioned research does not accord with the objective facts [9-10]. This paper proposes an alternative technique based on the time-varying fractal dimension and gives its application in the detection of atrial fibrillation. Wavelet analysis has been proven to be a very efficient tool for dealing with non-stationarity and self-similarity [7-8]. Thus,
wavelet transformation plays the central role in the process of evaluating the time-varying index. This paper is organized as follows. Section 2 presents the definition of a locally self-similar process for non-linear time series based on statistical self-similarity; the Daubechies wavelet is used to transform the locally self-similar process, and an evaluation expression for the Hurst index is established by the least squares method. Section 3 introduces the algorithm for obtaining the time-varying Hurst index. Finally, the application of the time-varying Hurst index to atrial fibrillation diagnosis is illustrated with examples.
2 The Evaluation of the Hurst Index for Non-stationary Time Series
Let $Y(t)$ be a stochastic process with zero mean. If its covariance $\Gamma_t(s_1, s_2)$ satisfies

$$\Gamma_t(s_1, s_2) - \Gamma_t(0,0) = -q(t)\,|s_1 - s_2|^{2H(t)}\{1 + o(1)\}, \quad (|s_1| + |s_2| \to 0) \tag{1}$$

where $q(t)\ge 0$, then $Y(t)$ is called a locally self-similar stochastic process, with $H(t)\in(0,1)$. For a given $|s_1 - s_2|$, the local self-correlation of the process $Y(t)$ weakens as $H(t)$ decreases from one toward zero, and the sample path becomes increasingly rough.
Let $\varphi(x)$ be the mother wavelet and let $W_Y(a,t)$ be the wavelet transform of the self-similar process $Y(t)$ at scale $a$ and position $t$:

$$W_Y(a,t) = a^{-1/2}\int \varphi\Bigl(\tfrac{u-t}{a}\Bigr)\,Y(u)\,du = a^{1/2}\int \varphi(x)\,Y(t+ax)\,dx$$

From formula (1) and the above formula, it follows that

$$E\bigl(|W_Y(a,t)|^2\bigr) = a^{-1}\iint \varphi\Bigl(\tfrac{u-t}{a}\Bigr)\varphi\Bigl(\tfrac{v-t}{a}\Bigr)E[Y(u)Y(v)]\,du\,dv = a\iint \varphi(x)\varphi(y)\,E[Y(t+ax)Y(t+ay)]\,dx\,dy \sim C_1\,a^{1+2H(t)} \quad (a\to 0)$$

where

$$C_1 = -q(t)\iint |x-y|^{2H(t)}\varphi(x)\varphi(y)\,dx\,dy$$

Let

$$y_t(a) = \log |W_Y(a,t)|^2 \tag{2}$$

and

$$\varepsilon_t(a) = \log\Bigl[\frac{|W_Y(a,t)|^2}{E\bigl(|W_Y(a,t)|^2\bigr)}\Bigr] - E\Bigl\{\log\Bigl[\frac{|W_Y(a,t)|^2}{E\bigl(|W_Y(a,t)|^2\bigr)}\Bigr]\Bigr\}$$

Then

$$y_t(a) = \log\bigl[E\bigl(|W_Y(a,t)|^2\bigr)\bigr] + C_2 + \varepsilon_t(a) \tag{3}$$

where

$$C_2 = E\Bigl\{\log\Bigl[\frac{|W_Y(a,t)|^2}{E\bigl(|W_Y(a,t)|^2\bigr)}\Bigr]\Bigr\}$$

When $a$ is very small, a regression model can be described by the following formula:

$$y_t(a) \approx (\log C_1 + C_2) + [2H(t)+1]\log a + \varepsilon_t(a) \tag{4}$$

A series of small scales is constructed as follows:

$$a_1 > a_2 > \cdots > a_n, \qquad a_j = 2^{-j}, \quad j = 1,2,\ldots,n.$$

Let $x_j = \log a_j$ and $y_j = y_t(a_j)$, $j = 1,2,\ldots,n$. The least squares method applied to the pairs $\{(x_j, y_j),\ j = 1,2,\ldots,n\}$ gives an estimator of $H(t)$ in formula (4):

$$\hat{H}(t) = \frac{1}{2}\Biggl[\frac{\sum (x_j-\bar{x})(y_j-\bar{y})}{\sum (x_j-\bar{x})^2} - 1\Biggr] \tag{5}$$

where $\bar{x} = \sum x_j/n$. It can be proved that $\hat{H}(t)$ is a consistent estimator of $H(t)$ [8].
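As a sketch, the estimate of Eq. (5) reduces to a one-line regression: since the slope of $\log_2|W_Y(a_j,t)|^2$ against $\log_2 a_j$ is $2H(t)+1$, we have $H = (\text{slope}-1)/2$ (names illustrative; dyadic scales $a_j = 2^{-j}$ assumed):

```python
import numpy as np

# Eq. (5): regress log2|W_Y(a_j, t)|^2 on log2(a_j) and convert the slope.
def hurst_at_t(coeff_sq, j_values):
    x = -np.asarray(j_values, dtype=float)   # log2(a_j) = -j for a_j = 2^-j
    y = np.log2(coeff_sq)
    slope = np.polyfit(x, y, 1)[0]
    return 0.5 * (slope - 1.0)
```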
3 Algorithm Description of the Time-Varying Hurst Index
For a stochastic time series process $Y(t)$ on discrete, equally spaced points, the time points may be restricted to $[0,1)$. The sample size is $2^J$, with $t_i = (i-1)/n$, $i = 1,2,\ldots,2^J$. Here $y_{j,k}$ $(k = 0,1,\ldots,2^j-1,\ j = 0,1,\ldots,J-1)$ is an estimated value of $W_Y(2^{-j}, k2^{-j})$, the discrete value of the wavelet transform $W_Y(a,t)$ at $a = 2^{-j}$, $t = k2^{-j}$. The wavelet transform is carried out with Daubechies' compactly supported wavelet bases with $M$ vanishing moments [8].
Step 1. $[0,1)$ is partitioned into $2^l$ equal-length subsections $I_m$ that do not intersect each other:

$$I_m = \bigl[(m-1)2^{-l},\ m2^{-l}\bigr); \quad 1 \le l \le (J-1),\ m = 1,2,\ldots,2^l.$$

$\hat{H}(t)$ is regarded as the average value of $H(t)$ on the corresponding subsection $I_m$, and the time spot of $\hat{H}(t)$ is chosen at the midpoint $2^{-l-1}(2m-1)$ of $I_m$.

Step 2. The set of variable pairs is defined as follows:

$$\{(X_m, Y_m)\} = \bigl\{\bigl(\log(2^{-j}),\ \log y_{j,k}^2\bigr)\ \bigm|\ k2^{-j}\in I_m\bigr\}, \quad 0 \le k \le 2^j-1,\ 0 < j \le J-1 \tag{6}$$

and $\hat{H}(t)$ is evaluated by formula (5) on each $I_m$.

Step 3. The evaluated values of $\hat{H}(t)$ are smoothed by local polynomial fitting to form a curve that can be regarded as an approximation of the true curve of $H(t)$.
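A compact sketch of Steps 1–3 (the function name, the half-sample position convention and the db4 choice from Sect. 4 are our own; the pywt package is assumed available):

```python
import numpy as np
import pywt

# Steps 1-3: pool squared db4 detail coefficients whose positions fall
# inside each window I_m (Eq. 6) and apply the regression of Eq. (5).
def time_varying_hurst(signal, l=4, wavelet="db4"):
    J = int(np.log2(len(signal)))
    details = pywt.wavedec(signal, wavelet, level=J - 1)[1:]  # coarse -> fine
    h = []
    for m in range(2 ** l):
        lo, hi = m / 2 ** l, (m + 1) / 2 ** l
        xs, ys = [], []
        for j, d in enumerate(details, start=1):   # scale a_j = 2^-j
            pos = (np.arange(len(d)) + 0.5) / len(d)
            sel = d[(pos >= lo) & (pos < hi)]
            xs += [-j] * len(sel)                  # log2(a_j) = -j
            ys += list(np.log2(sel ** 2 + 1e-12))
        if len(set(xs)) < 2:
            h.append(np.nan)                       # too few scales in window
            continue
        h.append(0.5 * (np.polyfit(xs, ys, 1)[0] - 1.0))
    return np.array(h)                             # one estimate per I_m
```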
4 The Chaos Characteristics Based on Time-Varying Fractal Dimension
To evaluate the effectiveness of the proposed method in finding the diagnostic characteristics of an HRV time series, original atrial fibrillation HRV series samples from the MIT-BIH atrial fibrillation database are shown in Fig. 1. The results reported in this section address the following issues.
Fig. 1. Atrial fibrillation from HRV
Fig. 2. Time-varying fractal dimension of atrial fibrillation chaos
The results show that the time-varying fractal dimension rises when atrial fibrillation begins and falls when it ends. The onset and end characteristics of atrial fibrillation can be successfully detected by means of the change of the time-varying fractal dimension. The results also indicate that the complexity of heart rate variability (HRV) decreases at the beginning of atrial fibrillation. Fig. 2 shows the change of the time-varying fractal dimension of atrial fibrillation in the period 0:00.224–1:03.500. The fractal dimension rises quickly from 0.68 up to 1.2, after which it returns to near the ordinary value of 0.8. Here we adopt the wavelet base db4. Fig. 2 shows that the evolution of the time-varying Hurst index is of great importance for atrial fibrillation diagnosis strategies.
5 Conclusions
In this paper, a new computational method for atrial fibrillation chaos based on the time-varying fractal dimension is put forward, by which the dynamic characteristics of HRV data series can be completely depicted at a given wavelet resolution level. An algorithm for the time-varying Hurst index curve is proposed. The change of the time-varying fractal dimension also indicates that the complexity of HRV decreases after atrial fibrillation begins.
References 1. Goldberger, A., Rigney, D.R., Mietus, J., Antman, E.M., Greenwald, S.: Experientia 44, 11–12 (1988) 2. Peng, C.K., Havlin, S., Stanley, H.E., Goldberger, A.L.: Chaos 5, 82–92 (1995) 3. Ruan, J., Cai, Z.J., Lin, W.: IEEE-EMBS Asia-Pacific Conf. on Biomed. Eng., 363–368 (2000) 4. Peng, C.K., Mietus, J., Hausdorff, J.M., Havlin, S., Stanley, H.E., Goldberger, A.L.: Phys. Rev. Lett. 70, 1343–1352 (1993) 5. Wornell, G.W.: Signal Processing with Fractals: a Wavelet-Based Approach. Prentice Hall Inc. (1996)
6. Kobayashi, M., Musha, T.: IEEE Trans. Biomed. Eng. 29, 456–462 (1982) 7. Jianrong, H., Guoxiang, S.: Application of Wavelet Analysis in the Estimation of Hurst Index. Journal of Xidian University (Science Edition) 1, 121–125 (2002) 8. Daubechies, I.: The Wavelet Transform: Time-Frequency Localization and Signal Analysis. IEEE Trans. on Information Theory 36 (1990) 9. Brockwell, P.J.: Time Series: Theory and Methods. Springer, New York (1991) 10. Mandelbrot, B.B., Van Ness, J.W.: Fractional Brownian Motions, Fractional Noises and Applications. SIAM Review 10, 422–437 (1968)
Bi-hierarchy Medical Image Registration Based on Steerable Pyramid Transform Xiuying Wang1 and David Feng1,2 1
Biomedical and Multimedia Information Technology (BMIT) Research Group, School of Information Technologies, J12, The University of Sydney, NSW 2006, Australia 2 Centre for Multimedia Signal Processing, Department of Electronic and Information Engineering, Hong Kong Polytechnic University {xiuying, feng}@it.usyd.edu.au
Abstract. Image registration is playing an increasingly important role in facilitating the smart use of the widely available and complementary information from multiple imaging resources. To improve the efficiency and accuracy of medical image registration, a bi-hierarchy registration method is presented in this paper. First, on the basis of the steerable pyramid transform, which has the property of transformation-invariance, the images are decomposed into a multi-scale and multi-band representation. Then, to avoid transformation errors accumulating and being magnified during parameter transmission, registration is performed only in the lowest-resolution hierarchy and the highest-resolution hierarchy. Experiments on medical images demonstrate that the proposed registration achieves high performance and is robust to noise.
1 Introduction
Widely available digital images acquired from diverse imaging sensors (such as infrared, laser, SAR, and sensor networks) and modalities (such as MR, PET, CT) are becoming an indispensable information resource for applications in remote sensing, multimedia, and systems physiology and life science. In particular, in current clinical practice, proper registration and integration of the considerable amounts of medical imaging data collected from different medical imaging devices and over multiple time intervals is essential to high-quality healthcare. By integrating and simultaneously presenting multiple, heterogeneous medical datasets in a common coordinate system, medical image registration is able to provide a more complete insight into patient data and to facilitate a better and safer diagnosis and treatment. For instance, registration of cardiac images provides a non-invasive tool for the detection and diagnosis of heart disease, the main cause of death in developed countries. By providing an effective mechanism for early tumor identification, monitoring disease progress, and assessing treatment response, medical data registration is of significance for reducing the morbidity and mortality of cancers [1]. After almost three decades of extensive studies and investigations, a large number of medical image registration algorithms have been proposed. Registration methods seek to optimize a certain similarity measure which defines how well two image sets are registered. The similarity measures can be based on the distances between
homogeneous features or the differences of raw values in the two image sets to be registered. Correspondingly, biomedical image registration can be categorized into feature-based registration and intensity-based registration. In feature-based registration, the transformation required to spatially match the identified features (such as points, curves, and surfaces) can be determined promptly and applied to the images. However, a preprocessing step is usually needed to extract these features manually or semi-automatically, which makes this category of registration operator-intensive and operator-dependent. Alternatively, intensity-based registration methods have the advantage of directly exploiting the raw data without requiring segmentation or extensive user interaction. In these methods, the datasets are iteratively transformed until the similarity criterion is satisfied, which makes them computationally expensive and inefficient [1]. It is a challenging task to develop automatic, rapid and effective registration techniques for general applications. To provide improved registration performance and accelerated computation, hierarchical registration strategies have been investigated. In such a "coarse-to-fine" scheme, the image datasets are divided into multi-resolution hierarchies to build registration pyramids, and registration is performed from low-resolution hierarchies to high-resolution hierarchies. Wavelets [2] lend themselves naturally to building registration pyramids due to their capability of separating image data into different frequencies and preserving information at different resolutions [3-5]. However, fast discrete wavelet transforms lack translation- and rotation-invariance, which makes discrete wavelet based registration difficult. The steerable pyramid [6-7], with its rotation- and translation-invariance, provides a more suitable multi-scale, multi-orientation data representation scheme for image registration. In this paper, a registration approach based on two resolution hierarchies is presented. First, the images are separated into a multi-scale, multi-orientation representation by the steerable pyramid. The intermediate pyramid hierarchies might not make significant contributions to registration accuracy, might slow down the computation, and, even worse, might lead to error accumulation. Therefore, in this paper, only the lowest-resolution hierarchy and the highest-resolution hierarchy are used for registration. In each of these two hierarchies, a new "magnitude sub-band" is constructed from the multi-orientation band-pass coefficients to extract edge information. Then, to improve registration accuracy, both the raw coefficients in the low-pass sub-bands and the edge features in the magnitude sub-band are used as the registration feature space. The registration approach is computationally efficient because both rotation and scaling parameters are calculated in the lowest-resolution hierarchy with its small data size, while only translation parameters are derived from registration in the highest-resolution hierarchy.
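The two-hierarchy idea can be illustrated schematically as follows. This sketch is our own simplification, not the authors' steerable-pyramid implementation: rotation is searched exhaustively at the coarsest level, translation is recovered at full resolution from the correlation peak, and scaling and the magnitude sub-band are omitted.

```python
import numpy as np
from scipy import ndimage

def downsample(img, times):
    for _ in range(times):
        img = ndimage.gaussian_filter(img, 1.0)[::2, ::2]  # blur + decimate
    return img

def ncc(a, b):
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def register(fixed, moving, levels=3, angles=np.arange(-20.0, 21.0, 1.0)):
    # coarse hierarchy: pick the rotation maximizing normalized correlation
    f_lo, m_lo = downsample(fixed, levels), downsample(moving, levels)
    best = max(angles,
               key=lambda t: ncc(f_lo, ndimage.rotate(m_lo, t, reshape=False)))
    rotated = ndimage.rotate(moving, best, reshape=False)
    # fine hierarchy: translation from the FFT cross-correlation peak
    xc = np.fft.ifft2(np.fft.fft2(fixed) * np.conj(np.fft.fft2(rotated)))
    dy, dx = np.unravel_index(np.argmax(np.abs(xc)), xc.shape)
    shift = [s if s <= n // 2 else s - n for s, n in zip((dy, dx), fixed.shape)]
    return best, shift
```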
2 Multi-scale Image Representation Via Wavelets and Steerable Pyramid

2.1 Wavelet Decomposition

Wavelets have been widely applied in image processing and coding due to their superior capability of separating image data into multiple scales and bands while
preserving information at different resolutions. The wavelet decomposition is a key component for the automated extraction of features, which in turn allows efficient image processing. Multi-resolution analysis (MRA) [2] is important for the construction of fast two-dimensional wavelets from one-dimensional ones. For a given two-dimensional image, wavelet-based image decomposition can be achieved by convolving with the wavelet low-pass and high-pass filters and then down-sampling by a factor of 2 along rows and columns independently. The image pyramid representation obtained by MRA can be used in hierarchical image registration. However, the coefficients of shifted and rotated versions of the same image may be distributed differently. This lack of translation and rotation invariance, which are basic requirements for an image registration procedure, makes the discrete-wavelet-based pyramid representation unsuitable for registration, especially intensity- or coefficient-based registration. Feature-based methods offer a potential solution to this problem in wavelet-based multi-resolution registration: a wavelet-based coarse-to-fine matching method using "points of interest" as the feature space was proposed in [3-4], and a surface alignment approach using a multi-resolution wavelet representation was introduced in [5]. Research efforts have also been devoted to breaking the invariance barrier directly, and innovative techniques such as steerable pyramids have been proposed [6-7].

2.2 Steerable Pyramid Transform

The steerable pyramid can be used to construct a multi-scale, multi-orientation image representation with the property of rotation and translation invariance. In this pyramid representation, the image is decomposed into subbands by basis functions that are directional derivative operators of different sizes and orientations. As all the basis functions are derived by translations, dilations, and rotations of a single function, the steerable pyramid can be thought of as a type of wavelet.
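To make the convolve-and-downsample decomposition of Sect. 2.1 concrete, the following minimal sketch performs one level of separable 2-D wavelet analysis. The Haar filter pair and the NumPy/SciPy helpers are our illustrative choices; the paper does not prescribe a particular wavelet.

```python
import numpy as np
from scipy.ndimage import convolve1d

# Haar analysis filters (an illustrative choice; any orthogonal pair works).
LO = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass
HI = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass

def dwt2_level(img):
    """One level of separable 2-D wavelet decomposition: convolve with the
    low-/high-pass filters and down-sample by 2 along rows, then columns."""
    rows_lo = convolve1d(img, LO, axis=0)[::2, :]
    rows_hi = convolve1d(img, HI, axis=0)[::2, :]
    ll = convolve1d(rows_lo, LO, axis=1)[:, ::2]  # approximation band
    lh = convolve1d(rows_lo, HI, axis=1)[:, ::2]  # horizontal detail
    hl = convolve1d(rows_hi, LO, axis=1)[:, ::2]  # vertical detail
    hh = convolve1d(rows_hi, HI, axis=1)[:, ::2]  # diagonal detail
    return ll, lh, hl, hh

# A multi-level registration pyramid is built by re-applying dwt2_level
# to the LL band; each level halves the resolution.
img = np.random.rand(256, 256)
ll, lh, hl, hh = dwt2_level(img)
print(ll.shape)  # (128, 128)
```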
Fig. 1. System diagram for the steerable pyramid (decomposition and reconstruction), built from the high-pass filter H0, low-pass filters L0 and L1, and oriented band-pass filters B0, ..., BK
Similar to conventional orthogonal wavelet decompositions, the steerable pyramid is implemented by recursively dividing an image into a set of oriented subbands and a low-pass sub-image (Fig. 1) [7]. The input image is initially decomposed into high- and low-pass bands by the high-pass filter $H_0$ and the low-pass filter $L_0$. The low-pass band is then further decomposed into a set of $K+1$ subbands by the oriented band-pass filters $B_0, \dots, B_K$ and a lower-frequency band by $L_1$. This low-pass band is subsampled by a factor of two along the x- and y-directions, and the recursive construction of the pyramid is achieved by repeating the low-pass subband decomposition at node $K+2$. The multi-scale representation of the image is obtained from this recursive procedure. The filters are designed to be polar-separable in the Fourier domain to prevent spatial aliasing in the subsampling of the subbands, and hence translation invariance can be achieved [7]. However, since there is no down-sampling in the band-pass subbands, the steerable pyramid is overcomplete. For a two-hierarchy pyramid with $K$ band-pass filters, the system transfer function in the frequency domain can be represented as:

$$|X(\omega)|^2 = |H_0(\omega)|^2 + |L_0(\omega)|^2 \Big( |L_1(\omega)|^2 + \sum_{k=0}^{K} |B_k(\omega)|^2 \Big) \qquad (1)$$
To eliminate spatial aliasing, the $L_1$ filter is constrained to have zero response for frequencies higher than $\pi/2$ in both directions, i.e., $L_1(\omega) = 0$ for $|\omega| > \pi/2$. To avoid amplitude distortion, the transfer function of the system should equal one:

$$|H_0(\omega)|^2 + |L_0(\omega)|^2 \Big( |L_1(\omega)|^2 + \sum_{k=0}^{K} |B_k(\omega)|^2 \Big) = 1.$$

In addition, the low-pass band should not be affected by the insertion of the recursive procedure, i.e.,

$$|L_1(\omega/2)|^2 = |L_1(\omega/2)|^2 \Big( |L_1(\omega)|^2 + \sum_{k=0}^{K} |B_k(\omega)|^2 \Big).$$
Rotation invariance can be achieved with filters that have the property of steerability. Filters are regarded as steerable if (1) the filters are rotated copies of each other, and (2) the response of a filter at an arbitrary orientation can be synthesized as a linear combination of the responses of the basis filters [8]. To satisfy steerability, the band-pass filters $B_0, \dots, B_K$ can be expressed as [9]:

$$B_k(\omega) = B(\omega)\,(-j \cos(\theta - \theta_k))^K \qquad (2)$$

where $\theta = \arg(\omega)$, $\theta_k = k\pi/(K+1)$ for $k \in [0, \dots, K]$, the term $(-j\cos(\theta - \theta_k))^K$ corresponds to the $K$th-order directional derivative, and $|B(\omega)|^2 = \sum_{k=0}^{K} |B_k(\omega)|^2$.
The steerable pyramid retains some advantages of orthonormal wavelet transforms: the basis functions are localized in both the spatial and spatial-frequency domains, and the transform is a tight frame. More importantly, aliasing is eliminated, and shiftability in orientation and translation is achieved. The steerable pyramid has been applied in
Fig. 2. Three-hierarchy, four-orientation steerable pyramid of an abdominal CT image: four band-pass sub-images oriented at θ ∈ {0, π/4, π/2, 3π/4} at each hierarchy, plus the low-pass image
various image processing tasks such as denoising, texture extraction and retrieval [10], and image registration [11-12]. In this paper, the filters described in [7] have been adopted to decompose the medical images. With such filters, given a set of basis functions with different orientations centered at the same spatial location and scale, the response at an arbitrary orientation can be synthesized by a linear combination of the responses to the basis functions at that scale. For instance, Fig. 2 illustrates an image decomposition with four basis-function orientations $\theta_j$ spaced equally in $[0, \pi]$, i.e., $\theta_j = j\pi/4$ ($j = 0, 1, 2, 3$).
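To make the steering property concrete, the sketch below (ours, not the authors' code) computes interpolation weights for the angular part cos^K(θ − θ_k) of the band-pass filters by solving a small linear system, then synthesizes the response at an arbitrary orientation from the K + 1 basis responses.

```python
import numpy as np

def steering_weights(theta, K):
    """Weights w_j(theta) such that cos^K(theta - phi) equals
    sum_j w_j * cos^K(theta_j - phi) for every phi, with basis angles
    theta_j = j*pi/(K+1).  Since cos^K spans a (K+1)-dimensional space
    of angular functions, sampling phi at K+1 angles fixes the weights."""
    basis = np.arange(K + 1) * np.pi / (K + 1)       # theta_j
    phis = np.arange(K + 1) * np.pi / (K + 1)        # sample angles
    A = np.cos(basis[None, :] - phis[:, None]) ** K  # A[p, j]
    b = np.cos(theta - phis) ** K
    return np.linalg.solve(A, b)

def steer(band_responses, theta):
    """Synthesize the band-pass response at angle theta from the K+1
    basis band-pass sub-images (a list of equally sized arrays)."""
    w = steering_weights(theta, len(band_responses) - 1)
    return sum(wj * r for wj, r in zip(w, band_responses))

# For K = 1 this reduces to the familiar cosine/sine pair:
print(np.round(steering_weights(np.pi / 6, 1), 4))  # [0.866 0.5]
```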
3 Bi-hierarchy Medical Image Registration Based on Steerable Pyramid

In our bi-hierarchy registration scheme, the images to be registered are first decomposed into multi-resolution hierarchies by the steerable pyramid transform, and the registration feature space is determined from multiple sub-bands of each hierarchy in the pyramid. Registration, an optimization procedure that searches for the transformation of the feature space that maximizes the registration similarity, is then performed at the lowest-resolution hierarchy and then directly at the highest-resolution hierarchy. The registration results of the former are used to initialize the registration at the latter scale to improve registration performance and accuracy (Fig. 3). A similar technique can be found in [12].
Fig. 3. Image Pyramid and Registration Procedure
3.1 Registration Feature Extraction from Multiple Sub-bands

Registration features, which provide the basis for the registration similarity measurement, should represent important image information and at the same time suppress noise artifacts. As the low-pass sub-bands preserve important global image information as well as salient spatial frequencies, registration directly exploring the low-pass coefficients is performed to correct global orientation displacements between the images. As mentioned in Section 2, the multiple band-pass sub-bands representing multiple orientations ensure the steerability of the transformation. However, more decomposed band-pass sub-bands also raise the challenge of finding suitable and feasible registration features. Since coefficient magnitudes usually exhibit significant structural information [13], and multi-scale edge information can be extracted automatically and directly as the sum of the magnitudes of the band-pass coefficients over the different orientations, in our proposed method a new "magnitude sub-band" is constructed at each registration pyramid scale. Because features give rise to large coefficient values in their neighborhoods, the 50% most significant coefficients in the new "magnitude sub-bands", which contain the extracted "edges", are used as registration features to reduce the distortion of the registration performance by noise.

3.2 Registration Scheme and Procedure

The primary task of image registration is to find a transformation that establishes a spatial correspondence between images acquired from multiple sensors, at different times, or from different viewpoints, for analysis or visualization. In a homogeneous coordinate system, the two-dimensional rotation, translation, and scaling transformations that compose an affine transformation can be expressed as:
$$R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}; \quad T = \begin{bmatrix} t_x \\ t_y \\ 1 \end{bmatrix}; \quad S = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}. \qquad (3)$$

For a reference image $I_R$ and a study image $I_S$, the affine correspondence between $p_i = (x, y) \in I_R$ and $q_i = (x', y') \in I_S$ can be established as $p_i = SRq_i + T$. In this paper, an efficient registration is achieved with the feature space extracted from only two resolution scales of the registration pyramid: the lowest-resolution scale $l_n$ and the highest-resolution scale $l_1$.

Registration in the Low-Resolution Hierarchy. Because the lowest-resolution pyramid scale $n$ mainly preserves global image information, while local high spatial frequencies such as noise have been filtered out, global displacements of rotation and scaling can be corrected at this registration hierarchy. First, global rotation displacements are corrected by registration based on the low-pass sub-band coefficients or magnitudes, which preserve significant orientation information very well. High registration accuracy can be achieved in coefficient-based registration because there is very little noise distortion. Then, at the same hierarchy, the scaling parameter can be calculated from both the magnitude sub-band registration and the low-pass sub-band registration. Because the size of the low-pass sub-band is only a quarter of that of the magnitude sub-band, the final scaling parameter is determined by a weighted combination of the parameters calculated from the low-pass and magnitude sub-bands. However, as any orientation can be synthesized by a linear combination of coefficients in the band-pass sub-bands, the magnitude sub-band cannot contribute to the rotation parameter calculation.

Registration in the High-Resolution Hierarchy. Because the iterative decomposition and sub-sampling of the steerable pyramid transform (node $K+2$ in Fig. 1) shrink the translation parameter values in the lower-resolution hierarchies, translation differences can only be partially corrected by the registration of magnitude sub-bands there. In most multi-scale registration methods, the transformation parameters obtained at the current scale are used to initialize the registration at the next higher resolution by doubling the translation parameters while keeping the rotation and scaling parameters unchanged; however, such a parameter-passing mechanism is problematic. On the one hand, as the translation parameter error cannot be completely removed at each registration scale, this scheme leads to error accumulation, and the doubling even magnifies the errors. On the other hand, as the global rotation and scaling parameters can already be corrected at a very low-resolution hierarchy, passing these parameters unchanged cannot contribute much to registration accuracy. Instead, such a hierarchy-by-hierarchy multi-scale registration strategy tremendously slows down the registration procedure. To speed up registration and at the same time avoid error accumulation, in the proposed approach only the rotation and
scaling parameters from the lowest-resolution scale are passed directly to the highest-resolution hierarchy, where the translation is corrected by registration of the magnitude sub-bands.

Estimation Criteria and Optimization Strategy. Normalized cross correlation, mutual information, and normalized mutual information are used as registration similarity measures, and the Powell optimization strategy is applied to search for the optimal registration solution.
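As a minimal sketch of the translation stage under these definitions, the code below builds the magnitude sub-band, keeps the 50% most significant coefficients as features, and maximizes normalized cross correlation over (tx, ty) with Powell's method. The SciPy helpers and function names are our own assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift
from scipy.optimize import minimize

def magnitude_subband(bands):
    """'Magnitude sub-band': sum of the magnitudes of the band-pass
    coefficients over all orientations at one pyramid hierarchy."""
    return sum(np.abs(b) for b in bands)

def feature_mask(mag):
    """Keep the 50% most significant magnitude coefficients (the 'edges')."""
    return mag >= np.median(mag)

def ncc(a, b):
    """Normalized cross correlation of two equally sized arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def register_translation(ref_mag, study_mag):
    """Powell search for the translation maximizing NCC on the features."""
    mask = feature_mask(ref_mag)

    def cost(t):
        moved = nd_shift(study_mag, t, order=1)
        return -ncc(ref_mag[mask], moved[mask])

    return minimize(cost, x0=np.zeros(2), method="Powell").x
```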
4 Experimental Results and Discussion

To validate the performance of the proposed registration approach, both CT images (512×512), with high resolution and structural detail, and PET images (128×128), which exhibit functional information, are used in our experiments. The experiments on artificially deformed medical images, with the "ground truth" known beforehand, aim to investigate how the number of pyramid hierarchies and the number of band-pass filters affect registration performance.

4.1 Low-Pass Sub-band Registration for Rotation Displacement Correction
Low-pass sub-bands provide global orientation information and hence can be used to correct rotation displacements. Our experimental results (Table 1) show that the performance difference between registration based on raw coefficients and registration based on coefficient magnitudes is not significant, although the registration accuracy of the latter is slightly higher than that of the former. Experiments have also been carried out to investigate the influence of the number of pyramid hierarchies on registration performance. Experiments on CT images show that registration based on 3-hierarchy pyramids and registration based on 4-hierarchy pyramids result in very similar accuracy; however, the latter is much faster than the former.

Table 1. Registration of low-pass sub-bands for rotation changes

Registration Feature Space | Pyramid Hierarchies | Results for rotations of 5, 7, and 15 degrees
Raw Coefficients | 3 (64×64) | -4.9704, -6.9737, -14.9278
Raw Coefficients | 4 (32×32) | -4.9130, -6.8051, -14.5958
Coefficient Magnitudes | 3 (64×64) | -4.9060, -6.9776, -14.9345
Coefficient Magnitudes | 4 (32×32) | -4.9625, -6.9061, -14.7039
From our experiments, we also find that registration pyramids with more hierarchies perform better and faster than pyramids with fewer hierarchies for larger rotation displacements. This is mainly because the low-pass sub-band higher in the pyramid contains more global orientation information while filtering out most of the noise. A four-hierarchy pyramid for CT images (low-pass sub-band of size 32×32) provides a good balance between accuracy and efficiency. A two-hierarchy pyramid representation with 32×32 low-pass sub-bands is more suitable for the low-resolution PET
images; otherwise, the registration is easily trapped in a local minimum if the low-pass sub-band is too small (for instance, 16×16). In addition, registration of the low-pass subbands is robust, and no noise-removal pre-processing of the images is required before registration for either the CT or the PET data sets.

4.2 Magnitude Sub-band Registration for Scaling and Translation Parameter Calculation
The number of band-pass filters (the value of $K$) can be important for different applications. Experiments have been performed to investigate the influence of the number of band-pass sub-bands with different orientations on the registration performance for scaling and translation. The CT images were scaled by factors of 0.7, 0.8, 0.9, and 1.1 in both the x- and y-directions. The experiments (Table 2) demonstrate that the accuracy of the scaling parameter calculation is not dramatically affected by the value of $K$. However, Table 2 also shows that the larger the $K$ (up to 5, with 6 orientations in each pyramid scale), the better the registration performance for scaling deformation correction. Experiments on PET images show that low-resolution images are more sensitive to changes of $K$, and the best registration is achieved when $K = 5$.

Table 2. Registration performance of magnitude sub-bands for scaling deformation (errors ×10⁻³)

K | scaling 0.7 | 0.8 | 0.9 | 1.1
1 | 61 | 38 | 25 | 17
3 | 56 | 35 | 22 | 16
5 | 47 | 33 | 19 | 12
The experimental results for registering images with different translation differences are consistent with our analysis in Section 3, i.e., the translation can be corrected by registration of the magnitude sub-bands in the highest-resolution hierarchy without requiring the intermediate pyramid hierarchies. The experiments also show that the value of $K$ does affect the registration performance: as $K$ decreases, the registration performance degrades, because more concrete and delicate edge information can be extracted with more band-pass orientations and filters.
5 Conclusions

In this paper, a registration approach is presented on the basis of bi-hierarchy pyramids constructed by the steerable pyramid transform. A new "magnitude" sub-band is constructed to extract "edge" features in each registration pyramid hierarchy. To improve registration accuracy, both the raw coefficients of the low-pass sub-bands and the magnitude sub-bands are used. To avoid the accumulation of parameter estimation errors and to speed up computation, our registration is performed only at the lowest-resolution
hierarchy and the highest-resolution hierarchy. At the lowest-resolution scale, the rotation differences and scaling deformations are corrected and then passed directly to initialize the registration at the highest-resolution scale, which corrects the translation displacements. Our experiments demonstrate that the proposed algorithm has good registration performance and accuracy and can be used to register medical images from different imaging devices.

Acknowledgments. This work is supported by the ARC of Australia and the RGC of Hong Kong.
References

1. Wang, Stefan, S., Fulham, M., Som, S., Feng, D.: Data Registration and Fusion. In: Feng, D. (ed.) Biomedical Information Technology, ch. 8. Elsevier, Amsterdam (in press)
2. Mallat, S.: A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Trans. Pattern Anal. Machine Intell. PAMI-11, 674–693 (1989)
3. You, J., Bhattacharya, P.: A Wavelet-Based Coarse-to-Fine Image Matching Scheme in a Parallel Virtual Machine Environment. IEEE Transactions on Image Processing 9, 1547–1559 (2000)
4. Wang, X., Feng, D.: An Efficient Wavelet-Based Biomedical Registration for Abdominal Images. Journal of Nuclear Medicine 46(suppl.), 161 (2005)
5. Gefen, S., Tretiak, O., Bertrand, L., Rosen, G.D., Nissanov, J.: Surface Alignment of an Elastic Body Using a Multiresolution Wavelet Representation. IEEE Transactions on Biomedical Engineering 51, 1230–1241 (2004)
6. Simoncelli, E.P., Freeman, W.T., Adelson, E.H., Heeger, D.J.: Shiftable Multi-scale Transforms, or What's Wrong with Orthonormal Wavelets. IEEE Trans. Information Theory, Special Issue on Wavelets 38, 587–607 (1992)
7. Simoncelli, E.P., Freeman, W.T.: The Steerable Pyramid: A Flexible Architecture for Multi-scale Derivative Computation. In: Proc. 2nd IEEE Int'l Conf. on Image Processing, Washington, DC, pp. 444–447. IEEE Computer Society Press, Los Alamitos (1995)
8. Freeman, W.T., Adelson, E.H.: The Design and Use of Steerable Filters. IEEE Trans. Pattern Anal. Machine Intell. 13, 891–906 (1991)
9. Karasaridis, A., Simoncelli, E.P.: A Filter Design Technique for Steerable Pyramid Image Transforms. In: Proc. Int'l Conf. Acoustics, Speech, and Signal Processing, Atlanta, Georgia, pp. 1–4 (1996)
10. Tzagkarakis, G., Beferull-Lozano, B., Tsakalides, P.: Rotation-Invariant Texture Retrieval with Gaussianized Steerable Pyramids. IEEE Transactions on Image Processing 15, 2702–2718 (2006)
11. Liu, Z., Ho, Y.K., Tsukada, K., Hanasaki, K., Dai, Y., Li, L.: Using Multiple Orientational Filters of Steerable Pyramid for Image Registration. Information Fusion 3, 203–214 (2002)
12. Wang, X., Feng, D.: Medical Image Registration via Steerable Pyramid. Accepted by the 29th IEEE EMBS, Lyon, France (August 23–26, 2007)
13. Portilla, J., Simoncelli, E.P.: A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients. International Journal of Computer Vision 40, 49–71 (2000)
A Multiagent Quantum Evolutionary Algorithm for Global Numerical Optimization* Chaoyong Qin, Jianguo Zheng, and Jiyu Lai School of Business and Management, Donghua University, Shanghai, China, 200051
[email protected]
Abstract. In this paper, a novel algorithm, the multiagent quantum evolutionary algorithm (MAQEA), is proposed based on multiagent systems, evolutionary programming, and quantum computation. An agent represents a candidate solution to the optimization problem. All agents are represented by quantum chromosomes, whose core lies in the concepts and principles of quantum computing, and they live in a table environment. Each agent competes and cooperates with its neighbors in order to increase its competitive ability. Quantum computation mechanisms are employed to accelerate the evolutionary process. Experimental results show that MAQEA has a strong global optimization ability and a high convergence speed.
1 Introduction

Recent developments in quantum technology have shown that quantum computers can provide dramatic advantages over classical computers for some problems [1][2]. Quantum mechanics is one of the greatest achievements of the 20th century, and quantum algorithms rely upon its inherently parallel qualities to achieve their improvement. In recent years, quantum theory has been widely used in intelligent computation [3-5]. The quantum-inspired genetic algorithm (QGA), first developed by Narayanan [6], achieves much better performance than the classical genetic algorithm. Reference [7] combined evolutionary theory and quantum theory to propose a new kind of evolutionary programming, quantum evolutionary programming (QEP), which achieves rapid convergence and good global search capacity. The application of quantum parallelism to classical intelligent programming has been promising.

Agent-based computation is a branch of distributed artificial intelligence. According to [8], an agent is anything that can perceive its environment through sensors and act upon that environment through effectors. Multiagent systems are computational systems in which many agents interact or work together in order to achieve goals. Reference [9] solved the 7000-queen problem with an energy-based multiagent model. Reference [10] integrated multiagent systems with genetic algorithms into the multiagent genetic algorithm (MAGA) to solve the global numerical optimization problem, achieving good performance in minimizing objective functions of high dimension.
* Supported by the Shanghai Natural Science Foundation, P.R. China (06ZR14004).
Inspired by these works, this paper integrates multiagent systems with quantum evolutionary computation to form a new algorithm, the multiagent quantum evolutionary algorithm (MAQEA), for solving the global numerical optimization problem. In MAQEA, all agents live in a table environment. MAQEA realizes the ability of agents to sense and act on the environment they live in by making use of the evolutionary mechanism of QEP. Each agent increases its competitive ability as much as possible while interacting with the environment and other agents.
2 Multiagent Quantum Evolutionary Algorithm

According to [8], when multiagent systems are used to solve problems, the following four elements should be defined:

1) The purpose of each agent.
2) The global environment of the agents.
3) The local environment of each agent.
4) The behaviors that each agent can take to achieve its purpose.
The definitions of these elements for global numerical optimization problems are described hereinafter.

2.1 Agent for Numerical Optimization

A global numerical optimization problem can be formulated as minimizing the following objective function:

$$\min f(x), \quad x = (x_1, x_2, \dots, x_n) \in S, \qquad (1)$$

where $S = [\underline{x}, \overline{x}]^n \subseteq R^n$ defines the search space, an n-dimensional space bounded by the parametric constraints $\underline{x}_i \le x_i \le \overline{x}_i$, $i = 1, 2, \dots, n$. Here, $\underline{x}$ denotes the lower bound and $\overline{x}$ the upper bound of the search space.

Definition 1. An agent $L \in S$ represents a candidate solution to the optimization problem. The value of its competitive ability is equal to the negative value of the objective function:

$$\text{competeAbility}(L) = -f(L). \qquad (2)$$

In MAQEA, a quantum chromosome is used to represent an agent. One qubit is defined by a pair of complex numbers $(\alpha, \beta)$. An agent represented by a quantum chromosome with $m$ qubits per variable is defined as:

$$L = \{q_1, q_2, \dots, q_n\}, \quad q_j = \begin{bmatrix} \alpha_1^j & \alpha_2^j & \cdots & \alpha_m^j \\ \beta_1^j & \beta_2^j & \cdots & \beta_m^j \end{bmatrix} \quad (j = 1, \dots, n), \qquad (3)$$
where $|\alpha_i|^2 + |\beta_i|^2 = 1$. This representation can encode any superposition of agent states and has better diversity characteristics than the classical version. For example, consider a two-qubit system with the two pairs of amplitudes

$$\begin{bmatrix} 1/\sqrt{2} & 1/2 \\ 1/\sqrt{2} & \sqrt{3}/2 \end{bmatrix}. \qquad (4)$$

The state of the system can be represented as

$$\frac{1}{2\sqrt{2}}\,|00\rangle + \frac{\sqrt{3}}{2\sqrt{2}}\,|01\rangle + \frac{1}{2\sqrt{2}}\,|10\rangle + \frac{\sqrt{3}}{2\sqrt{2}}\,|11\rangle, \qquad (5)$$

meaning that the states $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ appear with probabilities $1/8$, $3/8$, $1/8$, $3/8$, respectively. In this example, two qubits are enough to represent four states, whereas the classical representation needs at least four chromosomes: (00), (01), (10), (11).
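The following sketch shows how such a quantum chromosome can be stored and collapsed (observed) in practice. The array layout and the convention that |β|² is the probability of observing a 1 are standard in quantum-inspired evolutionary algorithms, but they are our assumptions here, as is the decoding into a real interval.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_chromosome(n, m):
    """n variables with m qubits each; every amplitude starts at 1/sqrt(2),
    so all basis states are equally likely at initialization."""
    alpha = np.full((n, m), 1.0 / np.sqrt(2.0))
    beta = np.full((n, m), 1.0 / np.sqrt(2.0))
    return alpha, beta

def observe(beta):
    """Collapse each qubit: bit 1 with probability |beta|^2."""
    return (rng.random(beta.shape) < np.abs(beta) ** 2).astype(int)

def decode(bits, lo, hi):
    """Map each variable's m-bit string into the real interval [lo, hi]."""
    m = bits.shape[1]
    ints = bits @ (2 ** np.arange(m)[::-1])
    return lo + (hi - lo) * ints / (2 ** m - 1)

alpha, beta = init_chromosome(n=3, m=15)
x = decode(observe(beta), lo=-5.12, hi=5.12)  # one candidate agent
```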
Definition 2. All agents live in a table environment of size $L_{size} \times L_{size}$. Each agent is fixed on a cell of the table and can only interact with its neighbors. Suppose that the agent located at $(i, j)$ is denoted $L_{ij}$, $i, j = 1, 2, \dots, L_{size}$; then the neighborhood of $L_{ij}$ is defined as

$$\text{LocalEnv}_{L_{ij}} = \{L_{i'j'}, L_{i'j}, L_{i'j''}, L_{ij'}, L_{ij''}, L_{i''j'}, L_{i''j}, L_{i''j''}\}, \qquad (6)$$

where

$$i' = \begin{cases} i-1 & i \ne 1 \\ L_{size} & i = 1 \end{cases}, \quad i'' = \begin{cases} i+1 & i \ne L_{size} \\ 1 & i = L_{size} \end{cases}, \quad j' = \begin{cases} j-1 & j \ne 1 \\ L_{size} & j = 1 \end{cases}, \quad j'' = \begin{cases} j+1 & j \ne L_{size} \\ 1 & j = L_{size} \end{cases}.$$

In MAQEA, all agents are represented by quantum chromosomes, so each agent is a probabilistic representation of states; definite states of an agent can only be observed through qubit collapsing. One quantum chromosome contains the information of several candidate agents. Using quantum chromosomes to describe individual agents brings better diversity than MAGA and other classical algorithms. To achieve its purpose, each agent competes and cooperates only with its neighbors so that it can improve its competitive ability. There is no global fitness and no global selection at all, both of which are absolutely necessary in the classical genetic algorithm and QGA. In this way, the evolution manner of MAQEA is very close to the real evolutionary mechanism in nature.
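A minimal sketch of the wrap-around neighborhood of Eq. (6), using 1-based indices as in the paper:

```python
def neighbors(i, j, size):
    """The eight neighbors of the agent at (i, j) on the size x size table,
    wrapping around at the borders as defined by i', i'', j', j''."""
    i1 = size if i == 1 else i - 1      # i'
    i2 = 1 if i == size else i + 1      # i''
    j1 = size if j == 1 else j - 1      # j'
    j2 = 1 if j == size else j + 1      # j''
    return [(i1, j1), (i1, j), (i1, j2),
            (i, j1),            (i, j2),
            (i2, j1), (i2, j), (i2, j2)]

# For size = 3, the neighbors of (2, 2) are all eight other cells,
# matching the example given with Fig. 1.
print(neighbors(2, 2, 3))
```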
Fig. 1. Model of the agent table environment. Each cell of the table holds one agent; for example, the neighbors of $L_{2,2}$ are $\text{LocalEnv}_{L_{2,2}} = \{L_{1,1}, L_{1,2}, L_{1,3}, L_{2,1}, L_{2,3}, L_{3,1}, L_{3,2}, L_{3,3}\}$.
2.2 Quantum Evolutionary Operators

Each agent takes certain behaviors to increase its competitive ability and achieve its purpose. Three quantum evolutionary operators are designed in this paper: the neighborhood competition quantum-evolution operator realizes the behavior of competition; the neighborhood quantum crossover operator realizes the behavior of cooperation; and the self-learning operator realizes the behavior of learning. Suppose the agent system is at the $t$-th generation, and the three operators are performed on the agent located at $(i, j)$, $L_{ij}^t = (q_1^t, q_2^t, \dots, q_n^t)$; let $\max L_{ij}^t = (mq_1^t, mq_2^t, \dots, mq_n^t)$ be the agent with maximum competitive ability among the neighbors of $L_{ij}^t$, and $\text{best}L^t = (bq_1^t, bq_2^t, \dots, bq_n^t)$ the agent with maximum competitive ability in the whole system up to the $t$-th generation.

Neighborhood competition quantum-evolution operator: If the competitive ability of agent $\max L_{ij}$ is larger than that of agent $L_{ij}$, agent $L_{ij}$ is replaced by a new agent $\text{new}L_{ij} = (eq_1, eq_2, \dots, eq_n)$. There are two strategies for producing the new agent. Given a probability $p_t$: if $U(0,1) < p_t$, occupying strategy 1 is selected; otherwise occupying strategy 2 is selected, where $U(0,1)$ is a uniform random number generator. In strategy 1:

$$eq_k = U \ast mq_k, \quad k = 1, 2, \dots, n. \qquad (7)$$

In strategy 2:

$$eq_k = U \ast q_k, \quad k = 1, 2, \dots, n, \qquad (8)$$

where $U$ is the quantum rotation gate:
$$U(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},$$

where $\theta$ is the rotation angle, which can be looked up in [7]. In strategy 1, the evolution operator makes use of the information of the local winner and the current maximum individual. In strategy 2, the evolution operator makes use of the information of the loser and the current maximum individual to increase the probability of moving toward the global best individual.

Neighborhood quantum crossover operator: The neighborhood quantum crossover operator is performed on agent $L_{ij}$ and its neighbors, and the same number of agents is generated; the half of these agents whose competitive ability is higher than the others' is retained. The quantum chromosomes of an individual and its neighbors are listed in Table 1, and Table 2 shows the result of performing the neighborhood quantum crossover operator.

Table 1. Quantum chromosomes of an individual and its neighbors
A1 A2 A3 … …
B1 B2 B3 … …
…  …  …  … …
H1 H2 H3 … …
Table 2. Quantum chromosomes after performing the quantum crossover operator
A1 H2 G3 … …
B1 A2 H3 … …
…  …  …  … …
H1 G2 F3 … …
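One reading of Tables 1 and 2 is a cyclic gene exchange: the new chromosome in row i takes its k-th gene from old row (i − k) mod N, so row A becomes (A1, H2, G3, ...) and row H becomes (H1, G2, F3, ...). A sketch under that interpretation (ours; the paper does not give pseudocode):

```python
import numpy as np

def neighborhood_crossover(pop):
    """pop: array of shape (N, n_genes, ...) stacking the quantum
    chromosomes of an agent and its neighbors (rows A..H for N = 8).
    New row i takes gene position k from old row (i - k) mod N."""
    n_agents, n_genes = pop.shape[0], pop.shape[1]
    out = np.empty_like(pop)
    for i in range(n_agents):
        for k in range(n_genes):
            out[i, k] = pop[(i - k) % n_agents, k]
    return out

pop = np.arange(24).reshape(8, 3)      # 8 agents, 3 genes, for illustration
print(neighborhood_crossover(pop)[0])  # row A: genes from rows A, H, G
```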
In the classical genetic algorithm and evolutionary programming, the crossover operator acts only between two individuals, and information is shared on a very small scale. In MAQEA, this operator shares the information of agent $L_{ij}$ and its neighbors within the local environment, achieving the purpose of cooperation.

Self-learning operator: Agents have knowledge related to the problems they are designed to solve. MAGA uses a small-scale MAGA to realize the self-learning operator for numerical optimization problems. Inspired by this, we propose a self-learning operator that uses a small-scale MAQEA to realize the behavior of using knowledge; it can be seen as a local searcher. The self-learning operator can be described as follows. First, a sub-agent system $subL$ of size $subL_{subSize} \times subL_{subSize}$ is generated within a given search radius $sR$. All new agents here are generated by observing quantum chromosomes, i.e., by qubit collapsing. Then, the neighborhood competition quantum-evolution operator and the neighborhood quantum crossover operator are iteratively performed
on the sub-agent system. Finally, the agent with maximum competitive ability is selected from the sub-agent system.

2.3 Multiagent Quantum Evolutionary Algorithm
In MAQEA, all agents are in probabilistic states. The representation of agents by quantum chromosomes has the advantage of representing any superposition of states and contains all possible agents. MAQEA uses the quantum Hadamard gate [11] to initialize the system, setting every qubit amplitude of the quantum chromosomes to $1/\sqrt{2}$, so that each qubit has the same probability of collapsing toward 0 or 1. This property keeps MAQEA from being trapped in a local maximum at the very beginning of the search. The neighborhood competition quantum-evolution operator is performed on each agent; as a result, the probability of obtaining agents with low competitive ability decreases and the probability of obtaining agents with high competitive ability increases. The neighborhood quantum crossover operator lets each agent cooperate with its neighbors. In order to reduce the computational cost, the self-learning operator is performed only on the best agent in each generation, but it has an important effect on the performance of MAQEA. To make quantum chromosomes suitable for numerical optimization problems, the search space is mapped to the space $[0, 2^m]$, where $m$ is the length of the quantum chromosomes. The details of MAQEA are as follows:

Step 1. Initialize $L(0)$ and $\text{best}L^0$; set every qubit amplitude of the quantum chromosomes to $1/\sqrt{2}$; $t \leftarrow 0$.
Step 2. Observe (collapse) the states of $L(t)$ to get $Q(t)$.
Step 3. Compute the competitive ability of each agent in $Q(t)$.
Step 4. Perform the neighborhood competition quantum-evolution operator on each agent in $L(t)$, obtaining $L(t)'$.
Step 5. Perform the neighborhood quantum crossover operator on each agent in $L(t)'$, obtaining $L(t)''$; observe $L(t)''$ to get $Q(t)'$ and find $\text{best}L^t$.
Step 6. Perform the self-learning operator on $\text{best}L^t$, obtaining $\text{best}L^{t'}$.
Step 7. If $\text{competeAbility}(\text{best}L^{t'}) > \text{competeAbility}(\text{best}L^t)$, then $\text{best}L^t \leftarrow \text{best}L^{t'}$.
Step 8. If the termination criteria are reached, output $\text{best}L^t$ and stop; otherwise $t \leftarrow t + 1$ and go to Step 2.
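The rotation-gate update at the heart of Step 4 is simple to state in code. The sketch below applies U(θ) to amplitude pairs and dispatches between the two occupying strategies of Sect. 2.2; the angle θ would come from the lookup table of [7], which is not reproduced here, and the function names are ours.

```python
import numpy as np

def rotation_gate(alpha, beta, theta):
    """Apply U(theta) elementwise to qubit amplitudes stored as arrays."""
    a = np.cos(theta) * alpha - np.sin(theta) * beta
    b = np.sin(theta) * alpha + np.cos(theta) * beta
    return a, b

def neighborhood_competition(agent, winner, theta, p_t, rng):
    """Replace a losing agent: with probability p_t rotate the local
    winner's amplitudes (strategy 1, Eq. (7)); otherwise rotate the
    loser's own amplitudes (strategy 2, Eq. (8)).  `agent` and `winner`
    are (alpha, beta) pairs of equally shaped arrays."""
    if rng.random() < p_t:
        return rotation_gate(*winner, theta)
    return rotation_gate(*agent, theta)
```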
3 Performance of MAQEA on Functions with 30 Dimensions

Reference [10] compared MAGA with four well-known algorithms: fast evolutionary programming (FEP) [12], the orthogonal genetic algorithm with quantization (OGA/Q)
[13], the breeder genetic algorithm (BGA) [14], and adaptive evolutionary algorithms (AEA) [15]; the results showed that MAGA outperforms all the other methods. In order to test the performance of MAQEA, we run MAQEA on the following test functions and compare it with MAGA.

$$f_1(x) = \sum_{i=1}^{n} \big(-x_i \sin\sqrt{|x_i|}\big), \quad S = [-500, 500]^n, \quad f_{\min} = -418.983n; \qquad (9)$$

$$f_2(x) = \sum_{i=1}^{n} \big[x_i^2 - 10\cos(2\pi x_i) + 10\big], \quad S = [-5.12, 5.12]^n, \quad f_{\min} = 0; \qquad (10)$$

$$f_3(x) = -20\exp\Big(-0.2\sqrt{\tfrac{1}{n}\sum_{i=1}^{n} x_i^2}\Big) - \exp\Big(\tfrac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_i)\Big) + 20 + e, \quad S = [-32, 32]^n, \quad f_{\min} = 0; \qquad (11)$$

$$f_4(x) = \frac{1}{4000}\sum_{i=1}^{n} x_i^2 - \prod_{i=1}^{n}\cos\Big(\frac{x_i}{\sqrt{i}}\Big) + 1, \quad S = [-600, 600]^n, \quad f_{\min} = 0; \qquad (12)$$

$$f_5(x) = \frac{\pi}{n}\Big\{10\sin^2(\pi y_1) + \sum_{i=1}^{n-1}(y_i - 1)^2\big[1 + 10\sin^2(\pi y_{i+1})\big] + (y_n - 1)^2\Big\} + \sum_{i=1}^{n} u(x_i, 10, 100, 4), \qquad (13)$$

$$u(x_i, \alpha, k, m) = \begin{cases} k(x_i - \alpha)^m & x_i > \alpha \\ 0 & -\alpha \le x_i \le \alpha \\ k(-x_i - \alpha)^m & x_i < -\alpha \end{cases}, \quad y_i = 1 + \frac{1}{4}(x_i + 1), \quad S = [-50, 50]^n;$$

$$f_6(x) = \frac{1}{10}\Big\{\sin^2(3\pi x_1) + \sum_{i=1}^{n-1}(x_i - 1)^2\big[1 + \sin^2(3\pi x_{i+1})\big] + (x_n - 1)^2\big[1 + \sin^2(2\pi x_n)\big]\Big\} + \sum_{i=1}^{n} u(x_i, 5, 100, 4), \quad S = [-50, 50]^n; \qquad (14)$$

$$f_7(x) = \sum_{i=1}^{n} x_i^2, \quad S = [-100, 100]^n; \qquad (15)$$

$$f_8(x) = \sum_{i=1}^{n} |x_i| + \prod_{i=1}^{n} |x_i|, \quad S = [-10, 10]^n; \qquad (16)$$

$$f_9(x) = \sum_{i=1}^{n}\Big(\sum_{j=1}^{i} x_j\Big)^2, \quad S = [-100, 100]^n; \qquad (17)$$

$$f_{10}(x) = \max\{|x_i|,\ i = 1, 2, \dots, n\}. \qquad (18)$$
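For reference, a few of these benchmarks are trivial to code; the sketch below (ours, not part of the paper) gives f2, f3, and f7 and checks their optima at x = 0.

```python
import numpy as np

def f2(x):  # Eq. (10), Rastrigin: multimodal, f_min = 0 at x = 0
    return np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0)

def f3(x):  # Eq. (11), Ackley: multimodal, f_min = 0 at x = 0
    n = x.size
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / n))
            - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / n) + 20.0 + np.e)

def f7(x):  # Eq. (15), sphere: unimodal, f_min = 0 at x = 0
    return np.sum(x ** 2)

x0 = np.zeros(30)
print(f2(x0), round(f3(x0), 12), f7(x0))  # all ~0
```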
Functions $f_1 \sim f_6$ are multimodal functions in which the number of local minima increases with the problem dimension; $f_7 \sim f_{10}$ are unimodal functions. In the following experiments, the parameters are assigned as $L_{size} = 10$, $p_t = 0.2$, $subL_{subSize} = 5$, $sR = 0.1$, $m = 15$, $\varepsilon = 10^{-4}$. The termination criterion of MAQEA is one of the following objectives: if $f_{\min} = 0$, then $|f_{best}| < \varepsilon$; otherwise $|f_{\min} - f_{best}| < \varepsilon\,|f_{\min}|$, where $f_{\min}$ and $f_{best}$ represent the global optimum and the best solution found up to the current generation, respectively. We performed 50 independent runs on each test function and recorded the mean number of function evaluations and the mean function value.

Table 3. Comparison between MAGA and MAQEA on functions with 30 dimensions
Function | f_min | Mean value (MAGA) | Mean value (MAQEA) | Mean evaluations (MAGA) | Mean evaluations (MAQEA)
f1 | -12569.5 | -12569.866 | -12568.98 | 10862 | 7542
f2 | 0 | 0 | 0 | 11427 | 8669
f3 | 0 | 0 | 6.66×10⁻⁵ | 9656 | 9141
f4 | 0 | 0 | 4.13×10⁻⁵ | 9777 | 6914
f5 | 0 | 0 | 1.41×10⁻⁶ | 10545 | 6864
f6 | 0 | 0 | 2.43×10⁻⁶ | 11269 | 7518
f7 | 0 | 0 | 0 | 9502 | 8363
f8 | 0 | 0 | 4.38×10⁻⁵ | 9591 | 7437
f9 | 0 | 0 | 0 | 9479 | 7938
f10 | 0 | 0 | 0 | 9479 | 7470
Table 3 shows the comparison between MAGA and MAQEA on the test functions with 30 dimensions. We see that the mean function values are close to the optimal ones. Owing to the termination criterion, the mean function values of MAQEA are not as good as those of MAGA, but the mean numbers of function evaluations of MAQEA are much better than
those of MAGA. Therefore, the computational cost of MAQEA is much lower than that of MAGA. Since MAGA outperforms FEP, OGA/Q, BGA, and AEA, and MAQEA outperforms MAGA, we can conclude that MAQEA is more competitive than all five other algorithms on the problems studied.
4 Conclusions

In this paper, a new algorithm, the multiagent quantum evolutionary algorithm (MAQEA), is proposed based on the combination of multiagent systems, quantum computation, and evolutionary theory. Each agent perceives its environment and acts upon it. MAQEA can represent a linear superposition of all possible solutions due to the probabilistic representation of agents by quantum chromosomes. The neighborhood competition quantum-evolution operator leads the loser of a competition to increase its competitive ability using information from itself and the current maximum agent. The neighborhood quantum crossover operator generates new individuals with higher competitive ability using information from the neighbors. Agents make full use of the knowledge of the local environment when performing the self-learning operator. To summarize, rapid convergence and good global search capacity characterize the performance of MAQEA.
References

1. Shor, P.: Algorithms for Quantum Computation: Discrete Logarithms and Factoring. In: Proceedings 35th Annual Symposium on Foundations of Computer Science, pp. 124–134 (1994)
2. Grover, L.: A Fast Quantum Mechanical Algorithm for Database Search. In: Proceedings of the 28th Annual ACM Symposium on the Theory of Computing, pp. 212–219 (1996)
3. Kak, S.C.: On Quantum Neural Computing. Information Sciences, 143–160 (1995)
4. Chrisley, R.: Quantum Learning. In: Pylkkonen, P., Pylkko, P. (eds.) New Directions in Cognitive Science: Proc. of Int. Symp. of the Finnish Association of Artificial Intelligence, Lapland, pp. 77–89 (1995)
5. Ventura, D., Martinez, T.R.: An Artificial Neuron with Quantum Mechanical Properties. In: Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms (1997)
6. Narayanan, A., Moore, M.: Quantum-Inspired Genetic Algorithms. In: Proceedings of the IEEE International Conference on Evolutionary Computation. IEEE Press, Piscataway (1996)
7. Yang, S., Jiao, L.: The Quantum Evolutionary Programming. In: Proceedings of the Fifth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA'03). IEEE, Los Alamitos (2003)
8. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice-Hall, New York (1995)
9. Liu, J., Jing, H., Tang, Y.Y.: Multi-agent Oriented Constraint Satisfaction. Artif. Intell. 136, 101–144 (2002)
10. Zhong, W., Liu, J., Xue, M., Jiao, L.C.: A Multiagent Genetic Algorithm for Global Numerical Optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34 (2004)
11. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2000)
12. Yao, X., Liu, Y., Lin, G.: Evolutionary Programming Made Faster. IEEE Trans. Evol. Comput. 3, 82–102 (1999)
13. Leung, Y.W., Wang, Y.: An Orthogonal Genetic Algorithm with Quantization for Global Numerical Optimization. IEEE Trans. Evol. Comput. 5, 41–53 (2001)
14. Mühlenbein, H., Schlierkamp-Voosen, D.: Predictive Models for the Breeder Genetic Algorithm. Evol. Comput. 1, 25–49 (1993)
15. Pan, Z.J., Kang, L.S.: An Adaptive Evolutionary Algorithm for Numerical Optimization. In: Simulated Evolution and Learning. LNCS (LNAI), pp. 27–34. Springer, Heidelberg (1997)
Developing and Optimizing a Finite Element Model of Phalange Using CT Images Qingxi Hu, Quan Zhang, and Yuan Yao Rapid Manufacturing Engineering Center, Shanghai University 99 Shangda Road, Shanghai 200444, P.R. China {huqingxi,wood,yaoyuan}@shu.edu.cn
Abstract. Constructing three-dimensional finite element (FE) models of the human skeleton is very important for research on the mechanisms of blunt-impact skeletal injury and for the assessment of injury severity. A feasible and efficient method of skeletal finite element modeling using CT images is proposed in this paper. Many problems arise during the FEA processing of complex models, such as element distortion and numerical divergence. Three methods of mesh optimization based on stereolithography (STL) are provided. Based on an analysis of the complexity of the ANSYS processing and the precision of the results, an optimized process for developing a finite element model of a phalange is proposed. The described method can generate detailed and valid three-dimensional finite element models of phalanges with different inner constructions and complex geometry. The method is rapid and can readily be used for other medical applications.
1 Introduction

It is a well-established claim that mechanical testing is paramount, not only in aerospace, civil engineering, and the automotive industry, but also in health care. Finite element modeling has been a very valuable tool for structural analysis in orthopedic biomechanics; since 1972, many researchers have built models to study various physiological conditions [1]. In the past few years, as modeling techniques have developed, it has become possible to generate finite element models that take into account the morphology of the bone segment and the distribution of the bone tissue mechanical properties of a specific subject [2][3]. Whatever the methods adopted, these models derive the information needed to build the bone model from medical imaging data. Computed tomography (CT) represents, at present, the method of choice for the generation of such subject-specific finite element models, since from CT data it is possible to define both the geometry and the local tissue properties of the bone segment to be modeled [5][6]. However, the data processing techniques used to extract this information from the CT data may frequently be affected by non-negligible errors that propagate in an unknown way through the various steps of model generation, affecting the accuracy of the model predictions unpredictably. Furthermore, three-dimensional models sometimes require a prohibitive number of hours to build, because of the complex geometries encountered in the human anatomy.
This paper describes three methods of developing and optimizing a finite element model of the human skeleton based on Materialise's Interactive Medical Image Control System (MIMICS), using CT scan data. The same method can also be used to create patient-specific models of any other body part using either MRI or CT data [4].
2 Methods of Finite Element Modeling 2.1 Three-Dimensional Modeling of Phalange A three-step procedure was followed to generate a 3D model of an extracted human phalange.
Fig. 1. CT-scan data as seen in MIMICS 9.11. The hand is presented in three different cross-sectional views. Masks have been applied to the soft tissue scale (gray) and the bone scale (yellow) according to a voxel density threshold; the 3D representation of the bones of the hand is the result of segmentation in MIMICS.
Image acquisition. The CT dataset of the hand of a Chinese subject of unknown age, approved by Shanghai 9th People's Hospital, is used in this paper. CT scans in axial orientation with 2.50 mm slice thickness were obtained on a GE Medical Systems scanner. A total of 1128 slices of the bones of the hand were used for modeling.

Segmentation. The different hard tissues visible on the scans were identified using an interactive medical image control system (MIMICS 9.11, Materialise, Leuven, Belgium). MIMICS imports CT and MRI data in a wide variety of formats and provides extended visualization and segmentation functions based on an image density threshold
Fig. 2. CT-scan data of the phalange as seen in MIMICS 9.11. Masks have been applied to the phalange (blue) according to the extracted CT data, with the 3D representation of the phalange resulting from region growing in MIMICS.
(Fig. 1). 3D objects are automatically created in the form of masks by growing a threshold region on the entire stack of scans (Fig. 1).

Phalange Modeling. As this paper focuses on developing and optimizing a finite element model, the 3D model of the phalange is the object of the research and analysis. The phalange CT dataset was extracted from the CT data of the whole hand, and the phalange gray region was confirmed by the region-growing process, so the 3D model of the phalange could be calculated (Fig. 2).

2.2 Preprocessing of the Model

In order to acquire a high-quality finite element model, the quality of the surface triangulation must be improved in three steps:

1. Reduce the number of triangles of the object.
2. Improve the quality of the triangles of the object.
3. Remove extra shells.

Using the MIMICS Remesh module, the 3D model and the triangle quality histogram are shown on the screen (Fig. 3A). The quality histogram shows the number of triangles that have a certain quality. The Remesh module, in which only a few parameters must be entered, was therefore used to automatically reduce the number of triangles and simultaneously improve their quality while maintaining the geometry (Fig. 3B). During remeshing, the tolerance variation from the original data can be specified (the quality of the triangles does not determine the tolerance variation from the original data). The
quality is defined as a measure of the triangle height/base ratio, so that the file can be imported into the finite element analysis software package without generating any problems (a sketch of such a measure is given below). The three steps can be repeated until the quantity and quality of the triangles are satisfactory. The number of phalange surface triangles was reduced, and the quality of the triangles was controlled within a suitable range (Fig. 3B). Similar, well-shaped triangles are beneficial for the choice of element type and for the element meshing process. The remeshed surface mesh can then be saved and exported.
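As an illustration of such a height/base quality measure, the sketch below scores each triangle by its altitude over its longest edge, scaled so an equilateral triangle scores 1 and a degenerate sliver scores about 0. The exact MIMICS formula is not published, so this is a stand-in, and the (n, 3, 3) vertex layout (e.g., numpy-stl's mesh.vectors) is an assumption.

```python
import numpy as np

def triangle_quality(v0, v1, v2):
    """Altitude onto the longest edge, divided by that edge and normalized
    so that an equilateral triangle scores 1 and a sliver scores ~0."""
    longest = max(np.linalg.norm(v1 - v0),
                  np.linalg.norm(v2 - v1),
                  np.linalg.norm(v0 - v2))
    area = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0))
    height = 2.0 * area / longest
    return (height / longest) / (np.sqrt(3.0) / 2.0)

def quality_histogram(tris, bins=20):
    """Quality histogram over an STL mesh given as an (n, 3, 3) array of
    triangle vertices; mirrors the histogram shown by the Remesh module."""
    q = np.array([triangle_quality(*t) for t in tris])
    return np.histogram(q, bins=bins, range=(0.0, 1.0))
```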
Fig. 3. (A) Stereolithography (STL) triangulated file of the phalange obtained through the Remesh module within MIMICS. The density and quality (aspect ratio and connectivity) of the triangles are not appropriate for use in finite element analysis. (B) STL file optimized for FEA using the Remesh module within MIMICS. Note the improved triangle shape and the intact geometry compared to (A), in spite of a significant reduction in the number of triangles. The quality of the triangles is within a satisfactory range, which is another important factor in the next procedure, FE modeling.
2.3 Finite Element Modeling

Using the MIMICS Export module, the phalange model was separately converted into DXF files (*.dxf), stereolithography files (*.stl, bilinear and inter-plane interpolation algorithm), and ANSYS files (*.lis, data of key points and areas). In order to compare the three processing methods, the same element type was chosen in all cases: SOLID186, a higher-order 3-D 20-node solid element that exhibits quadratic displacement behavior. The meshing procedure was uniform volume free meshing.
Method 1. The DXF files were imported into Unigraphics NX. The surfaces of the model were extracted into faces, and some shape faces were reduced and reconstructed according to the results of face analysis. In the Shape Studio module of UG NX, some faces were simplified and smoothed; care must be taken to preserve the volumetric details during the smoothing process. Finally, the faces were sewn into a solid body (Fig. 4A), and the body was exported as Parasolid files (*.x_t), which are better suited to the ANSYS interface. The Parasolid files were imported into ANSYS, and the phalange volume was automatically meshed into solid elements (Fig. 4B). In the meshing process, 15 of the 29081 selected elements violated the shape warning limits.
Fig. 4. (A) The solid body model of the phalange as seen in UG; (B) the element model obtained with method 1 as seen in ANSYS
Method 2. The *.lis files, containing only the data of key points and areas, were read into ANSYS (Fig. 5A). Using the modeling functions in the preprocessor module of ANSYS, the volume was created by picking all areas. The meshing process was the same as in method 1.
Fig. 5. (A) The area model of the phalange as seen in ANSYS; (B) the element model obtained with method 2 as seen in ANSYS
Shape testing revealed that 287 of the 227584 new or modified elements violated the shape warning limits (Fig. 5B).

Method 3. Native STL files are improper for use in FEA because of the aspect ratio and connectivity of their triangles. A stereolithography handling package (MAGICS 11.0, Materialise, Leuven, Belgium) was used to re-establish the congruence of the interfacial mesh of the phalange. The STL model was processed by automatic fixing, smoothing, triangle reduction, and filtering of sharp triangles, which can be performed more than once in order to obtain the optimal model. The number of triangles was reduced to 2200 (Fig. 6A). The model was then exported as ANSYS files (*.lis) from the Remesh module within Magics, and the following processes were the same as in method 2. Shape testing revealed that 62 of the 50370 new or modified elements violated the shape warning limits (Fig. 6B).
Fig. 6. (A) The optimized STL model of the phalange as seen in Magics; (B) the element model obtained with method 3 as seen in ANSYS
3 Comparison and Analysis of the Methods

As illustrated in Table 1, the number of areas is smallest with method 1, as is the number of elements. Because the smoothing process in UG is better than in the other methods, there are fewer badly shaped areas, and a coarser element size can be chosen. In method 1, the number of elements was kept at an acceptable level, so the complexity and precision of the FEA were well controlled. However, the need for an additional CAD package is a disadvantage, raising the requirements on operators and workstations; worse, the inner features were not preserved during the processing.

The biggest advantage of method 2 is that the whole model preprocessing is completed within MIMICS alone. However, the number of areas is not reduced to a feasible value, which makes ANSYS spend more time on building the volume. ANSYS has its strength in finite element analysis rather than model construction, so bone modeling in ANSYS requires a high-powered workstation and more time; moreover, the required data storage space increases greatly. The numbers of nodes
and elements are raised by one order of magnitude. Furthermore, the number of distorted elements increases, as the model is neither simplified nor smoothed, which raises the probability of calculation divergence.

Table 1. The quantities of some important parameters

Method | Nodes | Areas | Elements | Distorted elements
1 | 46875 | 1936 | 29081 | 15
2 | 527623 | 9600 | 227584 | 287
3 | 87898 | 2200 | 50370 | 22
Another Materialise system, Magics RP, is applied in method 3. MIMICS and Magics share the same standards and interfaces for the input and output of STL files. The stereolithography (STL) file obtained through the STL+ module within MIMICS has a triangle density and quality that are not appropriate for finite element modeling, as proved by the results of method 2. Magics RP affords powerful functions for operating on STL models, and the STL file is optimized for FEA using its operation commands. It can clearly be seen that the number of elements and the proportion of distorted elements are both controlled within a suitable range. Compared with the other two methods, method 3 yields a more suitable FE model for FEA of the human skeleton.
4 Discussion

Previous attempts to generate 3D models resulted in much coarser meshes [2,3], mainly due to the limitations of the geometry acquisition methods; another reason was the increased memory requirement of 3D models, which did not allow a fine representation of the geometry. Different approaches have been proposed to capture the inner anatomical detail without extrapolation and to accelerate the production of the models. The approach used in the present study suggests that maximum anatomical detail is obtained by surface/interface-based meshing using stereolithography (STL) surface data acquired from CT images and preprocessed within MIMICS. The different parts of the model featuring different mechanical properties are identified first (segmentation) and meshed accordingly. Elements do not overlap the different structures but strictly follow the internal boundaries, resulting in a smooth and very well controlled representation of the interfaces. Significant advantages of using STL, which can be further optimized within Magics, are the sophisticated visualization tools (shaded wire-frame 3D views, section views, etc.) and the possibilities offered by other digital CAD operations. The very user-friendly graphical interface allows rapid modification of the different parts and generation of new STL files that can be instantly exported and volumetrically meshed in the FEA program.
5 Conclusions

This investigation describes a rapid method for the generation of finite element models of human skeletal structures and restorations. Detailed three-dimensional finite element models of a phalange were generated with three different methods. Method 3 is rapid (a bone model may be obtained by a skilled operator in less than a workday) and can readily be used for other medical applications to create patient-specific models of any other body part using either MRI or CT data. Furthermore, this methodology could facilitate the optimization and understanding of biomedical devices prior to animal and human clinical trials.

Acknowledgements. The authors thank the Shanghai Education Foundation (grant 05A281) for financial support, as well as the Shanghai Tissue Engineering Research Center for scanning the experimental sample. The authors also wish to express their gratitude to Materialise; this study was supported by Materialise's software (MIMICS/MAGICS products).
References

1. Huiskes, R., Chao, E.Y.S.: A Survey of Finite Element Analysis in Orthopedic Biomechanics: The First Decade. J. Biomechanics 16(6), 385–409 (1983)
2. Lin, C.L., Chang, C.H., Wang, C.H., Ko, C.C., Lee, H.E.: Numerical Investigation of the Factors Affecting Interfacial Stresses in an MOD Restored Tooth by Auto-meshed Finite Element Method. J. Oral Rehabil. 28, 517–525 (2001)
3. Lin, C.L., Chang, C.H., Ko, C.C.: Multifactorial Analysis of an MOD Restored Human Premolar Using Auto-mesh Finite Element Approach. J. Oral Rehabil. 28, 576–585 (2001)
4. Magne, P.: Efficient 3D Finite Element Analysis of Dental Restorative Procedures Using Micro-CT Data. Academy of Dental Materials (2006)
5. Wang, Z., Li, H.: A Novel 3D Finite Element Modeling Based on Medical Image for Intervertebral Disc Biomechanical Analysis. In: Engineering in Medicine and Biology 27th Annual Conference, Shanghai, China, pp. 1–4 (September 2005)
6. Taddei, F., Martelli, S., Reggiani, B.: Finite-Element Modeling of Bones From CT Data: Sensitivity to Geometry and Material Uncertainties. IEEE Transactions on Biomedical Engineering 53(11) (November 2006)
Reverse Engineering Methodology in Broken Skull Surface Model Reconstruction Luyue Ju1, Gaojian Zhong1, and Xia Liu2 1 Shanghai University, 200444, Shanghai, China
[email protected] 2 Guizhou University 550003 Guiyang, China
Abstract. For various reasons, many people suffer from bone defects, and how to treat them has become a hot topic internationally in the field of tissue engineering. Nowadays, doctors diagnose the state of a bone defect mainly by observing the patient's CT images, so the diagnosis depends largely on the technique and experience of the doctor. To address this problem, this paper focuses on using reverse engineering technology to make skull-repair surgery efficient. During this process, a repair sheet that fits the skull surface can be constructed and then molded. This method can shorten the operation time and reduce the operation risk.
1 Introduction
1.1 Skull-Repair-Technology
Because of blows, falls, traffic accidents, etc., some patients suffer skull bone defects, which are painful and difficult to recover from, so skull-repair operations are necessary in modern medicine. Presently, doctors use CT measurements to diagnose the state of an illness, judging the extent of the injury from their experience and from a mental 3D picture of the bones and tissues surrounding the defect in order to plan the operation. Before the operation starts, net-boards made of titanium alloy are prepared and trimmed by hand according to the area of the defect, and then riveted onto the skull with bolts. Because titanium is hard to form, the net-boards need to be shaped several times. This prolongs the operation, reduces the plasticity and strength of the titanium net-boards, increases the number of bolts and the cost, and increases the pain and danger as well.
1.2 Reverse Engineering and the Main Reconstruction Process
Reverse engineering is an emerging technology that promises to play a role in reducing product development time. Reverse engineering in this paper refers to the process
of creating engineering design data from existing parts. It creates or clones an existing part by acquiring its surface data with a scanning or measurement device. Developments in reverse engineering and RP technology make digital skull repair possible. First, the data of the damaged skull are obtained from a CT device and the profile data are extracted: a series of curves, including the interrupted curves on the injured skull. Reverse engineering techniques are then used to repair the defect data: the contour lines are fitted, the interrupted curves are blended, the repaired surface is constructed from the blended curves, and finally STL files are exported from the system. The skull defect part is then produced and integrated into a whole skull model by the RP (Rapid Prototyping) method from the STL files mentioned above. The doctors analyze the reconstructed models and work out the operation plan for transplanting the net-boards onto the patient's broken skull.
2 Reconstruction of the Broken Skull Surface Model Based on Reverse Engineering
The reconstruction of the broken skull 3D model contains two parts: A. the reconstruction of the skull image (digital) data with the defect; B. the reconstruction of the skull 3D model. The details are presented in the following paragraphs. For A, to obtain well-proportioned image data, i.e., a uniform point cloud, the following steps are needed: 1. image filtering; 2. image segmentation; 3. profile extraction. For B, the method of blending profile curves is used to build integrated closed curves of the skull, the surface of the transplant is then constructed, and the STL files of the modeling result are output. After machining the transplant by RP, the preparation for the operation is finished. Figure 1 shows the steps of the data reconstruction.
Fig. 1. The steps of the skull reconstruction
2.1 Skull Data Acquisition and Processing [1][2][5]
The main task of data acquisition is to obtain the 3D information of the skull profiles. Repairing the skull with reverse engineering technology rests fundamentally on
data acquisition, which is the source of the 3D points from which the skull surface is reconstructed. Computed Tomography (CT) can measure both outside and inside surface features exactly. The measured data are sequential points scattered in layers; they form the section scanning lines, and the points are dense and complete. The scanned skull point cloud is shown in Figure 2; the vacancy in the middle of the point cloud is the defect to repair.
A Image Acquisition
Image data usually come from medical imaging equipment such as Computed Tomography (CT) or Magnetic Resonance Imaging (MRI). The data in this paper are from a CT scanner.
B Digital Image Processing
(1) Noise reduction and filtering. Electrical equipment such as a CT scanner produces some noise when it works, so the images should be denoised by filtering, e.g., median filtering, neighborhood (mean) filtering, grayscale-opening filtering, grayscale erosion filtering, etc. Taking 3x3 median filtering as an example, the grayscale of the pixel at the center of the 3x3 window is replaced by the median grayscale value of the pixels in that window (mean filtering instead uses the average, or a weighted average, of the 8 neighboring pixels).
Fig. 2. The points cloud data of a broken skull
(2) Image segmentation. In study and application we are interested only in some parts of the whole image, called the target; the rest is called the background. Both target and background are particular areas of the image, so we extract the target in order to recognize and further analyze the parts we need. Figure 3 shows a skull CT image and the grayscale histogram of this image. The histogram has sharper peaks in some grayscale ranges than in others, because these ranges correspond to the gray matter, the white matter of the brain, and the bones. Segmenting the image therefore helps in analyzing and visualizing it. We can use the grayscale histogram to find a grayscale threshold interval, ignore the areas outside the interval, and treat them as background. Then the
Fig. 3. A skull CT image and its grayscale histogram
image is made up of a solid area (colored black) and a non-solid area (colored white). This binarization simplifies the complicated information obtained from the image and makes the subsequent work convenient.
(3) Profile extraction. The profile-curve information in the image is obtained from the gradient of the grayscales computed between pixels, and the maximum gradients identify the profiles. The Roberts, Kirsch and Laplacian-of-Gaussian operators can be used for profile-curve extraction. The profile line is shown in Figure 4. A code sketch of these three preprocessing steps is given after Figure 4.
Fig. 4. The profile of the skull
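The three preprocessing steps above can be sketched compactly. The following is a minimal illustration in Python with NumPy/SciPy, assuming an 8-bit CT slice; the threshold interval is a hypothetical choice read off the histogram, and a Sobel gradient stands in for the Roberts/Kirsch/Laplacian-of-Gaussian operators named in the text.

```python
import numpy as np
from scipy.ndimage import median_filter, sobel

def preprocess_ct_slice(img, t_low, t_high):
    """Denoise, segment and extract the bone profile from one 2D CT slice."""
    # (1) Noise reduction: 3x3 median filtering, as in the text.
    smoothed = median_filter(img, size=3)
    # (2) Segmentation: keep grayscales inside the threshold interval
    # [t_low, t_high] chosen from the histogram; the rest is background.
    mask = ((smoothed >= t_low) & (smoothed <= t_high)).astype(float)
    # (3) Profile extraction: gradient magnitude of the binary mask;
    # maximum-gradient pixels mark the bone contour.
    grad = np.hypot(sobel(mask, axis=0), sobel(mask, axis=1))
    return grad > 0.5 * grad.max()

# Usage with hypothetical thresholds read off the grayscale histogram:
# contour = preprocess_ct_slice(ct_slice, t_low=180, t_high=255)
```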
2.2 Surface Modeling
Modeling is the key point of the skull repair and the most difficult step, because there are no point-cloud data on the defect, so we cannot build the surface directly with lofting or sweeping methods. Considering the shape features of the skull, which is not smooth and whose boundary around the defect is very complicated, the usual surface construction methods are unavailable here. The new method is to construct a group of contour lines on the defect part: since there is no point cloud on the defect, we use the surrounding points to build the broken contour lines and then blend them. Finally, the blended curves are used in surface construction.
Fig. 5. The profile curves with defects
A The Profile Curves Construction
The Bezier curve is widely used in geometric modeling and is the most important and elementary tool in surface modeling. Quadratic and cubic Bezier curves are used frequently in engineering; higher-degree curves can describe more complicated shapes, but the cost of generating the curves also increases. So high-degree Bezier curves are divided into quadratic and cubic segments, keeping the Bezier curves continuous in G0 or G1. The very first step is to construct Bezier curves from the profile curves extracted above, and in this paper cubic Bezier curves are used to reconstruct the skull profile. A cubic Bezier curve is defined as

P(t) = \sum_{i=0}^{3} P_i B_{i,3}(t), \quad t ∈ [0,1] \qquad (1)

where P_i = [x_i, y_i]^T are the control points and B_{i,3}(t) are the Bernstein basis functions

B_{i,3}(t) = C_3^i t^i (1 − t)^{3−i}, \quad t ∈ [0,1] \qquad (2)

From (1) and (2) we get, in matrix form,

P(t) = G \begin{bmatrix} 1 & -3 & 3 & -1 \\ 0 & 3 & -6 & 3 \\ 0 & 0 & 3 & -3 \\ 0 & 0 & 0 & 1 \end{bmatrix} T = G M_B T, \quad G = [P_0, P_1, P_2, P_3], \quad T = [1, t, t^2, t^3]^T

which yields the basis matrix M_B of the cubic Bezier curve. The profile curve generated in this way is shown in Figure 6.
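To make the construction concrete, here is a minimal sketch that evaluates Eq. (1) through the basis matrix M_B derived above; the four control points in the usage line are hypothetical values from an extracted profile segment.

```python
import numpy as np

# Basis matrix M_B of the cubic Bezier curve, as derived from (1) and (2).
M_B = np.array([[1, -3,  3, -1],
                [0,  3, -6,  3],
                [0,  0,  3, -3],
                [0,  0,  0,  1]], dtype=float)

def cubic_bezier(control_points, ts):
    """Evaluate P(t) = G M_B T for each t in ts.

    control_points: (4, 2) array of P0..P3; ts: parameters in [0, 1].
    """
    G = np.asarray(control_points, dtype=float).T        # shape (2, 4)
    T = np.vstack([np.ones_like(ts), ts, ts**2, ts**3])  # shape (4, len(ts))
    return (G @ M_B @ T).T                               # (len(ts), 2) points

# Example with four hypothetical control points of one profile segment:
pts = cubic_bezier([[0, 0], [1, 2], [3, 2], [4, 0]], np.linspace(0, 1, 50))
```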
Fig. 6. A single blended curve
B Curves Blending
After the previous work we obtain a series of profile curves. Because of the defect, the broken curves need to be blended into a series of continuous, integrated curves. Taking one curve as an example (Figure 6), A and B are two broken curves obtained from the skull data, and C is the curve to be blended in. For a perfect connection between A, B and C, the constructed curve C should join A and B with G2 continuity. Figure 7 shows the integrated curves after blending.
Fig. 7. The blended integrated profile curves
C Surface Reconstruction [3]
Surface reconstruction is the most important step in our work. We want the constructed surface to be simple but exact. Considering the surface features of the skull and the data we have, we start from Bezier patches for the surface. Just as a Bezier curve has a characteristic polygon, a Bezier patch has a characteristic polyhedron. The points of the characteristic polyhedron, P_ij (i = 0, 1, ..., m; j = 0, 1, ..., n), are called control points. There are (n+1) × (m+1) control points in all; they form control polygons along the u and v axes, which together form the characteristic polyhedron (the control mesh). As with Bezier curves, the control points determine the general shape of the Bezier surface S(u, w), which approximates the control mesh. The surface used here is the bicubic Bezier surface: when m = n = 3, it has 16 control points P_ij (i = 0, 1, 2, 3; j = 0, 1, 2, 3), and its parametric equation is

S(u, w) = \sum_{i=0}^{3} \sum_{j=0}^{3} B_{i,3}(u) B_{j,3}(w) P_{ij}, \quad u, w ∈ [0,1]
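A corresponding sketch for the bicubic patch, reusing the Bernstein basis of Eq. (2); the 4x4 grid of 3D control points is an assumed input.

```python
import numpy as np
from math import comb

def bernstein3(i, t):
    # Cubic Bernstein basis B_{i,3}(t) = C(3, i) t^i (1 - t)^(3 - i), Eq. (2)
    return comb(3, i) * t**i * (1.0 - t)**(3 - i)

def bicubic_bezier(P, u, w):
    """Evaluate S(u, w) for a 4x4 grid of 3D control points P[i][j]."""
    P = np.asarray(P, dtype=float)            # shape (4, 4, 3)
    S = np.zeros(3)
    for i in range(4):
        for j in range(4):
            S += bernstein3(i, u) * bernstein3(j, w) * P[i, j]
    return S
```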
Fig. 8. The repaired surface of the skull with a defect
We obtain the integrated Bezier surface from the series of curves generated from the original CT-scanner data. The broken skull is finally repaired after this operation, as shown in Figure 8.
2.3 Rapid Prototyping [4]
The constructed defect surface can be modified to produce a repair model that matches the skull, and then converted to the STL format used by rapid prototyping. Finally, the STL file is imported into a rapid prototyping machine to produce the repair part.
3 Conclusions
A skull fracture is a high-energy injury mostly caused by compression, crushing, rolling or a fall from a height. Traditional surgery is not very efficient here. In this research we used reverse engineering technology to repair the skull. As a result, the diagnosis and cure of bone defects will be less reliant on the experience of the doctor. In this way, we not only provide a stable technical basis for rapid skull repair but also shorten the surgery time, reduce the surgical risk and alleviate the patients' pain. Moreover, the technology is flexible and efficient because it adopts digital production, so costs are greatly reduced.
References 1. Tao, J., Shuiguang, T., et al.: Reverse engineering technology. China Machine Press, Beijing (2003) 2. Hong, X.: The study of basing on ICT CAD modeling technology. Northwest polytechnic university, pp. 21–23 (1998) 3. Xinxiong, Z., et al.: Free style curves and surface modeling technology. Science Press, Beijing (2000) 4. Weijun, L.: Rapid prototyping and applications. China Machine Press, Beijing (2005) 5. Liu, S., Ma, W.: Seed-growing segmentation of 3D surface from CT contour data. Computer-aided Design 31(8), 485–540 (1999)
Identification and Application of Nonlinear Rheological Characteristics of Oilseed Based on Artificial Neural Networks Xiao Zheng1, Guoxiang Lin1, Dongping He2, Jingzhou Wang1, and Yan You1 1
Department of Mechanical Engineering, Wuhan Polytechnic University, Wuhan 430023, P.R. China
[email protected],
[email protected],
[email protected],
[email protected] 2 Department of Food Science and Engineering, Wuhan Polytechnic University Wuhan 430023, P.R. China
[email protected]
Abstract. Oilseed displays viscous-elastic-plastic characteristics during pressing. An apparatus and method were developed to measure the rheological properties of rapeseed and dehulled rapeseed, using creep tests under different stresses. Using artificial neural networks, identification models of the nonlinear rheological characteristics of rapeseed and dehulled rapeseed were developed on the basis of the creep tests. Results indicated that the models simulated the nonlinear rheological characteristics very well. Compared to data-fitting and theoretical-analysis methods, identifying the rheological characteristics of oilseeds with artificial neural networks is both simple and effective. The critical pressing times of rapeseed and dehulled rapeseed were determined from the simulated creep curves.
1 Introduction
Rapeseed oil is an important edible oil in the world, especially in China. Mechanical pressing is the most common method of rapeseed oil extraction worldwide [1-2]. Vegetable oilseed exhibits complex mechanical behavior during pressing. Pore space gradually dwindles and gas is gradually vented from the oilseed bed as the pressing pressure increases. The oilseed first becomes a close-packed granular material through elastic-plastic deformation; it then becomes a fluid-solid coupled material as the cell walls and granules break and oil is extracted; finally, it becomes cake as the broken granules bond to each other [3]. Hysteretic deformation occurs inside the oilseed because the oil flows sluggishly against resistance. Oilseed is thus a viscous-elastic-plastic material with complex mechanical behavior [1]. The viscous-elastic-plastic deformation is nonlinear and relates to both pressing pressure and pressing time. The nonlinear rheological characteristic is the macro-behavior of the oilseed: it describes the macro-mechanics of the granular material, the fluid-solid coupled material and the cake material during pressing. The rheological
characteristic is the most important physical-mechanical property of oilseeds and is the foundation of pressing theory. The mechanical and physical properties of oilseeds under pressing are essential for rigorous theoretical analysis of the mechanisms and physical processes involved. The rheology of oilseed indicates that the oilseed volume changes with pressing time under constant pressing pressure, and there is a limiting compression of the oilseed volume under a fixed pressing pressure. Therefore, there is a critical pressing time for oilseed. Identifying the nonlinear rheological characteristics and the critical pressing time of oilseed provides a scientific basis for press design and for optimization of the pressing process [4]. There is still a lack of understanding of, and clear insight into, the mechanical behavior during pressing [2]. Several studies are now directed at understanding the mechanism of operation so that better presses may be developed [1-4], but the rheological characteristics and critical pressing times of rapeseed and dehulled rapeseed have not yet been reported. It has proved very difficult to develop a theoretical viscous-elastic-plastic model for oilseeds because of the complexity of their physical-mechanical behavior during pressing [1, 4]. At present, multivariable nonlinear regression analysis is the most common way to develop an empirical formula to simulate viscous-elastic-plasticity for a complex material. However, differences in the variables used in the analytical model and in the details of the experiment lead to significant diversity in the resulting formulas; furthermore, it is usually difficult to select a suitable regression equation for multiple regression analysis, which requires considerable technique and experience in understanding the stress-strain data. The objectives of this study were to develop neural network models to identify the nonlinear rheological characteristics, and to determine the critical pressing times, of rapeseed and dehulled rapeseed during uniaxial pressing.
2 Rheological Experiment
The schematic of the test cell is described in detail in [1-4]; it consisted of a loading piston, an outer cylinder, an inner cylinder, a sealing ring, a support plate, a porous stone and a base plate. The diameter and depth of the cylindrical chamber were 44 mm and 95 mm, respectively. The loading piston and the base plate were fitted with rubber sealing rings to prevent leakage of oil from the cell. The support plate, made of stainless steel with several uniformly distributed 3 mm diameter transverse holes, was designed to prevent the porous stone from breaking. In order to ensure uniform fluid pressure into and out of the oilseed cake, the bottom of the loading piston and the top of the base plate were provided with radial and circular grooves 5 mm wide by 5 mm deep. The samples were cleaned and sieved, and specimens made up of uniform granules were chosen. A 30 g sample was used as the testing specimen for each experiment. After the specimen was poured into the compression-permeability cell, the cell was mounted in a computer-controlled precision universal test machine capable of applying compressive loads of 300 kN. Six series of experiments were carried out for rapeseed and for dehulled rapeseed. Six pressing pressures
(10, 20, 30, 40, 50 and 60 MPa) were used for the rapeseed and dehulled rapeseed specimens. A controlled-pressure method was adopted: the pressing pressure was maintained for 30 minutes once the desired pressure was reached [5].
3 Test Results
The pressing stress σ and the axial strain ε are defined as

σ = F / A \qquad (1)

ε = ΔH / H_0 \qquad (2)
where F is the force acting on the material surface, A is the cross-sectional area of the material, H_0 is the initial height of the material, and ΔH is the displacement of the material. Figures 1-2 show the creep curves of rapeseed and dehulled rapeseed under constant pressing pressure; they indicate that the nonlinear rheological characteristics of rapeseed and dehulled rapeseed are obvious. All the creep curves of rapeseed
Fig. 1. Creep test result. (a) Rapeseed; (b) Dehulled rapeseed.
Fig. 2. Creep curve. (a) Rapeseed; (b) Dehulled rapeseed.
and dehulled rapeseed comprise two stages, attenuation creep and constant-speed creep; the attenuation-creep stage is short while the constant-speed-creep stage is long.
4 Identification of Nonlinear Rheological Characteristics
4.1 BP Neural Network Algorithm and Structure
Artificial neural networks are architectural models of the neurons of the human brain and constitute a kind of nonlinear dynamical system. One of the basic capabilities of artificial neural networks is the approximation of functions and curves [6-7]. It has been proved in theory that feed-forward neural networks trained with back propagation (BP) can approximate continuous functions and curves with arbitrary precision. A BP neural network consists of an input layer, hidden layers and an output layer. The input layer distributes the input data to the processors in the next layer, the output layer transmits the response of the network to the real world, and the hidden layer carries out the nonlinear mapping. Training is divided into two processes, forward propagation and back propagation. Forward propagation transmits the input information through the input layer and the hidden layer to the output layer. If the output generated by the output layer does not match the expectation, back propagation is carried out: the error signal is sent back along the original path and the weights of every layer are modified at the same time. Forward and back propagation are repeated until the prescribed error is met. The training process of an artificial neural network is actually a process of identification, so BP neural networks have been widely used in system identification of complex nonlinear systems [8-9]. System identification obtains an equivalent model, from a given family of models, on the basis of input and output data gained by experiments on an unknown system. The experiments indicated that the rheology of oilseed during pressing is nonlinear. In this study, neural network modeling with a BP network was used to identify the nonlinear rheological characteristics. Fig. 3 shows the network model, which has r inputs and one hidden layer.
Fig. 3. Neural network model. P is the input matrix with r input nodes, W1 is the weight matrix of the input layer, B1 is the bias matrix of the input layer, F1 is the activation function of the hidden layer, A1 is the output matrix of the hidden layer, W2 is the weight matrix of the output layer, B2 is the bias matrix of the output layer, F2 is the activation function of the output layer, and A2 is the output matrix of the output layer.
A. Forward Transfer of Information
The output of the ith node of the hidden layer is

a1_i = f1\Big(\sum_{j=1}^{r} w1_{ij} P_j + b1_i\Big), \quad i = 1, 2, \ldots, s1 \qquad (3)

where a1_i is the output of the ith hidden node, f1(·) is the activation function of the hidden layer, w1_{ij} is the connection weight from the jth input node to the ith hidden node, P_j is the jth input, and b1_i is the bias of the ith hidden node. The output of the kth node of the output layer is

a2_k = f2\Big(\sum_{i=1}^{s1} w2_{ki} a1_i + b2_k\Big), \quad k = 1, 2, \ldots, s2 \qquad (4)

where a2_k is the output of the kth output node, f2(·) is the activation function of the output layer, w2_{ki} is the connection weight from the ith hidden node to the kth output node, and b2_k is the bias of the kth output node. The error function is adopted as
E(W, B) = \frac{1}{2} \sum_{k=1}^{s2} (t_k − a2_k)^2 \qquad (5)

where E(W, B) is the output error function, t_k is the target value of the kth output node, and a2_k is the output of the kth output node.
B. Weight Updates by the Gradient Descent Algorithm
The weight change from the ith hidden node to the kth output node is
Δw2_{ki} = −η \frac{∂E}{∂w2_{ki}} = −η \frac{∂E}{∂a2_k} \frac{∂a2_k}{∂w2_{ki}} = η (t_k − a2_k) f2' a1_i = η δ_{ki} a1_i \qquad (6)

where Δw2_{ki} is the weight change of the output layer, η is the learning rate, f2' is the derivative of the output-layer activation function, δ_{ki} = (t_k − a2_k) f2' = e_k f2', and e_k = t_k − a2_k is the error of the kth output node. In the same way,

Δb2_k = −η \frac{∂E}{∂b2_k} = −η \frac{∂E}{∂a2_k} \frac{∂a2_k}{∂b2_k} = η (t_k − a2_k) f2' = η δ_{ki} \qquad (7)

where Δb2_k is the bias change of the kth output node. The weight change from the jth input node to the ith hidden node is

Δw1_{ij} = −η \frac{∂E}{∂w1_{ij}} = −η \frac{∂E}{∂a2_k} \frac{∂a2_k}{∂a1_i} \frac{∂a1_i}{∂w1_{ij}} = η \sum_{k=1}^{s2} (t_k − a2_k) f2' w2_{ki} f1' p_j = η δ_{ij} p_j \qquad (8)

where Δw1_{ij} is the weight change of the hidden layer, f1' is the derivative of the hidden-layer activation function, δ_{ij} = e_i f1', e_i = \sum_{k=1}^{s2} δ_{ki} w2_{ki}, δ_{ki} = e_k f2', e_k = t_k − a2_k, and e_i is the output error of the ith hidden node. In the same way,

Δb1_i = η δ_{ij} \qquad (9)

where Δb1_i is the bias change of the hidden layer.
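Updates (3)-(9) can be collected into a short gradient-descent training loop. The following is a minimal sketch, assuming a sigmoid hidden layer and a linear output layer (so f2' = 1), matching the configuration used in Section 4.2; the array shapes and the learning rate are illustrative choices, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(P, T, s1, eta=0.1, tol=1e-3, max_epochs=1000, seed=0):
    """One-hidden-layer BP network implementing updates (3)-(9).

    P: (n_samples, r) inputs; T: (n_samples, s2) targets.
    Hidden activation f1 = sigmoid, output activation f2 = identity.
    """
    rng = np.random.default_rng(seed)
    r, s2 = P.shape[1], T.shape[1]
    W1 = rng.normal(scale=0.5, size=(s1, r)); b1 = np.zeros(s1)
    W2 = rng.normal(scale=0.5, size=(s2, s1)); b2 = np.zeros(s2)
    for _ in range(max_epochs):
        E = 0.0
        for p, t in zip(P, T):
            a1 = sigmoid(W1 @ p + b1)                  # Eq. (3)
            a2 = W2 @ a1 + b2                          # Eq. (4), linear output
            e = t - a2
            E += 0.5 * np.sum(e**2)                    # Eq. (5)
            delta2 = e                                 # f2' = 1
            delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # sigmoid derivative
            W2 += eta * np.outer(delta2, a1); b2 += eta * delta2   # (6), (7)
            W1 += eta * np.outer(delta1, p);  b1 += eta * delta1   # (8), (9)
        if E < tol:
            break
    return W1, b1, W2, b2
```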
4.2 Neural Network Identification
A BP neural network with one hidden layer was used; it had one input node, representing the pressing-time sequence, and five hidden nodes. The sigmoid function f1(s) = (1 + e^{−s})^{−1} was chosen as the hidden activation f1(s). The output layer had one node, representing the axial-strain sequence, and a linear function was selected as the output activation f2(s). The rapeseed and dehulled rapeseed test results were taken as samples: 9 and 2 data points were chosen randomly as training and testing samples,
Fig. 4. Identification of the nonlinear rheological characteristics. (a) Rapeseed; (b) Dehulled rapeseed.
respectively. The error function is E = \frac{1}{2} \sum_{k=1}^{9} (t_k − a2_k)^2. An error tolerance of 0.001 and a maximum of 1000 training cycles were used. Fig. 4 shows the identification results for the nonlinear rheological characteristics of rapeseed and dehulled rapeseed, which met the given error tolerance; in fact, fewer than 100 training cycles were needed.
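Reusing the hypothetical train_bp helper sketched in Section 4.1, the 1-5-1 network of this section could be configured as follows; the creep data values are placeholders for the measured time/strain sequences, and the inputs are scaled to [0, 1] before training.

```python
import numpy as np

# Hypothetical creep data: 9 (time, strain) training pairs from one creep test.
t_seq = np.linspace(0, 30, 9).reshape(-1, 1)            # pressing time, min
strain = np.array([[0.00], [0.21], [0.30], [0.35], [0.38],
                   [0.40], [0.41], [0.42], [0.42]])     # axial strain (illustrative)

t_norm = t_seq / t_seq.max()                            # scale inputs to [0, 1]

# 1 input node, 5 hidden nodes, 1 output node; tolerance and cycle limit
# match the values reported in the text.
W1, b1, W2, b2 = train_bp(t_norm, strain, s1=5, eta=0.1,
                          tol=0.001, max_epochs=1000)
```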
5 Identification of Critical Press Time
The identification results for the nonlinear rheological characteristics showed that all the creep curves of rapeseed and dehulled rapeseed under different pressing pressures finally tend towards a horizontal asymptote. This indicates that at that time the creep rate is zero, the oilseed volume tends to stabilize, the oil extraction rate tends to zero, and the pressing reaches a balanced state. The creep curves can therefore be used to identify the critical press time under different pressing pressures: the starting time of the horizontal part of the creep curve can be defined as the critical press time. In theory the creep tends to an infinitesimal value as the press time tends to infinity, so the press time would be infinite; in practice, however, the critical press time is finite. Fig. 4 shows that the creep decreases rapidly with increasing press time and tends towards a horizontal line at a fixed value. In actual engineering, the critical press time can be read off the creep curves in Fig. 4: when the press time exceeds a fixed value, the creep is very small and the creep curve tends towards horizontal, so the pressing is considered to have reached the balanced state. The critical press times of rapeseed and dehulled rapeseed determined in this way are shown in Table 1.

Table 1. Critical press time

Pressing pressure σ/MPa                        10  20  30  40  50  60
Critical press time of rapeseed/min            35  40  45  50  60  70
Critical press time of dehulled rapeseed/min   35  45  50  60  70  80
6 Conclusions
During pressing, rapeseed and dehulled rapeseed clearly exhibit nonlinear rheological characteristics, which can be identified well using artificial neural networks. Compared to data-fitting methods, which require a predetermined mathematical function, and theoretical-analysis methods, which require complex viscous-elastic-plasticity theory, the artificial neural network is simple and convenient, needing only a simple network configuration. In this paper, the neural network simulating the nonlinear rheological characteristics of rapeseed and dehulled rapeseed consisted of only one input layer, one hidden layer and one output layer; the network structure was simple, but the simulation precision was higher than that of a nonlinear viscous-elastic-plastic constitutive model established from a theoretical model
combined with an empirical model. The identification approach is both feasible and effective and is well suited to popularization and application to other oilseeds. The critical pressing times of rapeseed and dehulled rapeseed were determined from the simulated creep curves under different pressures.
References 1. Xiao, Z., Guoxiang, L., Zhi, L., Shaomei, W.: Nonlinear viscous-elastic-plasticity constitutive model of rapeseed and rapeseed kerne. Transactions of the Chinese Society for Agricultural Machinery 11, 87–91 (2005) 2. Xiao, Z., Zhi, L., Guoxiang, L., Shaomei, W.: Research on stress-strain of rapeseed by uniaxial pressing under double-surface for flow of fluids through a porous medium. Journal of Agricultural Mechanization Research 6, 187–189 (2004) 3. Xiao, Z., Shan, Z., Guoxiang, L., Shaomei, W.: Research on stress-strain of rapeseed and decorticated rape seed by uniaxial cold pressing under single surface for flow of fluids through a porous medium. China Oils and Fats 7, 11–14 (2004) 4. Xiao, Z., Dongping, H., Guoxiang, L., Yan, Y., Jingzhou, W.: Identification of nonlinear rheological characteristics of sesame and peanut by using of neural networks. Cereals and Oils Processing 11, 59–61 (2006) 5. Ning, X.: Study of some problems in rheological test of soils. Journal of Yunnan Institute of Technology 4, 76–82 (1994) 6. Jian, Y., Xu, B., Yang, H.: Noise identification for hydraulic axial piston pump based on artificial neural networks. Chinese Journal of mechanical engineering 1, 120–123 (2006) 7. Tao, S., Guangyi, C., Xinjian, Z.: Nonlinear modeling of PEMFC based on neural networks identification. Journal of Zhejiang University Science 5, 365–370 (2005) 8. Jiangang, Y.: Neural networks. Press of Zhejiang University, Hangzhou (2001) 9. Wei, W., Yaolin, Z.: A new method of recognition by using neural network. Process Automation Instrumentation 6, 18–20 (2002)
Prediction of Death Rate of Breast Cancer Induced from Average Microelement Absorption with Neural Network Shouju Li1,*, Jizhe Wang1,2, Yingxi Liu1, and Xiuzhen Sun2 1
State Key Laboratory of Structural Analysis for Industrial Equipment, Dalian University of Technology, Dalian 116024, China
[email protected] 2 The Second Affiliated Hospital, Dalian Medical University, Dalian 116024, China
Abstract. Breast cancer is one of the leading causes of cancer deaths in the female population of both developed and developing countries. Average microelement absorption can affect the death rate of breast cancer. Artificial neural networks have been successfully applied to predicting the death rate of breast cancer induced from average microelement absorption. Such prediction with an artificial neural network is feasible, and a network well trained by the Levenberg-Marquardt algorithm shows extremely fast convergence and a high degree of accuracy. The investigation demonstrates that the proposed training and forecasting procedure is almost 100 times faster than the classical BP algorithm and gives higher forecasting precision. With the growth of the database, more and more cases will be collected and used as the training set.
1 Introduction
Breast carcinoma is the second leading cause of cancer-related deaths in women of the western world. In 2005, 211,240 women were diagnosed with breast cancer and 40,870 died of the disease. This high death rate has stimulated extensive research into breast cancer detection and treatment. Despite significant improvements in cancer diagnosis and treatment, approximately a quarter of breast cancer patients will die of their disease. Clinical decision-making is a challenging, multifaceted process. Its goals are precision in diagnosis and the institution of efficacious treatment. Achieving these objectives involves access to pertinent data and the application of previous knowledge to the analysis of new data in order to recognize patterns and relations. This process may be difficult because the data may be incomplete, imprecise or unavailable; additionally, the analysis may be incomplete or imprecise, particularly if the data are viewed through a limited or incorrect window [1]. Diagnosis of diseases may be considered a pattern classification task. In clinical settings, cardiologists gather diagnostic data in several formats, including medical history, physical examination, electrocardiograms, echocardiograms, nuclear studies and cardiac catheterization and angiographic studies. Practitioners apply various statistical techniques in processing these data to assist in clinical decision-making and to facilitate the management of patients. The conventional approach to building an expert system requires the formulation of rules by which the input data can be analyzed. The formulation of such rules is
very difficult with large sets of input data. The neural network (ANN) has been applied as an alternative to conventional rule-based expert systems. ANNs can be trained without encapsulating the knowledge derived from such rules; hence ANNs have been found to be more helpful than traditional expert systems in the diagnosis of diseases. Selvi (2000) developed a new approach for designing and developing an artificial neural network model for diagnosing diseases, together with a solution strategy for the XOR problem [2]; the proposed model has also been tested on this important benchmark problem. Li (2007) employed a new method, the mega-trend-diffusion technique, to improve the accuracy of gene diagnosis for bladder cancer on a very limited number of samples. The modeling results showed that as the number of training data increased, the learning accuracy of the bladder cancer diagnosis was enhanced steadily, from 82% to 100%; compared with traditional methods, that study provides a new approach to a reliable model for small-dataset analysis [3]. Artificial neural networks are computational tools for pattern recognition that have been the subject of renewed research interest. Artificial neural networks, employing several formats and learning algorithms, are being used in academic research and industrial applications, and applications of artificial intelligence for data analysis have been reported in the prediction of several medical diagnoses [4,5,6]. The aim of this paper is to propose a prediction method for the death rate of breast cancer induced from average microelement absorption using an artificial neural network.
2 Neural Network Model for Prediction and Diagnosis
Breast cancer is one of the leading causes of cancer deaths in the female population of both developed and developing countries. The most useful way to reduce deaths due to breast cancer is to treat the disease at an earlier stage. Earlier treatment requires early diagnosis, and early diagnosis requires an accurate and reliable diagnostic procedure that allows physicians to differentiate benign breast tumors from malignant ones. The most frequently adopted medical imaging studies for the early detection and diagnosis of breast cancers include mammography and ultrasonography. The task of data classification using knowledge obtained from known historical data has been one of the most intensively studied subjects in statistics, decision science, operations research, and computer science, and it has been applied in problems of medicine, social science, management, and engineering. Cancer results from a combination of many factors, including inherited mutations or polymorphisms of cancer susceptibility genes, environmental agents that influence the acquisition of somatic genetic changes, and several other systemic and local factors. Microelements such as Se, Cu, Zn, Cd, Cr, Mn and As can reach the breast through its capillary vessels. The investigation demonstrates that microelement absorption can affect the death rate of breast cancer, and searching for a reasonable microelement absorption is very important for increasing the lifetime of breast cancer patients. The concept of environment is often used with a broad scope in the medical literature, including all non-genetic factors such as diet, lifestyle and infectious agents. In this broad sense, the environment is implicated in the causation of the majority of human cancers. In a more specific sense, however, environmental factors include only the (natural or man-made) agents encountered by humans in their daily life, upon
Fig. 1. Influences of microelement Se absorption on the death rate of breast cancer
Fig. 2. Influences of microelement Cr absorption on the death rate of breast cancer
which they have no or limited personal control. The influences of microelement absorption on the death rate of breast cancer are plotted in Figs. 1 and 2. A neural network is capable of identifying relations in input data that are not easily apparent with current common analytic techniques. The trained artificial neural network's knowledge is built on learning from previous input data; on the basis of this prior knowledge, the artificial neural network can predict relations found in newly presented data sets. An artificial neural network model is a system with inputs and outputs based on biological nerves. The system can be composed of many computational elements that operate in parallel and are arranged in patterns similar to biological neural nets. A neural network is typically characterized by its computational elements, its network topology and the learning algorithm used. Among the several different types of ANN, the feed-forward, multilayered, supervised neural network with the error back-propagation algorithm (the BPN) is by far the most frequently applied neural network learning model, due to its simplicity. The architecture of BP networks, depicted in Fig. 3, includes an input layer, one or more hidden layers, and an output layer. The nodes in each layer are connected to each node in the adjacent layer. Notably, Hecht-Nielsen proved that one hidden layer of neurons suffices to model any solution surface of practical interest; hence, a network with only one hidden layer is considered in this study. Before an ANN can be
Fig. 3. Topography structure of artificial neural network
used, it must be trained from an existing training set of pairs of input-output elements. The training of a supervised neural network using a BP learning algorithm normally involves three stages. The first stage is the data feed-forward. The computed output of the ith node in the output layer is defined as follows [7]:

y_i = f\Big(\sum_{j=1}^{N_h} \big(\mu_{ij} f\big(\sum_{k=1}^{N_i} \nu_{jk} x_k + \theta_j\big) + \lambda_i\big)\Big) \qquad (1)
where μ_{ij} is the connective weight between nodes in the hidden layer and those in the output layer; ν_{jk} is the connective weight between nodes in the input layer and those in the hidden layer; θ_j and λ_i are bias terms representing the thresholds of the transfer function f; and x_k is the input of the kth node in the input layer. N_i, N_h and N_o are the numbers of nodes in the input, hidden and output layers, respectively. The transfer function f is selected as the sigmoid function [8]
f(·) = 1 / [1 + exp(−·)] \qquad (2)
The second stage is error back-propagation through the network. During training, a system error function is used to monitor the performance of the network. This objective function, also called the error function, is often defined as

E(χ) = \sum_{p=1}^{P} \sum_{i=1}^{N_o} (y_i^p − o_i^p)^2 \qquad (3)
where y_i^p and o_i^p denote the practical and desired values of output node i for training pattern p, P is the number of samples, χ is the weight vector of the neural network, and E(χ) is the objective function. Training methods based on back-propagation offer a means of solving this nonlinear optimization problem by adjusting the network parameters by a constant amount in the direction of steepest descent, with some variations depending on the flavor of BP being used. Since error surfaces of the objective function for neural networks can be quite complex, with many local optima,
the genetic algorithm seems better suited for this type of search. The genetic algorithm searches from one population of points to another, focusing on the area of the best solution so far while continuously sampling the total parameter space. When a network is trained with a database containing a substantial number of input-output vector pairs, the total error E can be calculated. One algorithm used to train the network makes use of the genetic algorithm, which is more powerful than the commonly used gradient descent methods because it makes training more accurate and faster near global minima on the error surface. For the model, the architecture of the network was determined to be a three-layer fully connected feed-forward network, and the weights between the layers were allowed to adjust according to the constraints given and the function to be optimized. The BFGS algorithm is a quasi-Newton optimization technique, in which curvature information is used to provide a more accurate descent direction without actually calculating the second derivatives. A sequence can be computed according to the formula
χ(n + 1) = χ(n) + Δχ(n) \qquad (4)
where χ(n) is the vector of network parameters (weights linking the input layer to the hidden layer and weights linking the hidden layer to the output layer) for iteration n. Another algorithm used to train the network is the Levenberg-Marquardt approximation, which is more powerful than the commonly used gradient descent methods because it makes training more accurate and faster near minima on the error surface. The adjusted weight vector Δχ is calculated using the Jacobian matrix J, its transpose J^T, a constant μ, the unity matrix I and an error vector e, as follows [9]:

Δχ = −(J^T J + μ I)^{−1} J^T e \qquad (5)

When the constant μ is large, the Levenberg-Marquardt algorithm approximates the normal gradient descent method, while if μ is small the expression transforms into the Gauss-Newton method. After each successful step the constant μ is decreased, forcing the adjusted weight vector to move as quickly as possible towards the Gauss-Newton solution; when a step increases the errors, the constant μ is subsequently increased.
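A minimal sketch of one Levenberg-Marquardt step, Eqs. (4)-(5), with a finite-difference Jacobian; the multiply-or-divide-by-10 damping schedule is a common convention, not necessarily the authors' exact choice.

```python
import numpy as np

def lm_step(chi, residual_fn, mu, eps=1e-6):
    """One Levenberg-Marquardt update of the parameter vector chi, Eq. (5).

    residual_fn(chi) returns the error vector e; J is a finite-difference
    Jacobian of e with respect to chi.
    """
    e = residual_fn(chi)
    J = np.zeros((e.size, chi.size))
    for j in range(chi.size):
        d = np.zeros_like(chi); d[j] = eps
        J[:, j] = (residual_fn(chi + d) - e) / eps
    delta = -np.linalg.solve(J.T @ J + mu * np.eye(chi.size), J.T @ e)
    chi_new = chi + delta                      # Eq. (4)
    e_new = residual_fn(chi_new)
    if np.sum(e_new**2) < np.sum(e**2):
        return chi_new, mu / 10.0              # success: towards Gauss-Newton
    return chi, mu * 10.0                      # failure: towards gradient descent
```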
3 Prediction of Death Rate of Breast Cancer Induced from Microelement Average Absorption with Neural Network Breast cancer is a malignant tumour that has developed from cells of the breast. Although scientists know some of the risk factors (i.e. ageing, genetic risk factors, family history, menstrual periods, not having children, obesity) that increase a woman’s chance of developing breast cancer, they do not yet know what causes most
breast cancers or exactly how some of these risk factors cause cells to become cancerous. Research is under way to learn more, and scientists are making great progress in understanding how certain changes in DNA can cause normal breast cells to become cancerous. ANNs are computational architectures composed of interconnected units (neurons); the name reflects their initial inspiration by biological neural systems, though the functioning of today's ANNs may be quite different from that of biological ones. To estimate the predictive accuracy of the forecasting models, the data set must be split into a training set and a test set. The training set is used to establish the diagnostic model's parameters, while the independent holdout sample is used to test the generalization capability of the model. Ten-fold cross-validation is used in this research to minimize the impact of data dependency on the results and to improve the reliability of the resulting predictive estimates; the choice of ten partitions is somewhat arbitrary. Several key design decisions involving the topology and the learning process are required to define the neural network model. Topology decisions establish the network architecture and include the number of hidden layers and the number of neurons in each layer. The number of neurons in the input layer of the neural models is simply the number of variables in the data set; here it equals the number of microelements affecting the death rate of breast cancer. The output layer has one neuron, because the objective of the model is to forecast the relationship between the death rate of breast cancer and the microelement absorption. The hidden layer is more difficult to define [10]: a relatively large hidden layer creates a more flexible diagnostic model, whose forecasting error will tend to have a low bias component with a large variance caused by the tendency of the model to over-fit the training data.

Table 1. Comparison of the actual death rates of breast cancer with the forecasts of the neural network
Country        Se    Cu   Zn    Cd    Cr    Mn    As     Death rate/%
Poland         93.8  634  5231  80.3  15.9  876   138.3  11*/8.83
Hungary        87.2  714  4347  74.6  15.8  1080  268.5  12.5*/14.2
Formosa        84.1  592  1674  33.3  12.8  522   184.3  4.0*/5.16
Yugoslavia     96.0  722  3288  72.9  11.7  1169  82.1   8.0*/10.73
*Note: practical value/forecasting value.
A relatively small hidden layer results in a model with a higher error bias and a lower variance; the design of the hidden layer therefore involves a trade-off between error components. In this research, the number of neurons in the hidden layer is determined using a cascade learning process. The cascade learning process is constructive, starting with an empty hidden layer and adding neurons to this layer one at a time; the addition of hidden neurons continues until there is no further improvement in network performance. The results suggest using 14 hidden-layer nodes, i.e., a 7-14-1 network architecture. Forecasting accuracy is also dependent on the dynamics of the network learning process. Once training is completed, further tasks can be carried out with relative ease. A comparison of the actual death rates of breast cancer with the neural network forecasts is shown in Table 1, and Fig. 4 compares the actual and forecast values graphically.
Fig. 4. Comparison of actual values of the death rate of breast cancer with forecasting ones
4 Conclusions Artificial neural network is a new application of computer technology and has varied utilization in current medical research about breast cancer prediction. From the highly satisfactory specificity and sensitivity of the results, the proposed procedure is expected to be a helpful tool for forecasting the death rate of breast cancer induced from average microelement absorption. With the growth of the database, more and more cases will be collected and used as training set. The most important aspect of the proposed prediction procedure is the ability of self-organization of the neural network without requirements of programming and the immediate response of a trained net during real-time applications. Artificial neural networks are a form of artificial computer intelligence that has been the subject of renewed research interest in the last 20 years. Although they have been used extensively for problems in engineering, they have only recently been applied to medical problems, particularly in the fields of radiology, urology, laboratory medicine and cardiology. An artificial neural network is a distributed network of computing elements that is modeled after a biologic neural system and may be implemented as a computer software program. Artificial neural networks are not meant to replace clinical judgment or classic statistical approaches; they are meant to enhance clinical decision-making. Though they are relatively new, they are one of a number of techniques for interpreting medical data. Further work can be performed for improving the forecasting accuracies by the usage of different ANN architectures and training algorithms.
References 1. Itchhaporia, D., Snow, P.B.: Artificial neural networks: current status in cardiovascular medicine. JACC 28(2), 515–521 (1996) 2. Selvi, S.T., Arumugam, S., Ganesan, L.: BIONET: an artificial neural network model for diagnosis of diseases. Pattern Recognition Letters 21, 721–740 (2000) 3. Li, D.-C., Hsu, H.-C., Tsai, T.-I.: A new method to help diagnose cancers for small sample size. Expert Systems with Applications 33, 420–424 (2007) 4. Boracchi, P., Biganzoli, E., Marubini, E.: Modeling cause-specific hazards with radial basis function artificial neural networks: application to 2233 breast cancer patients. Statistics in Medicine 20, 3677–3694 (2001)
Prediction of Death Rate of Breast Cancer
421
5. Guler, I., Ubeyli, E.D.: Application of adaptive neuro-fuzzy inference system for detection of electrocardiographic changes in patients with partial epilepsy using feature extraction. Expert Systems with Applications 27, 323–330 (2004) 6. McCarthy, J.F., Marx, K.A., Hoffman, P.E.: Application of machine learning and highdimensional visualization in cancer detection, diagnosis, and management. Annals of the New York Academy of Sciences 1020, 239–262 (2004) 7. Cao, X.: Application of artificial neural networks to load identification. Computers & structures 69, 63–78 (1998) 8. Huang, Y.: Application of artificial neural networks to predictions of aggregate quality parameters. Int. J. of Rock mechanics and mining sciences 36, 551–561 (1999) 9. Meulenkamp, F.: Application of neural networks for the prediction of the unconfined compressive strength from Equotip hardness. Int. J. of Rock mechanics and mining Sciences 36, 29–39 (1999) 10. West, D., West, V.: Improving diagnostic accuracy using a hierarchical neural network to model decision subtasks. International Journal of Medical Informatics 57, 41–55 (2000)
An Adaptive Classifier Based on Artificial Immune Network Zhiguo Li, Jiang Zhong, Yong Feng, and ZhongFu Wu College of Computer Science and Technology, Chongqing University, Chongqing, 400044, China {lizhiguo,zjstud,fengyong,wzf}@cqu.edu.cn
Abstract. The central problem in training a radial basis function neural network is the selection of hidden-layer neurons, including the selection of their centers and widths. In this paper, we propose a new method to construct an adaptive RBF neural network classifier based on an artificial immune network algorithm. A multiple-granularity immune network (MGIN) algorithm is employed to obtain the candidate hidden neurons and construct an original RBF network including all candidate neurons, and a procedure for removing redundant neurons is finally used to simplify the classifier. Experimental results show that the resulting network tends to generalize well.
1 Introduction
Radial basis function (RBF) networks are used for function approximation, pattern recognition and time-series prediction problems. The performance of an RBF neural network depends very much on how it is trained. Training an RBF neural network involves selecting hidden-layer neurons and estimating weights. The problem of neuron selection has been pursued in a variety of ways, based on different understandings or interpretations of the RBF neural network [1-5]. In [1], an orthogonal least squares (OLS) based algorithm is presented: the RBF neural network is interpreted in terms of its layered architecture, where the role of the hidden layer is simply to map samples from the input space to the hidden-layer space, and neuron selection is performed in the hidden-layer space, handled as a linear model selection problem. In [2], selection of hidden-layer neurons based on a data-structure-preserving criterion is proposed, where data structure denotes the relative location of samples in the high-dimensional space. That algorithm has some shortcomings: firstly, because it starts with an RBF neural network having all samples as hidden neurons, it cannot solve large-scale problems; secondly, a uniform value is chosen for the width parameter of the hidden neurons and is selected by repeated attempts; lastly, it is an unsupervised neuron-selection algorithm and therefore cannot utilize the class label information. In this paper we propose a novel algorithm: it first uses a multiple-granularity artificial immune network to find the candidate hidden neurons, then constructs an RBF neural network with all candidate hidden neurons and employs a preserving
criterion to remove some redundant hidden neurons. This new algorithm takes full advantage of the class label information and starts with a small neural network; hence it is likely to be more efficient and is expected to generalize well. This paper is organized as follows. In Section 2, the multiple-granularity immune network algorithm used to construct the original RBF classifier is developed. In Section 3, a new neuron-selection criterion is introduced. Experimental studies are presented in Section 4, and concluding remarks are given in Section 5.
2 Constructing the Original RBF Classifier Based on MGIN
The original AIN method is an unsupervised algorithm [7, 8], so it is difficult to determine the optimal number of neurons from the class label information. The main problem with the original AIN algorithm for the hidden layer is that it computes under a single granularity, whereas classification takes place under different granularities. Here we employ a variation of the AIN algorithm to construct the original hidden layer of the RBF network. In this section we give a multiple-granularity AIN algorithm for the hidden neurons, which employs the immune clone, immune mutation and immune suppression operations defined in [7]. The multiple-granularity immune network (MGIN) algorithm is as follows:
Input: data set X, and the descent factor a of the granularity
Output: the candidate hidden neurons R
Step 1: Calculate the radius r of the dataset hypersphere and let r be the immune suppression parameter; let R = Φ and X' = X.
Step 2: Construct an artificial immune network M based on X'.
Step 3: Let X' = Φ. Take M as the cluster centers and partition the samples based on the Gaussian radial basis function whose width parameter is the suppression parameter r. If partition i contains data points of only one class and Mi is its center, let R = R ∪ {Mi}; otherwise add the data points of partition i to X'.
Step 4: If X' ≠ Φ, let r = r × a and go to Step 2; otherwise return R as the hidden neurons and stop.
Algorithm 1. MGIN for candidate hidden neurons
Algorithm 1 has time complexity O(m·N) [7], where N is the number of training samples and m is the maximum size of R. According to the properties of R, a neighborhood classifier can be built on the hidden neurons R, with the Gaussian radial basis function as the distance function.
Theorem 1. Let V be the centers of a neighborhood classifier; then an RBF network classifier can be constructed based on V.
Proof. Suppose the number of classes is K, so the RBF network classifier has K output neurons, and the number of points in V is m.
We construct an RBF network classifier as in Fig. 1. Let $v_j$ be a neuron of the hidden layer and a center of class $i$, and let $W_{j,t}$ be the weight between neuron $v_j$ and output neuron $t$, with

$$W_{j,t} = \begin{cases} 1, & \text{if } class(v_j) = t \\ -1, & \text{otherwise.} \end{cases}$$

We will prove that this RBF network classifies the data correctly. Let $x_i$ be an arbitrary sample of the dataset with class label $k$, and define

$$d_{ij} = \|x_i - v_j\|, \quad ds_i = \min_{class(v_j)=k} \|x_i - v_j\|, \quad dd_i = \min_{class(v_j)\neq k} \|x_i - v_j\|, \quad \Delta_i = dd_i - ds_i, \quad \Delta = \min_{i=1,\dots,n} \Delta_i.$$

Hence $\Delta$ may be seen as the minimum separation between different classes under nearest neighbor classification. Then

$$F_k(x_i) = \sum_{class(v_j)=k} e^{-d_{ij}/\delta} - \sum_{class(v_f)\neq k} e^{-d_{if}/\delta} \geq e^{-ds_i/\delta} - (m-1)\,e^{-dd_i/\delta} = e^{-ds_i/\delta}\big(1 - (m-1)e^{-(dd_i - ds_i)/\delta}\big) \geq e^{-ds_i/\delta}\big(1 - (m-1)e^{-\Delta/\delta}\big).$$
If the width parameter $\delta$ of the radial basis function satisfies $\delta < \Delta / \ln(m-1)$, then $F_k(x_i) > 0$ and $F_f(x_i) < 0$ for $f \neq k$. According to the class decision criterion, the output class label must be k. As for nearest neighbor classification, the error rate of such a classifier is no more than twice the Bayes (optimal) error rate.
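In practice the bound translates directly into a width-selection rule; a one-line sketch (the function name is ours):

```python
import math

def width_bound(Delta, m):
    """Largest Gaussian width delta satisfying (m-1)*exp(-Delta/delta) < 1,
    i.e. delta < Delta / ln(m-1), which makes F_k(x_i) > 0 in Theorem 1.
    Requires m > 2; for m = 2, exp(-Delta/delta) < 1 holds for any width."""
    assert m > 2
    return Delta / math.log(m - 1)
```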
Fig. 1. RBF network architecture
After the RBF network classifier has been constructed, we can employ the removing criterion described in Section 3 to remove redundant neurons and simplify the classifier.
3 Simplify the RBF Neural Network

Consider an RBFNN with inputs from $R^m$, $c$ RBFs and $n$ output units. Let $v_j \in R^m$ be the prototype at the center of the $j$th RBF, and let $w_i = [w_{i1}, w_{i2}, \dots, w_{ic}]^T$ be the vector containing the weights that connect the $i$th output unit to the RBFs. Define the sets $V = \{v_j\}$ and $W = \{w_i\}$, and let $A = \{a_j\}$ be a set of free parameters associated with the RBFs. An RBFNN is defined as the function $N: R^m \to R^n$ that maps $x \in R^m$ to $N(V, W, A; x)$, such that
$$N_i(V, W, A; x) = f\Big(\sum_{j=1}^{c} w_{ij}\, g_j\big(\|x - v_j\|^2\big) + w_{i0}\Big) \qquad (1)$$

where $f(x) = 1/(1 + e^{-x})$ is the sigmoid function used in this paper, and $g_j$ represents the response of the RBF centered at the prototype $v_j$. Using this notation, the response of the $i$th output unit to the input $x_k$ is

$$\tilde{y}_{i,k} = N_i(x_k) = f\Big(\sum_{j=1}^{c} w_{ij}\, g_{j,k} + w_{i0}\Big) \qquad (2)$$

where $g_{j,k}$ represents the response of the RBF centered at the prototype $v_j$ to the input vector $x_k$. Unlike the traditional RBFNN, which uses exponential functions, in this paper we use a cosine function [9] for $g_{j,k}$:

$$g_{j,k} = a_j \Big/ \big(\|x_k - v_j\|^2 + a_j^2\big)^{1/2} \qquad (3)$$
Cosine RBFNNs can be trained by the original learning algorithm, which was developed using "stochastic" gradient descent to minimize

$$E_k = \frac{1}{2}\sum_{i=1}^{n} (\tilde{y}_{i,k} - y_{i,k})^2 \qquad (4)$$

for $k = 1, 2, \dots, M$. For sufficiently small values of the learning rate, sequential minimization of $E_k$ leads to a minimum of the total error $E = \sum_{k=1}^{M} E_k$. After an example vector $(x_k, y_k)$ is presented to the RBFNN, the new estimate $w_{i,k}$ of each weight vector $w_i$ is obtained by incrementing its current estimate by the amount $\Delta w_{i,k} = -\beta\, \nabla_{w_i} E_k$, where $\beta$ is the learning rate:

$$w_{i,k} = w_{i,k-1} + \Delta w_{i,k} = w_{i,k-1} + \beta\, g_k\, \tilde{y}_{i,k}\,(1 - \tilde{y}_{i,k})\,(y_{i,k} - \tilde{y}_{i,k}) \qquad (5)$$

The new estimate $a_{j,k}$ of each reference distance $a_j$ is obtained by incrementing its current estimate by the amount $\Delta a_{j,k} = -\beta\, \partial E_k / \partial a_j$ [9]:
$$a_{j,k} = a_{j,k-1} + \Delta a_{j,k} = a_{j,k-1} + \beta\, g_{j,k}\,(1 - g_{j,k}^2)\, \varepsilon^h_{j,k} / a_{j,k-1}, \qquad \varepsilon^h_{j,k} = \frac{g_{j,k}^3}{a_j^2} \sum_{i=1}^{n} f'(\tilde{y}_{i,k})\,(y_{i,k} - \tilde{y}_{i,k})\, w_{ij} \qquad (6)$$

According to (3), the $j$th cosine RBF can be eliminated during the training process if its reference distance $a_j$ approaches zero. We thus obtain a new algorithm for training the RBF classifier: first use the multiple granularities artificial immune network algorithm to obtain the candidate hidden neurons, and then train the neural network with the gradient descent learning process described in this section.
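A condensed sketch of one stochastic training step following (2)-(6) is given below, assuming the MGIN candidates initialize the centers. The vectorized form and variable names are ours, and the reference-distance update transcribes the printed formulas (the derivation in [9] differs in notational detail).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, t, V, A, W, b, beta=0.05):
    """One stochastic gradient step for a cosine-RBF network, eqs. (2)-(6).
    x: (d,) input, t: (n,) targets, V: (c, d) centers, A: (c,) reference
    distances, W: (n, c) output weights, b: (n,) biases (w_i0)."""
    dist2 = np.sum((x - V) ** 2, axis=1)        # ||x_k - v_j||^2
    g = A / np.sqrt(dist2 + A ** 2)             # eq. (3), cosine RBF responses
    y = sigmoid(W @ g + b)                      # eq. (2), network outputs
    err = (t - y) * y * (1.0 - y)               # f'(~y)(y - ~y), per output
    W += beta * np.outer(err, g)                # eq. (5), weight update
    b += beta * err
    eps_h = (g ** 3 / A ** 2) * (W.T @ err)     # hidden-layer error, eq. (6)
    A += beta * g * (1.0 - g ** 2) * eps_h / A  # reference-distance update
    return y
```

RBFs whose reference distance drifts toward zero can then be pruned between epochs, which is the simplification mechanism this section relies on.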
4 Experiments

In this section, we present a few examples, including six real-world problems. All the real-world examples are from the UCI Machine Learning Repository [10]. For all problems, a Gaussian basis function is employed and the width parameter is determined automatically using the search procedure described in Section 2.
4.1 Iris Data

In the Iris data experiment, 75 patterns with four features are used as the training set and the remaining 75 as the testing set. The 75 training patterns are obtained by random sampling from the original Iris data set of 150 patterns. The stopping criterion for the removing procedure is Er = 0.05 and Hn = 5. Because there are three classes in the Iris data, three output neurons are used. The new classifier is constructed to solve this problem. For the BP algorithm, the MSE target is set to 0.014. Several different kinds of classification methods are compared with the proposed MGIN-based RBF network classifier on the Iris data classification problem. As shown in Table 1, a 96.7% testing accuracy rate is obtained by the new classifier on the Iris data. Compared with the testing accuracy rates of the other models, for the case of random sampling with 50% training patterns and 50% test patterns, the new classifier has the best classification accuracy on the Iris data. The SOM RBF classifier is also a two-stage algorithm like the new one; the difference is that it employs SOM to obtain the candidate hidden neurons.

Table 1. Experimental results on Iris data
Accuracy   BP 4-5-3   Nearest   SOM RBF (5 neurons)   OLS RBF (5 neurons)   MGIN RBF (5 neurons)
Training   98.5       97.5      96.8                  97.9                  98.1
Testing    96.5       96.3      95.4                  96.0                  96.7
4.2 Several Benchmark Data Sets from the UCI Repository

In this section, we use five benchmark data sets from the UCI repository [10] to further demonstrate the classification performance of the new classifier. Experimental conditions are the same as in the previous experiment.

Table 2. Testing results of various learning models
Data set   BP     Nearest   SOM RBF   OLS RBF   MGIN RBF
Liver      81.9   69.7      80.3      79.6      80.3
Breast     93.8   92.1      95.2      93.7      94.6
Echo       93.9   90.3      89.4      91.5      91.8
Wine       96.6   84.5      93.6      95.7      93.8
Va-Heart   99.5   95.1      97.0      98.2      98.5
We performed ten independent runs; in each, half of the original data patterns were used as training data (randomly selected) and the remaining patterns were used as testing data. According to the testing results in Table 2, the BP neural network performs best most of the time. However, the network structure of the BP neural network is difficult to determine for higher-dimensional pattern classification problems, and its convergence cannot be guaranteed. The accuracy rate of the new classifier is also clearly higher than that of the traditional RBF network classifiers.
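The evaluation protocol (ten independent runs, random 50/50 splits) can be summarized as follows; `fit` and `predict` are placeholders for any of the compared classifiers, and the function itself is an illustrative addition.

```python
import numpy as np

def evaluate(fit, predict, X, y, runs=10, seed=0):
    """Protocol behind Tables 1-2: ten independent runs, each with a random
    50/50 train/test split; returns the mean test accuracy (%)."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(runs):
        idx = rng.permutation(len(X))
        half = len(X) // 2
        tr, te = idx[:half], idx[half:]
        model = fit(X[tr], y[tr])
        accs.append(np.mean(predict(model, X[te]) == y[te]) * 100.0)
    return float(np.mean(accs))
```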
5 Conclusions

This paper proposes an MGIN-based neural network classifier built in two stages: a multiple granularities immune network is employed to find the candidate hidden neurons, and a removing criterion is then used to delete the redundant neurons. Experimental results indicate that the new classifier has the best classification ability among the compared conventional classifiers on the tested pattern classification problems.
Acknowledgment

This work is supported by the Graduate Student Innovation Foundation of Chongqing University of China (Grant No. 200506Y1A0230130), the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 2004061102) and the High Technology Funds of Shanghai Pudong (Grant No. PKK2005-07).
References

1. Chen, S., Cowan, C.F., Grant, P.M.: Orthogonal least squares learning algorithm for radial basis function networks. IEEE Trans. Neural Networks 2, 302–309 (1991)
2. Mao, K.Z., Huang, G.-B.: Neuron Selection for RBF Neural Network Classifier Based on Data Structure Preserving Criterion. IEEE Trans. Neural Networks 16, 1531–1540 (2005)
3. Huang, G.B., Saratchandran, P.: A Generalized Growing and Pruning RBF (GGAP-RBF) Neural Network for Function Approximation. IEEE Trans. Neural Networks 16, 57–67 (2005)
4. Lee, S.J., Hou, C.L.: An ART-Based Construction of RBF Networks. IEEE Trans. Neural Networks 13, 1308–1321 (2002)
5. Lee, H.M., Chen, C.M.: A Self-Organizing HCMAC Neural-Network Classifier. IEEE Trans. Neural Networks 14, 15–27 (2003)
6. Miller, D., Rao, A.V.: A Global Optimization Technique for Statistical Classifier Design. IEEE Trans. on Signal Processing 44, 3108–3122 (1996)
7. Timmis, J.: Artificial immune systems: a novel data analysis technique inspired by immune network theory. PhD thesis, University of Wales (2001)
8. Zhong, J., Wu, Z.F.: A Novel Dynamic Clustering Algorithm Based on Immune Network and Tabu Search. Chinese Journal of Electronics 14, 285–288 (2005)
9. Karayiannis, N.B.: Reformulated radial basis neural networks trained by gradient descent. IEEE Trans. Neural Networks 10, 657–671 (1999)
10. UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html
Investigation of a Hydrodynamic Performance of a Ventricular Assist Device After Its Long-Term Use in Clinical Application

Yuma Kokuzawa1, Tomohiro Shima1, Masateru Furusato1, Kazuhiko Ito1, Takashi Tanaka1, Toshihiro Igarashi2, Tomohiro Nishinaka2, Kiyotaka Iwasaki1, and Mitsuo Umezu1

1 Integrative Bioscience and Biomedical Engineering, Graduate School of Waseda University, #58-322 3-4-1 Ohkubo, Shinjuku-ku, Tokyo 169-8555, Japan
[email protected]
2 Heart Institute of Japan, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku-ku, Tokyo 162-8666, Japan
Abstract. Long-term durability of a ventricular assist device (VAD) is required due to the shortage of donor hearts for cardiac transplantation, but no pump data after long-term use have been analyzed. This study aimed to perform a comparative study between a new VAD and VADs after long-term use. The hydrodynamic performance of used Toyobo VADs (mean period of use 5 months, maximum 12 months) was evaluated in a mock circulatory system, with a new VAD as a control. Although no remarkable difference was observed in mean flow rate, the flow and pressure waveforms varied significantly. The pressure-volume relationship of each pump was then measured, and it was found that the capacity of the long-term VADs was reduced. Although further study is required, these results suggest that long-term use of a VAD may cause a change in the mechanical properties of its polymer materials.
1 Introduction

Although cardiac transplantation is the most effective treatment for patients suffering from chronic severe heart failure, a long waiting period is required to receive a suitable donor heart due to the shortage of donors. During the waiting period for cardiac transplantation, cardiac function is mainly assisted by a ventricular assist device (VAD) [1]. As the period of VAD support becomes longer, long-term durability must be requested and guaranteed [2]. The purpose of this research is to investigate the long-term hydrodynamic performance of VADs after their use in clinical application. In general, three evaluation factors (durability of the materials, biocompatibility and hydrodynamic characteristics) are chosen to indicate the basic performance of a pump [3]. This paper focuses mainly on hydrodynamic performance.
The Toyobo VAD, shown in Fig. 1, is a pneumatically driven pulsatile pump, which is the main VAD in clinical use on the Japanese market as a bridge to transplantation or as circulatory support for DCM (dilated cardiomyopathy) patients. It is frequently used for periods exceeding its guaranteed term (one month), because no alternative VAD is approved for temporary use. The Toyobo VAD is made from a segmented polyether polyurethane and has two chambers, a blood chamber and an air chamber, separated by a moving diaphragm (Fig. 2).

Fig. 1. Toyobo VAD pump for clinical use

Fig. 2. Cross-sectional view of the Toyobo VAD (valve, blood chamber, diaphragm, air chamber)
2 In Vitro Hydrodynamic Performance Test

2.1 Samples and Method

First, the hydrodynamic performance of six Toyobo pumps implanted at the Heart Institute of Japan, Tokyo Women's Medical University, was evaluated in a mock circulatory system, with a new VAD as a control. The applied periods of the blood pumps used for the test are summarized in Table 1.

Table 1. Pumps used for the in vitro hydrodynamic performance test
Name        Days of usage   Implantation   Weaning
control     0               -              -
1 month     30              Apr-06         May-06
3 months    98              Apr-06         Jul-06
5 months    157             Feb-06         Jul-06
7 months    212             Jun-04         Jan-05
11 months   326             Mar-05         Jan-06
12 months   369             Jan-05         Jan-06
An overflow-type mock circulatory system with fixed inlet and outlet pressures (10 mmHg and 100 mmHg, respectively) was employed for the test, as shown in Fig. 3. Straight Tygon tubes, 400 mm in length and 1/2 inch in inner diameter, were used to connect the VAD to each tank. Normal saline solution was used as the test fluid. The mean flow rate was measured directly, and the inlet/outlet flow waveforms were measured by an electromagnetic flow meter. In addition, the drive pressure was measured by pressure transducers. The test conditions of the drive console (VCT-50; Toyobo, Japan) are listed in Table 2.

Table 2. Drive conditions of the VAD

Parameter           (unit)    Values
Pulse rate          (BPM)     70, 80
Systolic fraction   (%)       20 ~ 60
Drive pressure      (mmHg)    240
Vacuum pressure     (mmHg)    -50
Fig. 3. Schematic drawing of the pulsatile test circuit: pre-load 10 mmHg, after-load 100 mmHg, 400 mm connecting tubes, reservoir, VCT-50 drive console, with flow and pressure measurement logged to a PC
2.2 Results and Discussion

Relationships between mean flow rate and systolic fraction for different pulse rates are shown in Fig. 4. The standard deviation of the mean flow rate across the blood pumps was only 2~5%, even for VADs driven for over 11 months; however, the pump flow of the long-term used VADs was slightly lower at systolic fractions of 40% or more. Clinical VADs are generally operated at a systolic fraction of 25-35%, and in this range the standard deviation was within 2%.
Fig. 4. Hydrodynamic performance of the Toyobo VADs: mean flow rate (L/min) versus systolic fraction (%) for the control and the 1-, 3-, 5-, 7-, 11- and 12-month pumps. (a) 70 BPM. (b) 80 BPM.
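The spread figures quoted above (2~5% overall, within 2% at 25-35% systolic fraction) correspond to a relative standard deviation across pumps at each systolic fraction; a minimal sketch of that computation, assuming the measured mean flow rates are tabulated per pump and per fraction:

```python
import numpy as np

def flow_spread(flows):
    """flows: (pumps, fractions) array of mean flow rates (L/min), measured
    at the same systolic fractions for every pump. Returns the relative
    standard deviation (%) across pumps at each systolic fraction."""
    mean = flows.mean(axis=0)
    return flows.std(axis=0, ddof=1) / mean * 100.0
```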
However, there was a small difference in systolic duration between the control and the long-term used pumps: a shorter duration was found in the long-term used pumps. In addition, the outflow waveforms measured between late diastole and early systole were compared under three different systolic fractions, as indicated in Fig. 5. Over-dilation of the diaphragm and a subsequent small ejection were observed in Fig. 5(a) due to the longer diastolic duration. On the contrary, there was no small subsequent ejection in Fig. 5(c), because there was no residual time in the diastolic period. At 40% systole, there was no small ejection in the long-term used pump, although some small ejection was found in the other pumps. Judging from these data, it is suggested that the long-term used pumps may have elongated diaphragms, so that a longer period was required to reach the fully filled state.
Fig. 5. Outflow waveforms (L/min, 0.7-1.0 s) of the late diastolic and early systolic phases for the control and the 1-, 3-, 5-, 7-, 11- and 12-month pumps. (a) 35% systole. (b) 40% systole. (c) 45% systole.
3 In Vitro Static Pressure-Volume Characteristic Test

3.1 Method

The relationship between the blood chamber volume and the air pressure was obtained under static conditions, because air pressure is one of the parameters that determine the pump output. The seven pumps shown in Table 1 were examined. The blood chamber volume was measured directly while the chamber was filled with water, and the air chamber pressure was measured by the pressure transducer. The relationship was established by repeating the procedure of filling and ejecting water.

3.2 Results and Discussion

The static pressure-volume curve of each pump is shown in Fig. 6. The curves of the long-term used pumps were shifted to the left. This shows that the capacity of the blood chamber becomes smaller; the capacity of the 11-months pump was 9 ml smaller than that of the control. A deformation of the plastic pump housing is considered one possible cause of the capacity decrease. However, no clear swelling of the housing of the long-term used pumps was observed; it is therefore speculated that the material characteristics of some other portion may have changed.
Fig. 6. Static pressure-volume curve of each pump: air chamber pressure (mmHg) versus blood chamber volume (ml) for the control and the 1-, 3-, 5-, 7-, 11- and 12-month pumps
4 Conclusions

In the range of systolic fraction frequently used in clinical cases, the standard deviation of the mean flow rate was smallest, and no remarkable decrease was observed in the long-term used pumps. However, since there were differences among the pressure-volume curves of the VADs, it is speculated that a change in the mechanical properties of the polymer materials had occurred. Further study is required, such as tensile tests on the diaphragm and accurate measurement of the three-dimensional shape of the pump housing.
Acknowledgments. This research was organized by "Biomedical Engineering Research on Advanced Medical Treatment" at the Advanced Research Institute for Science and Engineering, Waseda University (05P29), and was financially supported by Health Science Research Grants from the Ministry of Health, Labour and Welfare, Japan (H17-F-003).
References

1. Fukushima, N., Matsuda, H.: Heart transplantation in Japan. Nippon Rinsho 61(6), 1057–1062 (2003)
2. Masai, T., Shimazaki, Y., Kadoda, K., Miyamoto, Y., Sawa, Y., Yagura, A., Matsuda, H., Sato, M., Kashiwabara, S.: Clinical experience with long-term use of the Toyobo left ventricular assist system. ASAIO J. 41(3), M522–525 (1995)
3. Seki, T., Kitamura, S., Kawachi, K., Fukutomi, M., Kobayashi, S., Kawata, T., Niwaya, K., Morita, R.: Efficacy and limitation of a left ventricular assist system in a patient with dilated cardiomyopathy accompanying multi-organ dysfunction. J. Cardiovasc. Surg. 36(2), 147–151 (1995)
QSAR and Molecular Docking Study of a Series of Combretastatin Analogues as Tubulin Inhibitors

Yubin Ji1, Ran Tian1, and Wenhan Lin2

1 Research Center on Life Science and Environmental Science, Harbin University of Commerce, Harbin 150076, China
[email protected]
2 State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing 100083, China
Abstract. In this article, we study a series of combretastatin compounds modified at the B ring. First, genetic function analysis (GFA) is adopted to study the two-dimensional quantitative structure-activity relationship (QSAR). The results demonstrate that the Apol, PMI-mag, Dipole-mag, Hbond donor and RadOfGyration descriptors make the most significant contributions to the activities of this series of inhibitors. Comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) are then adopted to study the three-dimensional QSAR; both models demonstrate strong predictive ability. The three-dimensional contour maps of CoMFA and CoMSIA explain the structure-activity relationship of the combretastatin compounds and elucidate the effects of different B-ring substituents on the inhibition of tubulin polymerization. Molecular docking was used to analyze and validate the QSAR models. The results of this study provide evidence for the further design and synthesis of tubulin inhibitors and for structure optimization.
1 Introduction

During tumor development, an abundant blood supply is needed to provide oxygen and nutrients. Cutting off the blood supply can cause tumor starvation, which has proved to be an effective strategy for treating cancer [1]. The microtubule is a slender tubular structure with a certain stiffness, assembled from tubulin; it is also a key component of the cytoskeleton, involved in cell growth, shape maintenance, vesicle transport, mitochondrial movement, intercellular signal transduction, and cell mitosis [2]. During the mitosis of tumor vessel cells, microtubules participate in the positioning and movement of chromosomes, especially in separating DNA into the two daughter cells. They play a vital role in cell replication and have become a very important target for cancer treatment [3]. Additionally, the anti-tumor activity of microtubule-targeted agents comes only from their inhibitory effect on mitosis and does not affect cells that do not undergo mitosis, so their toxicity is quite low. Many different kinds of microtubule-binding medicines have achieved great
therapeutic effects in tumor treatment. Research demonstrates that there are at least three binding sites on the microtubule: the taxane, vinca alkaloid and colchicine sites. Medicines binding the colchicine site have been reported to act as microtubule-targeted agents, whereas medicines acting on the other two binding sites achieve only weak anti-angiogenesis effects at nearly toxic doses [4]. Combretastatins are a series of active compounds isolated from Combretum caffrum, a South African shrub [5]. Research shows that this series of compounds can inhibit the polymerization of tubulin; combretastatin A-4 (CA-4) is the most effective of them, and research on CA-4 has therefore been carried out widely [6]. CA-4 is a simple molecule with potent inhibitory activity on microtubule polymerization and cell mitosis; it competitively inhibits the microtubule polymerization promoted by colchicine [7], and it is the first anti-tumor medicine in development that can destroy the tumor vessel system. To increase its anti-tumor activity and decrease its toxicity, researchers have modified its structure and obtained many derivatives. However, because the specific mechanism by which these inhibitors act on tubulin is still uncertain, sufficient evidence for their chemical reconstruction and modification is still lacking. Investigations demonstrate that the A ring of CA-4 plays a vital and stable role in inhibiting tubulin activity [8,9]. Therefore we keep the A ring of the combretastatins unchanged and select combretastatin derivatives modified at the B ring as subjects. Genetic function analysis (GFA) is utilized to study the two-dimensional quantitative structure-activity relationship, to determine which descriptors contribute most to the activities of these inhibitors; furthermore, comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) are adopted to study the three-dimensional quantitative structure-activity relationship and establish visualized QSAR models; then, taking the structural characteristics of the tubulin binding site into account, molecular docking is used to validate the three-dimensional QSAR models, which provides theoretical guidance for the further design of novel tubulin inhibitors.
2 Experiment

2.1 Dataset

We select 20 combretastatin derivatives chemically modified at the B ring (Table 1). The methods and conditions for determining the activities of these compounds are identical, so the data are comparable as one set; the activity range spans three orders of magnitude. Four compounds, CA-6, CA-13, CA-15 and CA-18, are randomly selected as the test set, and the other 16 compounds constitute the training set. The activity data used in this article are pIC50 values, pIC50 = -lg IC50, where IC50 is the 50 percent inhibitory concentration.

Table 1. Structures of the combretastatin analogues in the training set and test set (* denotes compounds in the test set)
Comp   B ring   X   E or Z   R     R2         pIC50 Expt   CoMFA Cal.   CoMSIA Cal.
CA-4   A        -   Z        H     -          8.43         8.67         8.12
1      B        S   Z        H     H          7.06         7.08         6.89
2      B        S   E        H     H          4.41         4.64         4.39
3      B        O   Z        H     H          7.31         7.72         7.28
4      B        O   E        H     H          4.56         4.65         4.51
5      B        S   Z        H     2-CH3      7.64         7.03         6.43
6*     B        S   E        H     2-CH3      6.08         4.65         4.54
7      B        O   Z        H     2-CH3      7.03         6.98         7.30
8      B        O   E        H     2-CH3      4.91         4.67         4.52
9      B        S   Z        H     2-phenyl   4.64         4.91         5.37
10     B        S   E        H     2-phenyl   4.42         4.41         4.33
11     B        S   Z        CH3   H          5.09         5.33         5.08
12     B        S   E        CH3   H          4.28         4.25         4.20
13*    B        O   Z        CH3   H          5.12         5.61         5.12
14     B        O   E        CH3   H          4.12         4.28         4.18
15*    C        S   Z        -     -          6.46         6.61         7.22
16     C        S   E        -     -          4.61         4.63         4.65
17     C        O   Z        -     -          7.27         7.04         7.68
18*    C        O   E        -     -          5.08         4.67         4.77
19     D        -   Z        H     -          6.17         6.17         6.50
20     D        -   E        H     -          4.25         4.17         4.45
2.2 Establishment and Alignment of the Molecular Active Conformation
Because CA-4 competitively inhibits the microtubule polymerization promoted by colchicine, and its structure is similar to that of colchicine, we use the conformation of the crystal complex of colchicine bound to tubulin as the template to establish the active conformation of CA-4. The SYBYL 6.91 software package is utilized to optimize CA-4: the Tripos force field is adopted and Gasteiger-Huckel charges are loaded; the conjugate gradient method is first used for 1000 optimization steps, and then the steepest descent method is used for convergence, with a convergence criterion of an energy gradient of 0.042 kJ/(mol·nm). CA-4 is then used as the template to establish the other 26 compounds, which are charged by the same method to obtain their active conformations after optimization.

Molecular alignment and analysis by the GFA method are conducted in the Drug Discovery and QSAR modules of the Cerius2 4.10 software package. The compounds built in SYBYL 6.91 are imported into Cerius2 and Gasteiger charges are loaded on every molecule. By the target-alignment method, all the heavy atoms of the 2,3,4-methoxyphenyl group are used as the common core, and flexible alignment is conducted using CA-4 as the template. Molecular alignment and analysis for the three-dimensional QSAR methods are conducted in the SYBYL 6.91 software package. Every molecule keeps its original Gasteiger-Huckel charges; all the heavy atoms of the 2,3,4-methoxyphenyl group are used as the common core, and molecular alignment is conducted by the Align Database method using CA-4 as the template.

2.3 Establishment of the GFA Model

In the QSAR module of Cerius2, 13 default descriptors are selected, covering four categories: electronic, spatial, structural and thermodynamic properties. Default settings are used to calculate the descriptors of each compound. When using the GFA method to analyze the quantitative relationship between the inhibitors' activities and the physicochemical descriptors, the regression parameters are set as follows: the number of evolutions is 100000, the term types are linear and spline, the initial equation length is the default 4 and is not fixed, and the fitting smoothness is 1.0; all other parameters are left at their defaults. To test the predictive ability of the established model, the leave-one-out (LOO) method is adopted to assess the internal predictive ability of the equations, giving CV-r2, one index of an equation's predictive ability.

2.4 Establishment of the CoMFA Model

The CoMFA active region is chosen automatically; the Tripos standard force field is adopted and the dielectric constant is distance-dependent. An sp3-hybridized C+ probe is used, and the magnitude and distribution of the electrostatic and steric fields at the grid points around the aligned molecules are sampled with a step size of 0.2 nm. Energy cutoffs of 83.68, 104.60 and 125.52 kJ/mol are tried for both the electrostatic and steric fields to find the optimal cutoff; all other parameters are defaults. In the partial least squares (PLS) analysis, cross-validation by the leave-one-out method is conducted first, with column filtering (minimum σ) values of 2.092, 4.184, 6.276 and 8.368 kJ/mol tested to find the most appropriate value. The optimal number of components and the cross-validated correlation coefficient (q2) are determined by the leave-one-out method. Non-cross-validated analysis with the optimal number of components is then performed to obtain the conventional correlation coefficient (R2) and establish the CoMFA model; the column filtering value used in the non-cross-validated analysis is the same as in cross-validation.
2.5 Establishment of the CoMSIA Model

The CoMSIA active region is chosen automatically. The electrostatic field, steric field, hydrophobic field, and hydrogen bond acceptor and donor fields are examined, and different field combinations are compared to find the one with the best q2. Based on the optimal field combination, the effects of the attenuation factor (0.2-0.4) and of the column filtering value on the CoMSIA results are then considered. The setting of the column filtering value, the calculation of q2 and R2, and the establishment of the CoMSIA model all follow the same procedure as for the CoMFA model.

2.6 Study of the Binding Mode

The docking work uses the AutoDock 3.05 program. The crystal structure of tubulin is taken from the Protein Data Bank (PDB code 1SA0); the missing amino acid residues of the A and B chains are restored, hydrogens are added and Kollman charges are loaded. CA-5 and CA-6 are randomly picked as the small molecules for the binding mode study; their initial structures are the conformations established earlier in SYBYL 6.91, with hydrogens added and Gasteiger charges loaded. During the docking process, the binding site of colchicine in the original complex is selected as the active site center. A grid box of 48 x 56 x 40 points is generated with a spacing of 0.0375 nm. The docking calculation uses the Lamarckian genetic algorithm (LGA); 100 docked conformations are produced for every compound, and the other parameters are left at their defaults. According to the binding mode and binding energy of each conformation, the appropriate one is picked for study.
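The final selection step amounts to ranking the 100 docked conformations by estimated binding energy and then inspecting the binding mode; a minimal sketch, assuming the energies and poses have already been parsed from the AutoDock output:

```python
def pick_conformation(conformations, energies):
    """Return the docked pose with the lowest estimated binding energy.
    `conformations` and `energies` are parallel lists parsed from the
    docking log; visual inspection of the binding mode follows."""
    best = min(range(len(energies)), key=lambda i: energies[i])
    return conformations[best], energies[best]
```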
3 Results and Discussion

3.1 GFA Model

The GFA run produces 100 equations in total; the descriptors in the equations include Apol, PMI-mag, Dipole-mag, Hbond donor, RadOfGyration, etc., which demonstrates that the model is quite stable. Based on the LOF value, the predictive ability and the descriptor variety, five relatively better equations are selected (Table 2). The LOF values of these five equations are all about 0.5, BS-r2 and CV-r2 are all above 0.8, and the most frequently used descriptors are included. Hbond donor has the largest, positive coefficient in the equations, which means that compounds with hydrogen bond donors have more potent activities. Apol and Dipole-mag are both electronic parameters. Apol stands for molecular polarizability and has the highest occurrence frequency, but its coefficient is negative, meaning the compounds should not have high polarizability, i.e., they ought to be hydrophobic molecules. Dipole-mag stands for the molecular dipole moment and appears in two equations with a positive coefficient, meaning the molecules should have a certain polarity. PMI-mag and RadOfGyration are both spatial parameters.
Table 2. Equations of the GFA model

No. 1: A = 3.8053 + 0.99175*"Hbond donor" + 0.002421*(3422.6 - "Apol") + 0.003699*(1326.93 - "PMI-mag")
No. 2: A = 3.80096 + 0.003721*(1326.93 - "PMI-mag") + 1.00586*"Hbond donor" + 0.00259*(13374.4 - "Apol")
No. 3: A = 3.7794 + 0.861847*"Hbond donor" + 0.000958*(14181.7 - "Apol") + 0.003661*(1326.93 - "PMI-mag")
No. 4: A = 3.78318 + 0.003774*(1326.93 - "PMI-mag") + 0.667744*("Dipole-mag" - 3.86132) + 0.003478*(1342.26 - "Apol")
No. 5: A = 3.76654 + 3.32606*(4.34874 - "RadOfGyration") + 0.696031*("Dipole-mag" - 3.86132) + 0.004052*(13374.4 - "Apol")
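As a usage note, the equations in Table 2 are directly computable from the named Cerius2 descriptors. A sketch for equation No. 1 follows; note that GFA spline terms are normally truncated at zero, a notation the flattened source does not preserve, so the terms are evaluated linearly here.

```python
def gfa_predict(desc):
    """Predicted activity A (pIC50) from equation No. 1 of Table 2.
    `desc` maps descriptor names to the values computed for one compound."""
    return (3.8053
            + 0.99175 * desc["Hbond donor"]
            + 0.002421 * (3422.6 - desc["Apol"])
            + 0.003699 * (1326.93 - desc["PMI-mag"]))
```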
PMI-mag is the principal moment of inertia and RadOfGyration is the molecular radius of gyration; both correlate with molecular volume, and their coefficients are negative, which means that a larger volume decreases the compounds' activities.

3.2 CoMFA Model

In building the CoMFA model, the energy cutoff for the electrostatic and steric fields and the column filtering value used in the PLS analysis both affect the resulting model, so different values were tried in the CoMFA calculations. It was found that the cutoff and column filtering values affect the CoMFA model significantly. The best result is achieved when both the electrostatic and steric field cutoffs are 83.68 kJ/mol and the column filtering value used in the PLS analysis is 4.184 kJ/mol. Leave-one-out cross-validation on the 16 training compounds gives q2 = 0.630 with an optimal number of components of 4; the R2 of the CoMFA model built with this number of components is 0.975, with F = 175.937 and a standard error of estimate (SEE) of 0.244; the contributions of the electrostatic and steric fields are 0.576 and 0.424, respectively. In PLS analysis it is generally considered that the predictive ability of a model is convincing if q2 is above 0.5, and that the model has good self-consistency when R2 is above 0.9.

3.3 CoMSIA Model

Compared with the CoMFA model, the CoMSIA model adds the hydrophobic field (H) and the hydrogen bond acceptor (A) and donor (D) fields to the electrostatic (E) and steric (S) fields. The alignment rule of the compounds has a smaller impact on the results, and the model can explain the compounds' structure-activity relationship more vividly. In
practice, different combinations of molecular fields suit different data sets. CoMSIA models with five different combinations, S-E, S-E-H, S-E-H-D, S-E-H-A and S-E-H-D-A, were all built in this study (grid step size and attenuation factor at their defaults), and the S-E-H-D-A combination proved the best choice. In the CoMSIA calculation, a Gaussian function is adopted to compute the molecular similarity index; the function contains an α parameter, called the attenuation factor, which strongly influences the results. The effects of the attenuation factor and of the column filtering value on the CoMSIA model were therefore studied further, and it was found that the attenuation factor has a great impact on the results, whereas the column filtering value has only a limited one. When the attenuation factor is 0.3 and the column filtering value is 4.184 kJ/mol, a better model is obtained, with the following statistics: q2 = 0.634, optimal number of components 3, R2 = 0.932, F = 87.089, SEE = 0.391; the contributions of the five fields are S = 0.117, E = 0.192, H = 0.143, D = 0.343 and A = 0.205.

3.4 Comprehensive Analysis of the Three-Dimensional Contour Maps of the GFA Equations, the CoMFA and CoMSIA Models, and the Molecular Docking Results
In the three-dimensional contour map (I) of the electrostatic field (E) and steric field (S) of the CoMFA model (using CA-1 and CA-26 as reference molecules), region A indicates that introducing bulky groups is beneficial for increasing a compound's activity, while region B indicates the opposite; region C indicates that introducing negatively charged groups is beneficial for increasing activity, while region D indicates the opposite. In the three-dimensional contour map (II) of the hydrophobic field (H) of the CoMSIA model (using CA-4 and CA-8 as reference molecules), region E indicates that introducing hydrophobic groups can increase activity. In the three-dimensional contour map (III) of the hydrogen bond acceptor (A) and donor (D) fields of the CoMSIA model (using CA-4 and CA-8 as reference molecules), regions F and G indicate that introducing hydrogen bond donors is helpful for increasing a compound's activity; region H indicates the opposite.

First of all, in the equations established by the GFA method, the spatial parameters PMI-mag and RadOfGyration both have negative coefficients, which means the inhibitor molecule should not be too large. From the molecular docking results, the inhibitor molecule is wrapped in the amino acid residues of tubulin; if the molecule is too large, it collides with the surrounding residues, which decreases its activity, or it may even lose its activity completely by failing to enter the active site. Meanwhile, the benzothiophene group derived from the modification of the B ring of CA-4 lies mainly in region A, which means that introducing a bulky group in this region can increase activity. The 2-position of the benzothiophene ring of the cis compounds falls in region B, which means this region allows only the introduction of small groups such as methyl (CA-5, CA-7); if a bulky group such as a benzene ring is introduced (CA-9), the activity decreases nearly 1000-fold. The 4-hydroxy group also falls in region B, so this region should not accommodate bulky groups, or the activity decreases 10-100-fold (e.g., CA-11, CA-13, CA-21, CA-23). From the docking interaction pattern, the 2-position of the benzothiophene ring is wrapped by the
pocket formed by Thr314, Val315, Asn350 and Val351 of the β subunit; it allows the introduction of only one methyl group, and introducing a bulky group causes collisions that decrease the activity. The 4-hydroxy group is wrapped by the pocket formed by Thr179 and Val181 of the α subunit and Lys352 of the β subunit, with which it forms hydrogen bonds. The three-dimensional contour map of the electrostatic (E) and steric (S) fields of the CoMSIA model is basically consistent with that of the CoMFA model, except for a more obvious region C near the ortho position of the hydroxy group of the cis compounds' B ring, which indicates that introducing a negatively charged group could increase activity, as in CA-4; however, comparison of CA-1, CA-15, CA-3 and CA-17 shows that this feature is not imperative. There is also a region B near the 2-position of the trans compounds' benzothiophene ring, which indicates that only a small group such as methyl can be introduced there; introducing a bulky group such as a benzene ring decreases activity (e.g., CA-10). In the GFA equations, the coefficient of Hbond donor is the largest and positive, and the molecular docking shows that, for both cis and trans compounds, hydrogen bonds play a significant role in the interaction between the inhibitor and tubulin. Among the field contributions of the CoMSIA model, the hydrogen bond acceptor (A) and donor (D) fields contribute more than the others, which means that when this data set binds to the receptor, the formation of hydrogen bonds is especially important. Fig. 2 shows regions F and G near the hydroxy group of the B ring of CA-4, which means the hydroxy group at this position can act as both hydrogen bond acceptor and donor. In the molecular docking of CA-5, the 4-hydroxy group at the equivalent position acts as a hydrogen bond acceptor, forming a hydrogen bond with the carbonyl oxygen of the Thr179 residue of the α subunit, and it also forms hydrogen bonds with the Val181 residue and with the nitrogen atom of Lys352 of the β subunit; both compounds have high activities. There are regions F and H near the 4-hydroxy group of the benzofuran ring of the trans compound CA-8, implying that this hydroxy group can act as a hydrogen bond donor; the results also demonstrate that the 4-hydroxy group of the benzothiophene ring can form a hydrogen bond with the oxygen atom of the Thr179 residue of the α subunit. With regard to the electrostatic field, there is a region C near the 4-position of the benzothiophene ring, which means a negatively charged group belongs in this region, and a region D near the 1-position of the benzofuran ring of trans compounds such as CA-26, which means that introducing a positively charged group can increase activity. From the three-dimensional contour map of the hydrophobic field, the regions where both the A and B rings lie are region E, which means hydrophobic groups suit this region. Apol and Dipole-mag are both electronic parameters, but their coefficients have opposite signs, which seems inconsistent.
Combining the three-dimensional contour maps of the electrostatic (E) and steric (S) fields of the CoMFA and CoMSIA models with the molecular docking results, the region where the A and B rings lie is suitable for introducing hydrophobic groups, because the inhibitor molecule sits in a hydrophobic pocket formed by the amino acid residues of tubulin. Hydrophobic groups make the binding of the inhibitor to the target enzyme easier; however, the binding also requires the formation of hydrogen bonds, so the inhibitor needs polar groups, as the positions of the positive and negative charges in the electrostatic field (E) also demonstrate. The region C near the 4-position of the benzothiophene ring and the region D near the 1-position of the benzofuran ring of trans compounds such as CA-26 both demonstrate that this region should carry a polar group participating in hydrogen bond formation. In this article, the quantitative relationship between the inhibitor compounds and several descriptors is obtained by the genetic function analysis (GFA) method and the QSAR equations are established; visualized three-dimensional structure-activity models are established by the CoMFA and CoMSIA methods, both of which have good self-consistency and predictive ability and can be mutually explained and validated by the molecular docking results. This work therefore provides useful information for further modifying and optimizing the structure of combretastatin derivatives.
References

1. Si, Q., Mu, H., Yan, G.: Individualized treatment models based on blood supply characteristics in hepatocellular carcinoma using color Doppler hemodynamics. Hepatogastroenterology 3, 334–341 (2007)
2. Vyas, J.M., Kim, Y.M., Van der Veen, A.G.: Tubulation of class II MHC compartments is microtubule dependent and involves multiple endolysosomal membrane proteins in primary dendritic cells. J. Immunol. 1, 7199–7210 (2007)
3. Roth, D.M., Moseley, G.W., Glover, D.: Microtubule-Facilitated Nuclear Import Pathway for Cancer Regulatory Proteins. Traffic 1, 673–686 (2007)
4. Usui, T.: Actin- and microtubule-targeting bioprobes: their binding sites and inhibitory mechanisms. Biosci. Biotechnol. Biochem. 2, 300–308 (2007)
5. Rappl, C., Barbier, P.: Interaction of 4-arylcoumarin analogues of combretastatins with microtubule network of HBL100 cells and binding to tubulin. Biochemistry 8, 9210–9218 (2006)
6. Gurjar, M.K., Wakharkar, R.D., Singh, A.T.: Synthesis and evaluation of 4/5-hydroxy-2,3-diaryl(substituted)-cyclopent-2-en-1-ones as cis-restricted analogues of combretastatin A-4 as novel anticancer agents. J. Med. Chem. 4, 1744–1753 (2007)
7. Sun, L., Vasilevich, N.I., Fuselier, J.A.: Abilities of 3,4-diarylfuran-2-one analogs of combretastatin A-4 to inhibit both proliferation of tumor cell lines and growth of relevant tumors in nude mice. Anticancer Res. 1, 179–186 (2004)
8. Simoni, D., Romagnoli, R., Baruchello, R., Pisano, C.: Novel combretastatin analogues endowed with antitumor activity. J. Med. Chem. 1, 3143–5312 (2006)
9. Eikesdal, H.P., Bjerkvig, R., Raleigh, J.A.: Tumor vasculature is targeted by the combination of combretastatin A-4 and hyperthermia. Radiother. Oncol. 12, 313–320 (2001)
10. Griggs, J., Brindle, K.M., Metcalfe, J.C., Hill, S.A., Smith, G.A.: Potent anti-metastatic activity of combretastatin-A4. Int. J. Oncol. 10, 821–825 (2001)
A Software Method to Model and Fabricate the Defective Bone Repair Bioscaffold for Use in Tissue Engineering

Qingxi Hu, Hongfei Yang, and Yuan Yao

Rapid Manufacturing Engineering Center, Shanghai University, Shanghai 200444, P.R. China
{huqingxi,honffe,yaoyuan}@shu.edu.cn
Abstract. In this paper, the biological and physical properties of bioscaffolds are studied and the requirements for a defective bone repair bioscaffold model are proposed. A new modeling method is then presented that can construct a 3D digital model of a defective bone repair bioscaffold with the required macro-shape and macro-pores. The method combines image processing technology, 3D reconstruction technology and a new hole filling method, within which a mapping method is developed; it can be used for both symmetrical and unsymmetrical bone defects. The method was implemented in software and the repair bioscaffold 3D digital model was constructed. Through an RP process using polymeric blends, the physical model was obtained, which meets the requirements of the bioscaffold.
1 Introduction

The traditional method of repairing a defective skull is inefficient and not very practicable, and often exceeds the desired window for optimal clinical treatment. Using tissue engineering methods to repair bone, tissue and organs is currently a research hotspot. The modeling of bioscaffolds for cell seeding and implantation is one of the key technologies in tissue engineering; the bioscaffold, as one of the necessary components, has become an important research field [1]. To fabricate a bioscaffold, a digital model is required first. At present, for bionic modeling of bone repair, researchers mostly adopt conventional CAD methods [2]. Our modeling of the defective bone repair bioscaffold begins with medical CT images, and a single CAD package cannot produce a model that meets the macro-shape and macro-pore requirements of the bioscaffold. In this paper, the requirements of the defective bone repair bioscaffold model are summarized, and a new modeling method is proposed that can construct a defective bone repair bioscaffold digital model with the required macro-shape and macro-pores. The method combines image processing technology, 3D reconstruction technology and a new hole filling method, and can be used not only in the symmetrical situation [2] but also in the unsymmetrical situation of a defective skull.
2 Requirements of the Bioscaffold

The function of the bioscaffold is to provide a microenvironment that is beneficial to the adherence, proliferation and functioning of the cells, so that the repair and substitution of the tissue or organ can be accomplished. From [3-5], according to the requirements and the analysis of the pore size and porosity of the bioscaffold, the requirements of the model can be summarized as:

a) Macro-shape: match the defective part of the bone.
b) Structure: interconnectivity and high porosity.
c) Total porosity: >60%.
d) Pores: 500-800 μm diameter and 10-30% software-constructed porosity.

All of these are considered when modeling the repair bioscaffold.
3 Modeling and Fabricating the Defective Bone Repair Bioscaffold

In this part, the method for modeling and fabricating the defective bone repair bioscaffold is presented. There are four main stages in building the model.

3.1 Reconstructing the Defective Bone Model

To construct the model of the defective part of the bone, a 3D model of the skull must first be obtained. This stage includes three steps.

Image Processing: The reconstruction of the defective bone model begins with the processing of the medical CT images. Owing to precision limits and the electronics, image noise cannot be avoided. As image quality is an important factor in reconstruction, a linear transform and a median filter are adopted to improve the quality of the images.

3D Reconstruction: After the images are processed, 3D reconstruction is required to build the defective bone model. The classical algorithm is the marching cubes (MC) method [6], an isosurface extraction method with good speed and efficiency on a normal PC. An improved MC algorithm is adopted here [7].

Triangle Reduction: 3D reconstruction creates a large number of triangles representing the surface of the defective bone model, which slows later processing, so triangle reduction is needed. In this paper the reduction method of Stan Melax [8-9] is extended: when computing the collapse cost of an edge to delete a vertex, the edge length, the normal changes [9], and two new rules are considered (see the sketch after this list):

Sharp triangle: a sharp triangle is detected by the minimum angle θ of the newly created triangles. If θ is smaller than a pre-specified angle, the collapse is not performed, so sharp triangles are not created.

Area of the newly created triangles: the area rule works in the same way as the sharp triangle rule, with the triangle area as its parameter.
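A minimal sketch of the extended collapse cost follows; the thresholds are hypothetical, and the normal-change/curvature factor of Melax [8-9] is omitted for brevity.

```python
import numpy as np

def _min_angle_deg(p0, p1, p2):
    """Minimum interior angle (degrees) of triangle (p0, p1, p2)."""
    angs = []
    for x, y, z in ((p0, p1, p2), (p1, p2, p0), (p2, p0, p1)):
        u, w = y - x, z - x
        c = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w))
        angs.append(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))
    return min(angs)

def collapse_cost(u, v, tris, verts, min_angle=10.0, min_area=1e-8):
    """Cost of collapsing edge u -> v. `tris` are the index triples of the
    triangles around u that survive the collapse (those not containing v).
    Returns None when a re-formed triangle violates the sharp-triangle or
    area rule; otherwise the base edge-length cost."""
    for a, b, c in tris:
        p0, p1, p2 = (verts[v] if i == u else verts[i] for i in (a, b, c))
        area = 0.5 * np.linalg.norm(np.cross(p1 - p0, p2 - p0))
        if area < min_area:
            return None                      # rule: new-triangle area
        if _min_angle_deg(p0, p1, p2) < min_angle:
            return None                      # rule: sharp triangle
    return float(np.linalg.norm(verts[u] - verts[v]))
```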
Fig. 1. (a) Symmetrical defect (b) Unsymmetrical defect
3.2 Repairing the Defective Bone

In this stage, the main purpose is to repair the defective skull and obtain the macro-shape of its defective part. Defects can occur in both symmetrical (Fig. 1a) and unsymmetrical (Fig. 1b) situations, so to handle any defect type, a general hole filling method is needed. Peter Liepa proposed a geometric method for filling holes in triangular meshes [10], but that method applies to a single-layer, non-closed triangular mesh. Here, a hole filling method for closed-surface models is proposed. The aim of this stage is to build a meaningful surface that matches the macro-shape of the defective part of the skull. The main steps are described as follows.

Boundary Identification: To improve the identification speed, drawing on Ping's triangular mesh feature classification [11], a boundary identification method is proposed, based on the following hypothesis: an edge that begins at a vertex is a boundary edge if two rules are satisfied: (1) the direction of the boundary edge is most consistent with the direction of the vector from the start vertex to the destination vertex; (2) the normals of its two adjacent triangles have the smallest angle. If more than one vertex could be selected, the edge length is also considered, and rule (2) has priority over rule (1).

1. Some guide-points are specified by the user. Put G ← {G1, G2, …, Gn}; G is the set of guide-points (it can be stored as a vector, an array, etc.).
2. For i = 1, 2, …, n, do BoundarySearch(i, (i+1) % n) to get the boundary edges of the defective bone between vertices i and i+1, and put these boundary edges into a set P.

The function BoundarySearch is recursive; its initial input is two guide-points, and each recursion selects one edge as a boundary edge (a sketch is given below). For example, let G1 and G2 be the input (see Fig. 2). First, the angles between G1's edges (e1, e2, …, e8) and the vector G1G2 are calculated, and the two smallest, e4 and e5, are chosen as candidates. Then the angle between T1 and T2 (sharing edge e4) and that between T2 and T3 (sharing edge e5) are calculated from their normals. The angle between T2 and T3 is smaller than that between T1 and T2, so e5 is the result of this recursion, and the input of the next recursion is V1 and G2. All other situations follow the same rules.
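A sketch of the recursive search is given below; the `mesh` interface (edges, pos, dihedral) is an assumption of this sketch, not the paper's data structure.

```python
import numpy as np

def angle_between(a, b):
    c = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def boundary_search(mesh, start, goal, boundary=None):
    """Recursive edge walk from guide-point `start` to `goal` following the
    two rules above. Assumed interface: mesh.edges(v) yields the neighbor
    vertices of v, mesh.pos(v) its coordinates, and mesh.dihedral(v, w) the
    angle between the normals of the two triangles sharing edge (v, w)."""
    if boundary is None:
        boundary = []
    if start == goal:
        return boundary
    to_goal = mesh.pos(goal) - mesh.pos(start)
    # rule (1): the two edges best aligned with the start->goal direction
    cands = sorted(mesh.edges(start),
                   key=lambda w: angle_between(mesh.pos(w) - mesh.pos(start),
                                               to_goal))[:2]
    # rule (2), which has priority: the flattest dihedral angle wins
    nxt = min(cands, key=lambda w: mesh.dihedral(start, w))
    boundary.append((start, nxt))
    return boundary_search(mesh, nxt, goal, boundary)
```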
Fig. 2. Boundary identification
Triangulation: The Delaunay triangulation method is applied to the boundary edges, from which a polygon is constructed. A Delaunay triangulator is adopted, which is part of the C program Triangle [12]. After the boundary is translated into the specified format of the adopted triangulator, it outputs the desired triangular mesh. Here, the boundary points are projected onto the coordinate frame Cf (constructed in the next step; see Mapping Fairing) for the triangulator's convenience. This step finishes the surface construction and produces a planar triangular mesh T, which is adjusted to match the curvature of the defective part of the bone in the next step.

Mapping Fairing: To keep the repair consistent with the curvature of the defective part of the skull, the planar triangular mesh created in the Triangulation step must be adjusted. A mapping fairing method is developed here, with three steps:

1. Coordinate construction: a three-dimensional coordinate frame Cf is created by specifying three points among the boundary points. The surface equation can be calculated; let it be f(x, y, z) = 0. Then a series of planes is built, each perpendicular to Cf; let these planes be S = {S1, S2, …, Sn}.

2. Reference point construction: the adjustment of the planar triangular mesh is driven by a set of points named reference points. These reference points should lie on an implicit surface that is minimally distinguishable from the surrounding mesh. A set of curves is constructed to approximate the implicit surface; the more curves constructed, the better the accuracy. When constructing the curves, the planes S created in step 1 are used: the intersection lines (represented by sampled points) are extracted, giving a series of lines L = {L1, L2, …, Ln}, each composed of sampled points. Let Li = {Pi1, Pi2, …, Qi1, Qi2, …, Pin}, where Qi1 and Qi2 are points on the boundary of the defective bone. Spline curves are used to create points on the implicit surface, giving a set of spline curves LQ = {LQ1, LQ2, …, LQn}; like L, each LQi is composed of points: LQi = {Qi1, Ri1, Ri2, …, Rin, Qi2}. Cardinal curves are adopted here. The reference points are stored in LQ.

3. Mapping relation setup. Searching mapping: from the reference points LQ, the mapping relation (R: LQ → T) is constructed; each reference point is mapped to a point of the triangular mesh T. Both LQ and T are projected onto the coordinate frame Cf, and the projected sets (L'Q, T') are obtained by coordinate transformation. For each source point of L'Q, a target point of T' is determined by the mapping rules: (1) if an identical point PT can be found in T', PT is the target point; (2) if no identical point is found, the source point drops into the interior of a triangle, and the target point is determined by the minimum distance
from the source point to the three vertices of the surrounding triangle. If the source point lies on an edge of a triangle, the repeated calculation can be avoided in the implementation. Interpolating mapping: the reference points LQ come from the constructed curves. According to the positions (indices in a curve's sample set) of these points, a series of average-coordinate closed contour lines is constructed; the value of a contour is the average position of the points on the contour line. The other points of T are then mapped to the implicit surface by interpolating between the two contour lines whose region the points fall into. Here, linear interpolation is adopted: if a point (x, y, z) falls into the region between contour line 1 (CL1, value = (x1, y1, z1)) and contour line 2 (CL2, value = (x2, y2, z2)), then the point (x, y, z) is mapped to (x, y, z + (z2 - z1) / k), where k is an average proportion of x, y to x1, y1.
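A minimal sketch of the interpolating mapping; since the paper leaves k only loosely specified, the sketch takes it as the average of the x and y proportions, which is one plausible reading of the definition above.

```python
def interp_map(point, cl1_value, cl2_value):
    """Map a point lying between contour lines CL1 and CL2 onto the implicit
    surface: z is shifted by (z2 - z1)/k. cl1_value/cl2_value are the
    average-coordinate values of the two contours."""
    x, y, z = point
    (x1, y1, z1), (x2, y2, z2) = cl1_value, cl2_value
    k = 0.5 * (x / x1 + y / y1)   # "average proportion of x, y to x1, y1"
    return (x, y, z + (z2 - z1) / k)
```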
Fig. 3. Mapping of the points. Red points are selected points of T. A' has the same x and y values as A. B drops into a triangle Tri, and the distance between B' and C is the shortest. D' is the interpolation operation on point D between contours CL1 and CL2.
4. Fairing: When the processes above are finished, a triangular mesh is generated, but it is not yet smooth, so a fairing operation is performed to smooth the mesh. Taubin's smoothing approach [13, 14] is adopted here.
ΔPi = Σ_{j∈i*} wij (vj − vi), with Σ_{j∈i*} wij = 1    (1)

Pi' = (1 + λΔ)(1 + μΔ) Pi    (2)

kPB = 1/λ + 1/μ    (3)
Here wij is a weight value and i* is the set of neighbors of point vi. Pi' is the new point after smoothing and Pi is the old point. λ is the weight, with a value between 0 and 1; μ is a negative scale factor with μ < −λ. kPB is the cutoff frequency; values from 0.01 to 0.1 produce good results [14]. To preserve the shape of the mesh, fixed points are introduced: the boundary points and the mapped points are regarded as constraints and are kept fixed during smoothing.
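As a minimal sketch of this Taubin smoothing step (uniform weights wij = 1/|i*|, a float vertex array V, per-vertex neighbor lists, and a set of fixed constraint indices are our illustrative assumptions; the paper does not specify these data structures):

```python
import numpy as np

def taubin_smooth(V, neighbors, fixed, lam=0.5, k_pb=0.1, n_iter=20):
    """Taubin lambda|mu smoothing per Eqs. (1)-(3): alternate a shrink step
    (lambda > 0) and an inflate step (mu < -lambda), keeping constraints fixed."""
    mu = 1.0 / (k_pb - 1.0 / lam)           # solve Eq. (3): k_PB = 1/lambda + 1/mu
    V = V.copy()
    for _ in range(n_iter):
        for factor in (lam, mu):            # Eq. (2): (1 + lam*D)(1 + mu*D) applied per pass
            delta = np.zeros_like(V)
            for i, nbrs in enumerate(neighbors):
                if i in fixed or not nbrs:  # boundary and mapped points stay put
                    continue
                w = 1.0 / len(nbrs)         # equal weights, so sum of w_ij = 1 (Eq. 1)
                delta[i] = sum(w * (V[j] - V[i]) for j in nbrs)
            V += factor * delta
    return V
```

With the paper's kPB = 0.1 and, say, λ = 0.5, Eq. (3) gives μ ≈ −0.53, satisfying μ < −λ.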
Extruding
After repairing the defective bone, a solid model is obtained by extruding the surface along the normal direction of the related triangles. The triangles were created in the Triangulation operation and have undergone the mapping operation. The model of the defective bone repair is obtained at the end of this step; it has the macro-shape of the repair.
3.3 Constructing the Macro-pores
The macro-pores are necessary for the growth of the cells, so according to the requirements of the bioscaffold, a proper structure and a well-designed size and porosity of the macro-pores are very important. This can be achieved by a Boolean operation between the repair and a designed interior structure model.
3.4 Manufacturing the Bioscaffold via RP Apparatus
At present, there are mainly five RP technologies: SLS, SLA, FDM, LOM, and 3DP. Here, SLS is chosen to process the polymeric blends.
Fig. 4. (a) Hole of the defective bone. (b) Repaired bone. (c) Extruded model. (d) The model of the defective bone repair bioscaffold; pore size is 600 μm and porosity is about 25%. (e) The bioscaffold fabricated by SLS; the materials are polymeric blends.
4 Experiment Result and Discussion
4.1 Experiment Result
Using the proposed method, the repair bioscaffold is constructed, and the result is shown in Fig. 4. In this paper, kPB is set to 0.1, the same value used by Taubin [14].
4.2 Discussion
Many geometric hole-filling methods are designed specifically for single-layer triangular meshes and are not suitable for a closed triangular mesh. In this paper, the hole-filling method is based on a mapping approach. A set of curves is constructed and the
points of a planar mesh are mapped to an implicit surface by two kinds of mapping operations. The sampled points on the constructed curves are in fact constraints in the fairing process, which ensures the curvature of the objective surface. In the proposed method, because the user selects a limited number of guide points, the boundary search can be done quickly. However, a fully automatic boundary identification method would be of significance for accelerating the hole-filling process. In the procedure of mapping fairing, there are two kinds of mapped points: the points on the constructed curves and the interpolated points. The former are the result of direct mapping, and the latter are the z-interpolation of two contour lines. The directly mapped points are in fact the constraints during fairing, while the interpolated points are the objects to be smoothed. When constructing the macro-pores, the interior structure model is very important for the growth of cells. Here, the interior model is a simple 3D interconnected structure oriented at 90° in the three directions (x, y, z), so studying the relation between the interior structure and cell growth is necessary. Limited by the SLS process, some of the macro-pores were not well fabricated, so the RP process should also be considered in further study; this can provide useful information for building a more accurate defective bone repair bioscaffold model.
5 Conclusion
In this paper, the biological and physical properties of the bioscaffold are studied. A new modeling method is proposed that can construct a 3D digital model of a defective bone repair bioscaffold with the required macro-shape and macro-pores. This method combines image processing technology, 3D reconstruction technology, and a new hole-filling method, and it can be applied to both symmetrical and asymmetrical bone defects. The method was successfully implemented in software, and the repair bioscaffold 3D digital model was constructed. Using an RP apparatus with polymeric blends, a satisfactory physical model was obtained that meets the requirements of the bioscaffold. In our proposed method, the boundary identification is not fully automatic and the interior model is just a simple 3D interconnected structure. Therefore, the automatic boundary identification method, the relation between the interior structure and cell seeding, and the RP process will be the main focus of our further study.
Acknowledgments. Thanks to Jiang Ying and Zhang Quan for their help in this research. This project is supported by the Shanghai Education Development Foundation Fund (No. 06AZ029) and the Shanghai Splendid Youth Teachers Special Research Foundation Fund (No. B. 7-0109-07-011).
References
1. Chuanglong, H., Yuanliang, W., Lihua, Y., Jun, Z., Liewen, X.: Recent Advances in Natural Derived Extracellular Matrix Materials in Bone Tissue Engineering. Chinese Biotechnology 23(8), 11–17 (2003)
2. Liu, H., Hu, Q., Li, L., Fang, M.: A Study of the Method of Reconstructing the Bionic Scaffold for Repairing Defective Bone Based on Tissue Engineering. IFIP 207, 650–657 (2006)
3. Ho, S.T., Hutmacher, D.W.: A comparison of micro CT with other techniques used in the characterization of scaffolds. Biomaterials 27, 1362–1376 (2006)
4. Deville, S., Saiz, E., Tomsia, A.P.: Freeze casting of hydroxyapatite scaffolds for bone tissue engineering. Biomaterials 27, 5480–5489 (2006)
5. Qingxi, H., Xianxu, H., Liulan, L., Minglun, F.: Design and Fabrication of Manual Bone Scaffolds via Rapid Prototyping. ITIC 1, 280–283 (2006)
6. Lorensen, W.E.: Marching Cubes: A High Resolution 3D Surface Construction Algorithm. Computer Graphics 21(4), 163–169 (1987)
7. Delibasis, K.S., Matsopoulos, G.K., Mouravliansky, N.A., Nikita, K.S.: A novel and efficient implementation of the marching cubes algorithm. Computerized Medical Imaging and Graphics 25, 343–352 (2001)
8. Melax, S.: A Simple, Fast, and Effective Polygon Reduction Algorithm. Game Developer Magazine (November 1998), http://www.melax.com
9. Shi-xiang, J., Jian-xin, Y.: Model Simplification Algorithm Based on Weighed Normal Changes. Journal of System Simulation 17(9) (September 2005)
10. Liepa, P.: Filling holes in meshes. In: Proceedings of the 2003 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing (SGP'03), pp. 200–205 (2003)
11. Xueliang, P., Laishui, Z., Shenglan, L.: Triangle Mesh Smoothing Method with Feature Preservation. Computer Engineering and Applications 12 (2006)
12. Shewchuk, J.R.: Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator, http://www.cs.cmu.edu/~quake/tripaper/triangle0.html
13. Guangming, L., Jie, T., Huiguang, H., Mingchang, Z.: A Mesh Smoothing Algorithm Based on Distance Equalization. Journal of Computer-Aided Design & Computer Graphics 14(9) (September 2002)
14. Taubin, G.: A signal processing approach to fair surface design. In: Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques, ACM (September 1995)
Using Qualitative Technology for Modeling the Process of Virus Infection*
Hailin Feng1,2 and Chenxi Shao1,3,4
1 Computer Science Dept. of University of Science and Technology of China, Hefei 230027, China
2 School of Information Science and Technology, ZheJiang Forestry University, Lin'an 311300, China
3 Simulation Center, Haerbin Institute of Technology, Haerbin 150001, China
4 Anhui Province Key Laboratory of Software in Computing and Communication, Anhui, Hefei 230027, China
[email protected], [email protected]
Abstract. The quantitative analysis of viral infection dynamical models cannot be carried out easily due to the lack of complete quantitative knowledge of such biological systems; therefore, methods based on qualitative analysis become an alternative approach to research on this complicated biological process. In this paper, qualitative technology is introduced to model and analyze the process of virus entry. A rough model is first proposed as the foundation of further research. As more knowledge of the process becomes available, the framework is expanded by inserting qualitative descriptions of the different kinds of factors that interactively influence the process of virus entry. The factors are described qualitatively by their influence degree, and the qualitative model is built based on the interaction among these influencing factors, the viruses, and the cells. Judging matrices are constructed according to the qualitative model and their coherence is verified. A qualitative analysis of the process is given finally.
* Supported by the Key Project of the Natural Science Foundation of China (No. 60434010).

1 Introduction
In the field of virus infection, research methods can be roughly divided into three categories: theoretical study, experimental study, and simulation study. Generally speaking, in traditional research, complete quantitative knowledge is usually needed as a prerequisite for more advanced studies. Unfortunately, the quantitative information necessary to evaluate and analyze the process of viral entry is usually hard to obtain. Thus, with only incomplete quantitative knowledge, traditional quantitative methods for modeling the process are less effective. For example, a number of studies on influenza virus have shown that virus particles enter the cell by endocytosis [1-2]. Moreover, amantadine, an inhibitor of influenza virus infection [9], has been shown to act at an intracellular location [10]. Other studies suggest that influenza virus, like Sendai virus, may fuse directly with the plasma
membrane of the cell [11]. The process of viral entry cannot be fully described quantitatively even now [5-6]. One reason is that the entry mechanisms used by many viruses remain unclear. Another important reason is that much of the quantitative knowledge of the process cannot be gained easily. In the process of viral entry, there are many factors that influence viruses and cells [7-8]; e.g., the percentage of virus binding to cells increases when the temperature gets higher [12]. Karl S. Matlin [13] found that when pH > 5, the percentage of virus binding to cells decreases if the pH decreases. The existence of positive ions in the environment may also affect the process of viral binding and entry. However, as discussed above, the influence of the environment is difficult or impossible to express and analyze fully quantitatively. What we know is often the qualitative interaction among the variables and factors, so qualitative technology becomes an alternative approach to research on this complicated biological process. Researchers have also shown much interest in such complex systems with incompletely known knowledge. In reference [3] the authors use a qualitative method to describe the interaction among the factors existing in a viral infection system and build a qualitative viral infection model. If the system is given as an ordinary differential equation, B. Kuipers writes its qualitative model in the form of a qualitative differential equation (QDE) and reasons about the process qualitatively [14]. To address the problem, in this paper we introduce a qualitative modeling framework to promote research in the field. The basic idea is to model the process of virus entry, taking the influencing factors into account qualitatively. We propose a rough framework as the foundation of further research. With more knowledge of the process, the framework is expanded by inserting qualitative descriptions of the different kinds of factors that interactively influence the process of virus entry. The factors are described qualitatively by their influence degree, and the qualitative model is built based on the interaction among these influencing factors, the viruses, and the cells. The judging matrices are constructed according to the qualitative model and their coherence is verified. Based on technology connecting qualitative and quantitative methods, we join the behavior of the viruses, the environment, and the cells together.
2 The Preliminary Knowledge
To qualitatively model the interaction between viruses and cells and the influencing factors in the environment, the first difficulty is to qualitatively specify the factors involved in the system; the specification of the interaction terms should also favor the reasoning of qualitative simulation. We propose a model that describes the initial framework of the process, as shown in Fig. 1.
Fig. 1. The framework of the interactive process
Here V indicates the individual situation of the virus, E is the description of the whole environment, C is the situation of the cell, and B is the prediction of the possible behaviors of the virus. However, this model is so rough that it cannot play much of a role as a real model, so we should elaborate on the detailed individual situation and environment of the process to make the model more realistic. An advantage is that the model can be refined further, step by step, with more quantitative and qualitative knowledge about the process. At first, we may have little information about the object to be modeled; then more quantitative and qualitative knowledge can be inserted into the initial framework, and by continually adding information, the model becomes more and more complete and precise. We can then give the specification of influence. In the process of virus entry, the first step is the interaction between the virus attachment protein (VAP) on the virus and the viral receptor on the surface of cells [1], which is the pivotal step determining whether the virus will infect the cells. There are many factors that influence the process of attachment [2]. The percentage of virus binding to cells increases with temperature over the range from 0 °C to 37 °C [12]. Karl S. Matlin [13] found that when pH > 5, the percentage of virus binding to cells decreases if the pH decreases. The existence of positive ions in the environment may also affect the process of viral binding and entry: the electrostatic attraction between the virus attachment protein and the receptor on the cells plays a central role at the primary stage of attachment, and positive ions may accelerate the process of attachment and entry. After the step of attachment, virus penetration is the next stage, the process in which viruses penetrate into the host cells in different ways. There are four different ways for viruses to penetrate into host cells: injection, viropexis, envelope fusion, and other ways of penetrating directly. Virus uncoating is the process in which the infective viral nucleic acid is released from the coat of the virus; the ways of uncoating differ depending on the kind of virus and on whether it contains an envelope. The existence of enzymes and the structure of the cellular skeleton play a vital role in this step. The many factors that influence the behaviors of the virus in the process of viral entry can be divided into three categories: 1) Individual factors: the situation of the viruses themselves, with evaluating standards such as activity, size, and structural character. 2) Environment factors: temperature, pH value, and positive ions. 3) Cellular factors: the distribution of receptor sites and the content of enzymes. The factors listed above may have different degrees of influence on the entry process. Here we use a formula to express the relationship between the factors: Hi = f(Pi), where Pi indicates the factors that influence the process and Hi
indicates the possibility of the virus binding to and entering cells, and f(·) denotes the influence-degree function. According to the available knowledge of the process of viral entry, we can give a table showing the influence degree of the factors. All the degrees are expressed as numbers based on quantitative and qualitative information.

Table 1. The table of influence degrees

  Factor                                              Influence degree
  The individual factors
  1  Virus activity                                   9
  2  The structure of the virus attachment protein    8
  The environment factors
  3  Temperature                                      5
  4  pH value                                         5
  5  Positive ion                                     5
  The cell factors
  6  Receptor site                                    7
  7  Enzyme                                           5
  8  The structure of the cellular skeleton           6
Of course, the values could also be numbers other than those in the table; here we simply give a qualitative simplification of the expression. According to the description above, we can obtain the diagram of the relations among the system of cells and viruses and the environmental influencing factors.
Fig. 2. Model framework of virus behavior in the entry process
Having discussed the possible influencing factors in the process of virus entry, with the model framework shown in Fig. 2, we build the qualitative model in the next section based on the interaction among these influencing factors, the viruses, and the cells.
3 The Qualitative Modeling of Virus Behavior
We model the process qualitatively using the Analytic Hierarchy Process (AHP), an effective approach proposed by T.L. Saaty in the 1970s for handling subjective decision-making questions. Building the hierarchical structure is the key step. Generally speaking, the preconcerted goal of the problem is defined as the goal level, the middle level usually contains the rules and sub-rules, and the lowest level is the behavior level. According to the table of influence degrees, we obtain the initial model shown in Fig. 3.

Fig. 3. Qualitative model of the entry process. (Behavior level: virus activity, structure of VAP, temperature, pH value, positive ion, cellular receptor, enzyme, cellular skeleton. Rule level: the success rates of attachment, penetration, and uncoating. Goal level: the behaviors of the virus.)
When the hierarchy model of the process has been created, we should consider how to analyze the weights and verify the coherence. Judging matrices are constructed to carry out the weight analysis [4]. The judging matrices indicate the relative importance of the items in a certain level with respect to the level above it. The values of the elements in the judging matrices embody knowledge of the relative importance of every element in the research field, so the participation of relevant experts and investigation of the field are needed to ensure quality. The relative importance reflects a comparison between two factors based on biological knowledge: Aij indicates the relative importance of factor i with respect to factor j. The bigger the value of Aij, the more important factor i is than factor j. For example, a value of 1 means the two factors have the same importance for the object, and a value of 3 means factor i is a little more important than factor j. Analogously, the value of Aij increases as the relative importance of factor i over factor j increases.
We define reciprocity as aij = 1/aji, which means that corresponding elements of a judging matrix are reciprocal. Subsequently, we obtain the judging matrices:

A =
( 1     3     7     6     7     5     8     7   )
( 1/3   1     6     5     7     1     3     5   )
( 1/7   1/6   1     2     3     1/9   1/6   1/7 )
( 1/6   1/5   1/2   1     3     1/8   1/5   1/6 )
( 1/7   1/7   1/3   1/3   1     1/7   1/6   1/8 )
( 1/5   1     9     8     7     1     5     3   )
( 1/8   1/3   6     5     6     1/5   1     1/2 )
( 1/7   1/5   7     6     8     1/3   2     1   )

In the matrix A, the element Aij indicates the degree to which factor i is more important than factor j; this importance grade is described and evaluated qualitatively using a number. The matrices B follow (rows separated by semicolons):

B1 = (1, 2, 3; 1/2, 1, 2; 1/3, 1/2, 1)
B2 = (1, 7, 9; 1/7, 1, 1/2; 1/9, 2, 1)
B3 = (1, 5, 7; 1/5, 1, 2; 1/7, 1/2, 1)
B4 = (1, 7, 8; 1/7, 1, 2; 1/8, 1/2, 1)
B5 = (1, 6, 9; 1/6, 1, 1/3; 1/9, 3, 1)
B6 = (1, 5, 7; 1/5, 1, 3; 1/7, 1/3, 1)
B7 = (1, 1/3, 1/7; 3, 1, 1/5; 7, 5, 1)
B8 = (1, 1/3, 1/7; 3, 1, 1/7; 7, 7, 1)

In a matrix B, the element Bij implies the qualitative relative value of the impact that the corresponding factor (e.g., the individual activity of the virus for B1) has on element i over element j in the next level.
4 The Weight Vector and the Verification of Coherence
The coherence of the elements in the matrices can be used to evaluate the quality of the matrix and the model. However, full coherence is usually unattainable due to the complexity of the viral entry process and the lack of quantitative knowledge of the process, so approximate coherence is also acceptable.
4.1 The Weight Vector
To verify the coherence of the matrices, the relevant computations must first be defined. The coherence index is C.I. = (λmax − n)/(n − 1). The column-normalized matrix A* is obtained as A*ij = aij / Σ_{i=1}^{n} aij:

A* =
( 0.444  0.496  0.190  0.180  0.167  0.632  0.410  0.367 )
( 0.148  0.165  0.163  0.150  0.167  0.126  0.154  0.262 )
( 0.063  0.028  0.027  0.060  0.071  0.014  0.009  0.007 )
( 0.074  0.033  0.014  0.030  0.071  0.016  0.102  0.008 )
( 0.063  0.024  0.009  0.010  0.024  0.018  0.009  0.007 )
( 0.089  0.165  0.244  0.240  0.167  0.126  0.256  0.157 )
( 0.056  0.055  0.163  0.150  0.143  0.025  0.051  0.026 )
( 0.063  0.033  0.190  0.180  0.190  0.042  0.102  0.052 )
The rows of A* are then summed to give V = (2.886, 1.335, 0.279, 0.348, 0.164, 1.444, 0.669, 0.852)^T, and normalizing V gives the weight vector W = (0.361, 0.167, 0.035, 0.044, 0.021, 0.181, 0.084, 0.107)^T.

To verify the coherence, λmax = (1/n) Σi (AW)i / wi is computed with C = AW = (3.8440, 1.8323, 0.3148, 0.2753, 0.1760, 1.9752, 0.8305, 1.0973)^T, so that

λmax = (1/8)(3.8440/0.361 + 1.8323/0.167 + 0.3148/0.035 + 0.2753/0.044 + 0.1760/0.021 + 1.9752/0.181 + 0.8305/0.084 + 1.0973/0.107) = 9.538.

The normalized eigenvector corresponding to the eigenvalue λ = 9.538 is W = (0.361, 0.167, 0.035, 0.044, 0.021, 0.181, 0.084, 0.107). So we get

C.I. = (λmax − n)/(n − 1) = (9.538 − 8)/(8 − 1) = 0.219.
The standard for verifying the coherence is C.R. = C.I./R.I. Here the average random coherence index R.I. is obtained as the arithmetic mean of the random eigenvalues of judging matrices, taken over a large number of random trials.

Table 2. The coherence table of R.I. values

  n     1   2   3     4     5     6     7     8
  R.I.  0   0   1.58  1.90  2.12  2.24  2.32  2.41
C.R. = 0.219/2.41 = 0.090. Generally speaking, the approximate coherence of a judging matrix can be accepted if C.R. < 0.1, so the coherence of our judging matrix is acceptable. Using the same computing method, we can verify the coherence of all the matrices B.
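The computation in this subsection is mechanical and easy to script. Below is a minimal Python/NumPy sketch of the same procedure; the matrix A and the R.I. values are taken from the text above, while the helper name ahp_weights is our own:

```python
import numpy as np

RI = {1: 0, 2: 0, 3: 1.58, 4: 1.90, 5: 2.12, 6: 2.24, 7: 2.32, 8: 2.41}  # Table 2

def ahp_weights(A):
    """Weight vector and coherence ratio of an AHP judging matrix A."""
    n = A.shape[0]
    A_star = A / A.sum(axis=0)      # column-normalize: A*_ij = a_ij / sum_i a_ij
    W = A_star.sum(axis=1)          # sum the rows of A* ...
    W = W / W.sum()                 # ... and normalize to get the weight vector
    lam_max = np.mean(A @ W / W)    # lambda_max = (1/n) * sum_i (AW)_i / w_i
    CI = (lam_max - n) / (n - 1)    # coherence index
    CR = CI / RI[n]                 # coherence ratio; accept if CR < 0.1
    return W, lam_max, CI, CR

A = np.array([
    [1,   3,   7,   6,   7,   5,   8,   7],
    [1/3, 1,   6,   5,   7,   1,   3,   5],
    [1/7, 1/6, 1,   2,   3,   1/9, 1/6, 1/7],
    [1/6, 1/5, 1/2, 1,   3,   1/8, 1/5, 1/6],
    [1/7, 1/7, 1/3, 1/3, 1,   1/7, 1/6, 1/8],
    [1/5, 1,   9,   8,   7,   1,   5,   3],
    [1/8, 1/3, 6,   5,   6,   1/5, 1,   1/2],
    [1/7, 1/5, 7,   6,   8,   1/3, 2,   1],
])
W, lam_max, CI, CR = ahp_weights(A)
print(W.round(3), round(lam_max, 3), round(CI, 3), round(CR, 3))
# per the text: W ~ (0.361, 0.167, ...), lambda_max ~ 9.538, C.I. ~ 0.219, C.R. ~ 0.090
```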
Table 3. The table of the total weight vectors

  K     1      2      3      4      5      6      7      8
  WK1   0.162  0.055  0.091  0.086  0.417  0.372  0.059  0.099
  WK2   0.084  0.303  0.364  0.229  0.058  0.079  0.243  0.284
  WK3   0.581  0.105  0.364  0.229  0.417  0.372  0.115  0.099
  λK    4.008  4.009  4.001  4.062  4.003  4.005  4.114  4.004
  CIK   0.003  0.003  0.002  0.021  0.001  0.002  0.038  0.001
  RIK   0.90   0.90   0.90   0.90   0.90   0.90   0.90   0.90
4.2 The Total Weightiness and Qualitative Analysis
We can compute the weights between the levels: W1 = 0.172, W2 = 0.195, and W3 = 0.548; the weight vector from the rule level to the goal level is {0.172, 0.195, 0.548}. The total C.R. of the levels is computed as 0.00626, which is lower than 0.1, so it passes the coherence verification. From the computing results above, the weights can be ordered as: B1
5 Conclusions
The quantitative analysis of viral infection dynamical models cannot be carried out easily due to the lack of complete quantitative knowledge of such biological systems; therefore, methods based on qualitative analysis become an alternative approach to research on this complicated process. In this paper we propose a rough model as the foundation of further research. With more knowledge of the process, the framework is expanded by inserting qualitative descriptions of the different kinds of factors that interactively influence the process of virus entry. The factors are described qualitatively by their influence degree, and the qualitative model is built based on the interaction among these influencing factors, the viruses, and the cells. The judging matrices are constructed according to the qualitative model and their coherence is also verified. Because of the uncertainty and incompleteness of knowledge about the environment and its influence, applying qualitative methods to address the interaction of variables in a complex system is effective. Another factor is that the quantitative information
about the relationships between the variables in a system cannot be gained easily. Lastly, when we only care about the trend of development in a system, quantitative knowledge is not as important as we used to think. The value of the qualitative approach comes from its ability to specify natural types of incomplete knowledge of the world, and its ability to derive a probably complete set of possible behaviors in spite of the incompleteness of the model. To sum up, although qualitative techniques have restrictions in application due to their ambiguity, they can be useful in predicting the trend and probability of changes in a system, which may be time-consuming with traditional quantitative methods. Our research in the field of virus entry based on qualitative simulation and modeling is still underway, and much work remains to be done. For instance, the validation of the theoretical model, the combination of quantitative with qualitative knowledge, and the formal representation of states are our further work in this direction.
References
1. Rust, M.J., Lakadamyali, M., et al.: Assembly of endocytic machinery around individual influenza viruses during viral entry. Nature Structural & Molecular Biology 11(6), 567–573 (2004)
2. Phogat, S.K., Dimitrov, D.S.: Cell biology of virus entry: a review of selected topics from the 3rd International Frederick meeting. Biochimica et Biophysica Acta 1614, 85–88 (2003)
3. ChenXi, S., Hailin, F., Yue, W., Jinfeng, F.: Qualitative prediction in virus entry based on feedback. In: Proceedings of the Asia Simulation Conference, vol. 2, pp. 1343–1347 (2005)
4. Schmidt, B.: The Modelling of Human Behaviour. SCS-Europe BVBA, Ghent, Belgium (2000)
5. Dales, S., Choppin, P.W.: Attachment and penetration of influenza virus. Virology 18, 489–493 (1962)
6. Dourmashkin, R.R., Tyrrell, D.A.J.: Electron microscopic observations on the entry of influenza virus into susceptible cells. J. Gen. Virol. 24, 129–141 (1974)
7. Dales, S., Pons, M.W.: Penetration of influenza examined by means of virus aggregates. Virology 69, 278–286 (1976)
8. Patterson, S., Oxford, J.S., Dourmashkin, R.R.: Studies on the mechanism of influenza virus entry into cells. J. Gen. Virol. 43, 223–229 (1979)
9. Oxford, J.S., Galbraith, A.: Antiviral activity of amantadine: a review of laboratory and clinical data. Pharmacol. Ther. 11, 181–262 (1980)
10. Long, W.F., Olusanya, J.: Adamantanamine and early events following influenza virus infection. Arch. Gesamte VirusForsch 36, 18–22 (1972)
11. Skehel, J.J., Hay, A.J., Armstrong, J.A.: On the mechanism of inhibition of influenza virus replication by amantadine hydrochloride. J. Gen. Virol. 38, 97–110 (1977)
12. YunDe, H.: Molecule Virus. Science Publishing Company (1990)
13. Matlin, K.S., Reggio, H., Helenius, A., Simons, K.: Infectious Entry Pathway of Influenza Virus in a Canine Kidney Cell Line. The Journal of Cell Biology 91, 601–613 (1981)
14. Kuipers, B.: Qualitative Reasoning. MIT Press, Cambridge, MA (1994)
AOC-by-Self-discovery Modeling and Simulation for HIV
Chunxiao Zhao1,2, Ning Zhong2,3, and Ying Hao1
1 Department of Computer, Beijing University of Civil Engineering and Architecture, China
2 International WIC Institute, Beijing University of Technology, China
3 Department of Information Engineering, Maebashi Institute of Technology, Japan
[email protected]
Abstract. HIV, immune cells, and drugs exhibit interactions that are usually not well understood and, as a result, cannot be accurately modeled. In this paper, modeling by AOC is used to understand the dynamics of HIV infection and treatment. The use of AOC-by-self-discovery modeling was investigated; AOC-by-self-discovery methods try to adjust the system parameters automatically. To demonstrate the effects of therapies, we design and implement an HIV Computational Lab prototype. The HIV Computational Lab is an AOC-based simulation of HIV immune dynamics that is currently being developed in NetLogo. It allows researchers to investigate the dependencies of various immune responses to HIV, and provides a good tool for characterizing the process of HIV infection and studying HIV drug treatment.
1 Introduction
Since 1996, combination therapy using multiple anti-retroviral medications has been successfully used to suppress HIV activity, often below the limit of detection. This combination therapy is called Highly Active Antiretroviral Therapy (HAART); it consists of triple therapy including two nucleoside analogues and a protease inhibitor. Several drugs are now employed, classified into three major categories. The first two, Nucleoside Reverse Transcriptase Inhibitors (NRTIs) and Non-Nucleoside Reverse Transcriptase Inhibitors (NNRTIs), act by interfering with the process in which HIV is integrated into the host cell genome, effectively preventing the infection of CD4+ T cells. The third category, Protease Inhibitors (PIs), act by interfering with the production of viruses by infected cells, causing the viruses to be unable to infect CD4+ T cells. The use of these drugs has increased the life expectancy of HIV patients in first-world countries. Although the progression of HIV infection into AIDS is controlled with anti-HIV drugs, the treatments do not eradicate the disease, so long-term non-progressor (LTNP) status is desired. An LTNP is defined as a patient who has the virus but has been without symptoms of AIDS for more
than seven years without treatment [1]. Recently, it has been reported that Structured Therapy Interruption (STI) can lead to LTNP [2]. STI is a treatment method that introduces medication interruptions to reduce side effects or to help the immune system control HIV [3]. Various administration schemes are used to improve patients' lives while suppressing the development of drug resistance, reducing the evolution of new viral strains, minimizing serious side effects, improving patient adherence, and reducing the costs of drugs [4]. Many modeling methods have been reported for HIV treatment [5,6,7,8,9]. HIV, immune cells, and drugs exhibit interactions that are usually not well understood and, as a result, cannot be accurately modeled. In this paper, modeling by AOC is used to understand the dynamics of HIV infection and treatment. Based on the above considerations, the work of this paper is as follows: (1) The use of AOC-by-self-discovery modeling was investigated. In MMAS-based work, the authors employ the AOC-by-prototyping method to simulate HIV-immune dynamics. Because manual parameter setting may become long and tedious, the trial-and-error process of an AOC-by-prototyping algorithm in the MMAS is replaced in this paper by adjusting the system parameters automatically. (2) To demonstrate the effects of treatments, we design and implement an HIV Computational Lab prototype. It is an AOC-based model of HIV immune dynamics that is currently being developed in NetLogo. It allows researchers to investigate the dependencies of various immune responses to HIV. The HIV Computational Lab provides a good tool for characterizing the process of HIV infection and studying HIV drug treatment. (3) The current results show that the typical three-stage dynamics of HIV infection is basically reproduced in the NetLogo simulation and that the AOC algorithm is feasible. In addition, the experimental results also show that the AOC algorithm has strong robustness and good scalability in the homogeneous-entities situation.
2 Related Works
Many modeling methods have been reported for HIV treatment [5,6,7,8,9]. Mathematical models have proven valuable in understanding the dynamics of HIV-1 infection. By comparing these models to data obtained from patients undergoing antiretroviral drug therapy, it has been possible to determine many quantitative features of the interaction between HIV and the immune cells that are infected by the virus. In this paper we compare three popular modeling approaches: ODE models, CA models, and MMAS models. Ordinary differential equation (ODE) modeling is a traditional top-down method for analyzing systems with differential equations. Wodarz and Nowak introduced an ODE model which attempts to model HIV infection [2]. The action of HAART can be incorporated into the Wodarz-Nowak model [4]. The model is
a five-state nonlinear ordinary differential equation focusing on the cytotoxic T-cell response to HIV infection, as mediated by helper T cells. ODE models simplify the interactions in the process of the immune response. The models are insufficient to describe the two extreme time scales involved in HIV infection (days and decades), as well as the implicit spatial heterogeneity [11,12,13]. The ODE views the elements in the system as homogeneous and ignores the spatial structure of the biological system at the microscopic scale. This allows for extensive mathematical analysis and reduces the computational cost of simulation, but ODEs often fail to account for large-scale emergence from local interactions and individual diversity. A cellular automaton is a discrete dynamical system that is often used to simulate natural phenomena. It contains an n-dimensional lattice of identical cells. The cells on the sites of the lattice have variable states, which can change according to particular rules during the simulation. The CA system evolves over a succession of time steps; the values of all the sites in the lattice are updated synchronously in each time step, using a set of rules that take the value of each site and its neighboring sites into account. A matrix is created with specific element values (integer, real, or symbol lists); in it, every site in the two-dimensional CA grid represents a target cell for HIV. R.M. Zorzenon dos Santos [14] reported a cellular automata approach to simulate the three-phase pattern of human immunodeficiency virus (HIV) infection consisting of primary response, clinical latency, and onset of acquired immunodeficiency syndrome (AIDS). In another study [10], the authors employ non-uniform cellular automata (CA) to simulate drug treatment of HIV infection, where each cell can be in one of four states, namely healthy, infected-A1, infected-A2, or dead, and each computational domain may contain different CA rules, in contrast to normal uniform CA models. They developed a non-uniform CA model to study the dynamics of drug therapy of HIV infection, which simulates four phases (acute, chronic, drug treatment response, and onset of AIDS). Three different drug therapies (mono-therapy, combined drug therapy, and highly active antiretroviral therapy, HAART) can also be simulated in their model. The model's prediction of the temporal behavior of the immune system under drug therapy corresponds qualitatively to clinical data. In MMAS-based work, a hybrid massively multi-agent system (MMAS) model is developed; it incorporates the characteristics of CA and system-level mathematical equation modeling to simulate HIV-immune interaction dynamics [15]. The simulation based on the implemented MMAS discovers the dynamics of HIV evolution over different temporal and spatial scales, and reproduces the typical three-stage dynamics of HIV infection. In this approach, the mathematical equations are used within a single site, which keeps the spatial characteristics of the system and reduces the heavy computational costs of the CA model. In the MMAS model, there are three types of agents: HIV, T cells, and O cells. System mathematical equations are adopted to simulate agent interactions within each site of a two-dimensional lattice. The rules of the hybrid MMAS model are listed below.
Rule 1: The environment is an N×N two-dimensional lattice that is circular.
Rule 2: T cells recognize HIV and are stimulated to reproduce.
Rule 3: HIV infects and kills cells.
Rule 4: HIV mutation.
Rule 5: HIV and cell diffusion.
Rule 6: Natural cell creation by the organism.
Rule 7: Natural death of HIV and cells.
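To illustrate how such lattice rules can be composed into one synchronous update, here is a deliberately simplified Python sketch covering only caricatures of Rules 1, 3, 5, and 7; the state encoding and the rates P_INFECT and P_DIE are our own placeholder assumptions, not parameters of the hybrid MMAS model:

```python
import random

N = 50                        # lattice size (assumption)
P_INFECT, P_DIE = 0.3, 0.05   # illustrative rates, not the model's values
EMPTY, T_CELL, INFECTED, HIV = 0, 1, 2, 3

grid = [[random.choice([EMPTY, T_CELL, HIV]) for _ in range(N)] for _ in range(N)]

def neighbors(i, j):
    """4-neighborhood on a circular (toroidal) lattice, as in Rule 1."""
    return [((i - 1) % N, j), ((i + 1) % N, j), (i, (j - 1) % N), (i, (j + 1) % N)]

def step(grid):
    new = [row[:] for row in grid]
    for i in range(N):
        for j in range(N):
            s = grid[i][j]
            if s == T_CELL:
                # Rule 3 (simplified): neighboring HIV may infect the T cell
                if any(grid[a][b] == HIV for a, b in neighbors(i, j)) and random.random() < P_INFECT:
                    new[i][j] = INFECTED
            elif s == HIV:
                # Rule 5 (simplified): diffusion to a random empty neighbor
                a, b = random.choice(neighbors(i, j))
                if new[a][b] == EMPTY:
                    new[a][b], new[i][j] = HIV, EMPTY
            # Rule 7 (simplified): natural death
            if new[i][j] != EMPTY and random.random() < P_DIE:
                new[i][j] = EMPTY
    return new

for _ in range(10):
    grid = step(grid)
```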
3 AOC Approach
Jiming Liu proposes autonomy oriented computing (AOC) as a paradigm for describing systems that solve hard computational problems and for characterizing the behaviors of a complex system [16]. AOC emphasizes the modeling of autonomy in the entities of a complex system and their self-organization in achieving a specific goal. AOC models are intended to reconstruct, explain, and predict the behavior of such systems, which are hard to model or compute using top-down approaches. AOC starts from the smallest and simplest elements of a complex system, based on the following characteristics of the entities in the system:
– Autonomous: System elements are rational individuals that act independently; in other words, there is no central controller directing and coordinating individual elements;
– Emergent: They exhibit complex behavior that is not present or pre-defined in the behavior of the autonomous entities within complex adaptive systems;
– Adaptive: They often change their behaviors in response to changes in the environment in which they are situated;
– Self-organized: They are able to organize the elements to achieve the above behaviors.
There are three different approaches to AOC: AOC-by-fabrication, AOC-by-prototyping, and AOC-by-self-discovery. AOC-by-fabrication is intended to replicate certain self-organized behaviors observable in the real world to form a general-purpose problem solver. AOC-by-prototyping is devoted to understanding the operating mechanism underlying a complex system by simulating the observed behavior through characterizing a society of autonomous entities; usually it involves a trial-and-error process to eliminate the difference between a prototype and its natural counterpart. AOC-by-self-discovery concentrates on the automatic discovery of a solution: the trial-and-error process of an AOC-by-prototyping algorithm is replaced by autonomy in the system. In MMAS-based work, the authors employ the AOC-by-prototyping method to simulate HIV-immune dynamics. Because manual parameter setting may become long and tedious, the trial-and-error process of an AOC-by-prototyping algorithm in the MMAS is replaced by autonomy in this paper. In other words, AOC-by-self-discovery methods try to eliminate the repeated trial-and-error process that often comes with other AOC approaches by adjusting the system parameters automatically.
4 AOC-by-Self-discovery Modeling for HIV
Based on the model summarized above, we incorporate the drug therapy process into the MMAS model. In this model, we introduce drug entities into the MMAS model. In addition, we add two rules for drug therapy, as follows.
Appended Rule 8: Drug is 'on' to block HIV replication within cells by inhibiting either the reverse transcriptase or the HIV protease; the state 'on' corresponds to full treatment.
Appended Rule 9: Drug is 'off'; the state 'off' corresponds to no treatment, to simulate the STI treatment method.
The rest of the rules do not change. NetLogo is a multi-agent programming language and simulation platform. The NetLogo platform provides a software tool called BehaviorSpace [17] that allows users to perform experiments with models: it runs a model many times, systematically varying the model's settings and recording the results of each run. It lets you explore the model's "space" of possible behaviors and determine which combinations of settings cause the behaviors of interest. Based on the above MMAS model, we propose extending the MMAS model by using BehaviorSpace in NetLogo and adjusting the system parameters automatically (a small sketch of such a parameter sweep is given after the rules below). The rules of the modified MMAS model are listed below.
Rule 1: The environment is an N×N two-dimensional lattice that is circular.
Rule 2: T cells recognize HIV and are stimulated to reproduce.
Rule 3: HIV infects and kills cells.
Rule 4: HIV mutation.
Rule 5: HIV and cell diffusion.
Rule 6: Natural cell creation by the organism.
Rule 7: Natural death of HIV and cells.
Rule 8: Drug is 'on' according to the condition.
Rule 9: Drug is 'off' according to the condition.
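The following Python sketch shows, under our own illustrative assumptions (parameter names, value grids, and a stub scoring function), what such a BehaviorSpace-style exhaustive sweep over a discrete parameter space looks like; it is not the actual NetLogo configuration:

```python
import itertools

# Discrete candidate values for each swept parameter (illustrative only)
param_grid = {
    "p_infect": [0.1, 0.2, 0.3],
    "p_mutate": [0.001, 0.01],
    "drug_on":  [True, False],   # Rules 8/9: treatment switched on or off
}

def run_model(params, steps=500):
    """Stub for one simulation run; replace with a real NetLogo/BehaviorSpace call.
    Here it returns a dummy score so that the sweep itself is runnable."""
    return -abs(params["p_infect"] - 0.2)

def sweep(param_grid):
    """Enumerate the Cartesian parameter space, as BehaviorSpace does."""
    best, best_score = None, float("-inf")
    keys = list(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = run_model(params)   # e.g. how well the three-stage dynamics is reproduced
        if score > best_score:
            best, best_score = params, score
    return best

print(sweep(param_grid))
# With k discrete parameters the space is a product of k value lists,
# consistent with the O(n^(k+2)) complexity noted in Sect. 5.
```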
5 Simulation Results
The common pattern of HIV dynamics in infected patients is the three-stage dynamics of HIV infection, which has been confirmed by a large body of experimental data and is now accepted by many researchers. The three stages consist of the primary response, the clinical latency, and the onset of AIDS. In the model, HIV and immune cells are expressed by entities of different types. A two-dimensional grid provides an environment for HIV and the immune cells to live and interact. In the NetLogo platform, the BehaviorSpace tool allows the parameter space to be explored automatically and systematically; this space is a Cartesian product of the values that each parameter can take. In this paper, we use
Fig. 1. Partial result of the experiment
Fig. 2. Simulation time for different immune cell populations
NetLogo to simulate the AOC model. The HIV Computational Lab interface and a partial result of the experiment are shown in Fig. 1. The left part of Fig. 1 represents the temporal emergence of the dynamics between the immune cells and HIV. It can be seen that the typical three-stage dynamics of HIV infection is well reproduced in the simulation, which means that the temporal pattern emerges from the local interactions between HIV and the immune system. In Fig. 1, the HIV population increases and then decreases sharply at first, which accounts for the primary immune response. After the primary response there is a long period in which the virus population stays stable and increases slowly; however, HIV and the immune cells are still active, with very high death and reproduction rates. Besides the temporal dynamics of the HIV and immune cell populations, the HIV-and-immune-cell dynamics in physical space is also very important for understanding HIV infection and HIV treatment. The interactions among the three types of agents are shown in the right part of Fig. 1: T cells (immune cells), O cells (other cells), and HIV.
Fig. 3. Reliability for different immune cell populations
Fig. 2 shows the simulation time for different immune cell populations; the parameter n expresses the immune cell number. As n increases, the variation in simulation time is small, which shows that the AOC algorithm is feasible. Fig. 3 shows the reliability for different immune cell populations. The reliability is defined as the ratio between the number of successful experiments and the total number of experiments. The simulation results indicate the validity of the algorithm. The reason is that the AOC algorithm is a kind of probabilistic algorithm and does not have a common control constraint condition; because the death of immune cells or HIV virions does not influence the modeling, the AOC algorithm has strong robustness. In addition, as n increases, the success ratio improves, which means the AOC method has good scalability. Fig. 3 presents a homogeneous-entities situation. The time complexity of the algorithm can be analyzed as follows. Suppose the running time is O(1) per site per time step; the system runs for n generations and each generation runs for n time steps, so the time complexity is O(n²). If we consider only k discrete-domain parameters, the time complexity with automatic parameter control is O(n^(k+2)).
6 Conclusion and Future Works
In some situations, selecting different parameter values will cause completely different results in the simulation process, and manual exploration of the parameter space is complex, so the automatic exploration technology proposed in this paper is suitable. However, the proposed algorithm is only suitable for parameters with discrete values. When there are too many real-valued parameters, the parameter space becomes infinite and systematic exploration becomes impossible. This is a question that we should explore further.
Acknowledgements We would like to acknowledge the support of the following research grant: the National Natural Science Foundation of China under grant 60642003.
References
1. Perelson, A.S., Neumann, A.U., Markowitz, M., Leonard, J.M., Ho, D.D.: HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science 271, 1582–1586 (1996)
2. Wodarz, D., Nowak, M.A.: Specific therapy regimes could lead to long-term immunological control of HIV. PNAS 96, 14464–14469 (1999)
3. Treatment Interruptions, fact sheets: AIDS.org (2003), http://www.aids.org/factSheets/406-Treatment-Interruptions.html
4. Zurakowski, R., Messina, M.J., Tuna, S.E., Teel, A.R.: HIV treatment scheduling via robust nonlinear model predictive control. In: The 5th Asian Control Conference, vol. 1, pp. 25–32 (2004)
5. Nowak, M.A., Bangham, C.R.: Population dynamics of immune responses to persistent viruses. Science 272, 74–79 (1996)
6. Bonhoeffer, S., May, R.M., Shaw, G.M., Nowak, M.A.: Virus dynamics and drug therapy. Proc. Natl. Acad. Sci. 94, 6971–6976 (1997)
7. Wei, X.P., et al.: Viral dynamics in human-immunodeficiency-virus type-1 infection. Nature 373, 117–122 (1995)
8. Ho, D.D., Neumann, A.U., Perelson, A.S., Chen, W., Leonard, J.M., Markowitz, M.: Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection. Nature 373, 123–126 (1995)
9. Morel, P.A.: Mathematical modeling of immunological reactions. Frontiers in Bioscience 3, 338–347 (1998)
10. Sloot, P., Chen, F., Boucher, C.: Cellular automata model of drug therapy for HIV infection. In: Bandini, S., Chopard, B., Tomassini, M. (eds.) ACRI 2002. LNCS, vol. 2493, pp. 282–293. Springer, Heidelberg (2002)
11. Kirschner, D.E., Webb, G.F.: A mathematical model of combined drug therapy of HIV infection. Theoret. Med. 25–34 (1997)
12. Kirschner, D.E., Webb, G.F.: Understanding drug resistance for mono-therapy treatment of HIV infection. Bull. Math. Biol. 185–763 (1997)
13. Verotta, D., Schaedeli, F.: Non-linear dynamics models characterizing long-term virological data from AIDS clinical trials. Math. Biosci. 1–21 (2002)
14. dos Santos, R.M.Z., Coutinho, S.: Dynamics of HIV infection: A cellular automata approach. Phys. Rev. Lett. 87(16), 168102–168114 (2001)
15. Zhang, S., Liu, J.: A massively multi-agent system for discovering HIV-immune interaction dynamics. In: Ishida, T., Gasser, L., Nakashima, H. (eds.) Massively Multi-Agent Systems I. LNCS (LNAI), vol. 3446, pp. 161–173. Springer, Heidelberg (2005)
16. Liu, J., Jin, X.L., Tsui, K.C.: Autonomy Oriented Computing (AOC): Formulating Computational Systems with Autonomous Components. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 35, 879–902 (2005)
17. Wilensky, U.: Modeling Nature's Emergent Patterns with Multi-agent Languages. Proceedings of EuroLogo, 1–6 (2002)
A Simulation Study on the Encoding Mechanism of Retinal Ganglion Cell
Chao-Feng Cai, Pei-Ji Liang, and Pu-Ming Zhang
Department of Biomedical Engineering, Shanghai Jiao Tong University, 800 Dong-Chuan Road, 200240 Shanghai, China
{ccf1980,pjliang,pmzhang}@sjtu.edu.cn
Abstract. Understanding how the retina encodes visual information is a key issue for the development of a retinal prosthesis. To study this issue, the neural retina is modeled as a retina module (RM) consisting of an ensemble of spatial-temporal (ST) filters, where each ST filter simulates the input-output property of an individual ganglion cell (GC). Two receptive field (RF) models of the retinal GC, the difference of Gaussians (DOG) model and the disinhibition (DIS) model, are employed to implement these ST filters. RM performs the encoding operation from an input optical pattern to a group of parallel action potential (AP) trains. To assess the encoding efficiency of the RF models, a central visual system module (VM) consisting of a group of artificial neural networks is employed to perform the decoding operation from AP trains to an output perceptual pattern. A matching error is defined as an index to quantify the similarity between the input optical pattern and the output perceptual pattern generated by VM. The simulation results suggest that the matching error declines dramatically when the DOG model is replaced by the DIS model, which implies that the encoding mechanism of the DIS model might be more effective than that of the DOG model.
1 Introduction
The development of a retinal prosthesis that might someday restore partial vision to patients suffering from retinal degeneration has attracted increasing interest from researchers in recent years [1]. At present, there are at least 18 independent research groups worldwide pursuing epiretinal, subretinal, intrapapillary, suprachoroidal, extraocular, or complex prosthesis designs [2]. No matter what kind of retinal prosthesis is considered, an inevitable question to be solved is how to design a prosthetic device that can replace the degenerated retina to process and encode visual information. However, although the retina is a well-researched part of the visual system, its encoding mechanism still partially remains a mystery. Hence, understanding how the retina encodes the dynamic nature of visual scenes, and thus generating action potential (AP) trains that match those produced in the healthy retina, has become the foremost challenge in the development of a functional retinal prosthesis [3]. Enough is known about retinal anatomy and physiology to allow the construction of a computational model to study the encoding mechanism of the retina.
Based on identified synaptic interactions and local microcircuits, Zaghloul and Boahen constructed a model of the retinal circuitry and then morphed it into a silicon chip by replacing each synapse or gap junction in the model with a transistor [4]. This silicon retina can approximate the behavior of the neural retina in both its linear responses and its nonlinear adaptations. Cottaris and Elfar developed a retina model that includes all major retinal cell types and the corresponding interconnections among them to characterize the spatiotemporal activation of the retinal circuitry during electrical stimulation [5]. Their simulation results show that during the electrical stimulation period the activation of retinal ganglion cells (GC) is governed mainly by the electric field imposed by the stimulating electrode, which results in the indiscriminate excitation of ON and OFF retinal ganglion cells. This finding helps in designing the image encoding strategies for the retinal prosthesis. Kenyon proposed a retina model that consists of five distinct cell types: bipolar cells, three classes of amacrine cells corresponding to the small, large, and polyaxonal subtypes, and ganglion cells [3]. This model captures essential nonlinear properties of the neural retina and offers good quantitative agreement between theory and experiment. Their simulation results suggest that synchronous oscillations between retinal ganglion cells are able to encode visual information. Eckmiller proposed a visual system model that consists of a tunable retina module (RM) and a central visual system module (VM); this simulated visual system offers a clear view of the three pattern or signal domains (physical, neural, and perceptual) [6, 7, 8]. RM performs the encoding operation from an input optical pattern in the physical domain to an ensemble of parallel AP trains in the neural domain, and VM performs the decoding operation from these parallel AP trains to a perceptual pattern in the perception domain, as shown in Fig. 1. Although RM is particularly simple compared with the retinal models mentioned above, because only the retinal GC is considered, it still describes the basic information processing property of the retina. VM generates a corresponding output image featuring the preservation of edges and contours to represent the perceptual pattern. The framework of that simulated visual system is employed to investigate the encoding mechanism of the retinal GC in this paper.
Fig. 1. Schematic illustration of the simulated visual system (Redrawn from [8])
In Eckmiller's simulated visual system, RM consists of an ensemble of spatial-temporal (ST) filters. These ST filters are implemented by the classical difference of Gaussians (DOG) model to simulate the input-output property of retinal GCs with an antagonistic center/surround receptive field (RF) structure. The DOG model is often
472
C.-F. Cai, P.-J Liang, and P.-M. Zhang
considered as a spatial band-pass filter [9]. When tested with a real image, the low spatial frequency components are filtered thoroughly because of the antagonistic interaction between center and surround regions of RF, which results in the enhancement of edge property and the attenuation of area luminance contrast in the output image [10]. On the other hand, it has long been reported that there exits an extensive disinhibitory region beyond the classical RF of retinal GC [11], the DOG model obviously fails to explain this phenomenon. Another RF model of retinal GC, the disinhibition (DIS) model proposed by Qiu [10], explains the mechanism of this phenomenon well. Furthermore, the DIS model not only enhances the edge property but also efficiently recover the area luminance contrast which is attenuated by the DOG model when tested with a real image [10]. Hence, the DIS model is also employed to implement ST filters in the simulation study.
2 Methods
2.1 RM
The essential information processing properties of the retina are simulated by considering the retina as an array of independent ST filters [6, 7, 8], as illustrated in Fig. 2. Each ST filter simulates the spatial and temporal RF properties of an individual retinal GC and receives light information within its RF region in the photoreceptor array. These ST filters jointly simulate the function of the neural retina, encoding the visual information into the form of AP trains.
Fig. 2. Schematic illustration of RM (Redrawn from [8]). Each ST filter in RM simulates one retinal GC.
An image with 320×320 pixels is generated by a computer program module to represent the optical pattern. Each pixel in the optical pattern corresponds to one photoreceptor in the photoreceptor array. A parameter vector is predefined for each ST filter to depict the temporal and spatial properties of the simulated GC. Neighboring RFs may overlap each other, which is consistent with neurobiological findings. To compare the encoding efficiency of the DOG model and the DIS model, the ST filters are implemented by the two RF models respectively in the simulation study.
A DOG Model

Experiments showed that the sensitivity profiles across the center and surround regions of the RF (classical RF) of a retinal GC may be fit with two Gaussian functions: a narrow Gaussian function of high amplitude describes the center, and a broader Gaussian function of lower amplitude describes the surround. The two regions are assumed to have circular profiles, to be concentric, and to have opposite polarities [12, 13]. Let I(x, y, t) be the light intensity of an optical pattern that falls on the retinal photoreceptor array and R(t) the original response signal of the simulated GC. The simulation of a retinal GC can be achieved by two distinct pathways, one for the center computation and the other for the surround. Each pathway performs a spatial scalar product of the visual stimulus data and a rotationally symmetric two-dimensional Gaussian function [6]. Then the responses of the center and the surround are:
$$R_c(t) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} I(x,y,t)\,A_c\,\frac{1}{2\pi\sigma_c^2}\exp\left(-\frac{x^2+y^2}{2\sigma_c^2}\right)dx\,dy \qquad (1)$$

$$R_s(t) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} I(x,y,t)\,A_s\,\frac{1}{2\pi\sigma_s^2}\exp\left(-\frac{x^2+y^2}{2\sigma_s^2}\right)dx\,dy \qquad (2)$$
where Ac and As represent the peak sensitivities of the Gaussian functions corresponding to the center and surround respectively, and σc and σs represent the standard deviations of the Gaussian functions corresponding to the center and surround respectively [10]. Since Rs(t) is generally delayed relative to Rc(t) [9], a surround time delay d should be introduced when they are combined to obtain R(t):
$$R(t) = \max\{R_c(t) - R_s(t-d),\ 0\} \qquad (3)$$
The maximization function is applied because the firing rate is always positive [10]. Since R(t) varies over a small range, it is rectified to obtain the firing rate as follows:
$$F(t) = R(t)\,\frac{F_{peak} - F_0}{R_{peak}} + F_0 \qquad (4)$$
where Fpeak is the permitted peak firing rate, F0 is the resting firing rate and Rpeak is the peak value of R(t) over all simulated GCs in RM. Although the DOG model ignores the nonlinear transduction and adaptation that occur at earlier stages of retinal processing, it still describes many of the actual properties of retinal GCs. Fig. 4(b) shows the spatial frequency transfer property of the DOG model tested with a real image. From the figure we can see that, as a band-pass filter, the DOG model filters out the low spatial frequency components of the image, resulting in the enhancement of edges and the attenuation of area luminance contrast information.

B DIS Model

The DIS model was proposed to explain the fact that there exists an extensive disinhibitory region beyond the classical RF of retinal GCs [10]. In this model, the RF
(extensive RF) is composed of a small center and an extensive disinhibitory surround. The extensive surround is not a connected region but consists of many mutually inhibitory subunits, which partially counteract the inhibition of the surround on the center, as illustrated in Fig. 3.
Fig. 3. Diagram of the DIS model (Redrawn from [10]). Inhibitory interactions (bipolar arrows) exist among the subunits (gray disks), and direct inhibition (unipolar arrows) acts from the subunits on the center (black disk). The thickness of the arrows represents different inhibition strengths.
Based on the DIS model, the linear sum of the inhibitory interactions that subunit (x, y) undergoes from the other subunits is:
$$II(x,y,t) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} I(x+m,\,y+n,\,t)\,A_{ss}\,\frac{1}{2\pi\sigma_{ss}^2}\exp\left(-\frac{m^2+n^2}{2\sigma_{ss}^2}\right)dm\,dn \qquad (5)$$
where Ass represents the integrated sensitivity of the inhibitory interaction and σss represents the corresponding standard deviation. Then the response of the surround is:
$$R_s(t) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \max\left\{A_s\,\frac{1}{2\pi\sigma_s^2}\exp\left(-\frac{x^2+y^2}{2\sigma_s^2}\right)\left[I(x,y,t) - II(x,y,t)\right],\ 0\right\}dx\,dy \qquad (6)$$
Finally, Rc(t) and Rs(t) are combined to obtain R(t) according to Eq. (3). Compared with the DOG model, the DIS model not only describes the basic RF properties of retinal GCs, but also explains well the genesis of the extensive disinhibitory region beyond the classical RF [10]. Fig. 4(c) shows the spatial frequency transfer property of the DIS model, from which we can see that the DIS model not only enhances edges, but also recovers the area luminance contrast information ignored by the DOG model.

C Inhomogeneous Poisson Process

A stochastic model, the inhomogeneous Poisson process (IPP) model, is employed to generate AP trains in the form of 1-0 trains according to the firing rate [14], where '1' and '0' correspond to whether or not the simulated GC fires. Neither the absolute refractory period nor the relative refractory period is considered.
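To make the RM computation concrete, the following is a minimal sketch (not the authors' implementation) of a single DOG-filtered GC followed by the rate rectification of Eq. (4) and IPP spike generation. Discrete sums over an RF-sized image patch stand in for the integrals of Eqs. (1)-(2), and all parameter values are illustrative assumptions.

```python
# Minimal sketch of one DOG-model ST filter plus IPP spike generation
# (Eqs. (1)-(4)); all parameter values are illustrative, not the paper's.
import numpy as np

def gaussian_kernel(sigma, size=21):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx**2 + yy**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def dog_response(patch_now, patch_delayed, Ac=1.0, As=0.8, sc=1.5, ss=4.0):
    """Eqs. (1)-(3): center minus delayed surround, half-wave rectified."""
    size = patch_now.shape[0]
    Rc = Ac * np.sum(patch_now * gaussian_kernel(sc, size))
    Rs = As * np.sum(patch_delayed * gaussian_kernel(ss, size))
    return max(Rc - Rs, 0.0)

def firing_rate(R, R_peak, F_peak=100.0, F0=5.0):
    """Eq. (4): map R(t) onto a firing rate between F0 and F_peak."""
    return R * (F_peak - F0) / R_peak + F0

def ipp_spikes(rate_hz, dt=1e-3, rng=None):
    """IPP model: 1-0 spike train from a per-bin rate; refractoriness ignored."""
    rng = rng or np.random.default_rng(0)
    rate_hz = np.asarray(rate_hz)
    return (rng.random(rate_hz.shape) < rate_hz * dt).astype(int)
```

The DIS surround of Eqs. (5)-(6) fits the same pattern: convolve the stimulus with a Gaussian of width σss to obtain II(x, y, t), subtract it inside the surround integral, and rectify pointwise before summing.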
Fig. 4. (a) Original image. (b) Transferred image by the DOG model. (c) Transferred image by the DIS model.
2.2 VM

To assess the encoding efficiency of the RF models employed in RM, VM is constructed to perform the decoding operation from parallel AP trains in the neural domain into a perceptual pattern in the perception domain. In Eckmiller's simulated visual system, VM is simulated by an artificial neural network (ANN) with 1280 inputs and 1024 outputs [8]. Obviously, the training process of such a huge ANN is not easy on a personal computer. To solve this problem, the huge ANN is decomposed into a group of simple ANNs, and each ANN reconstructs one pixel in the perceptual pattern, as illustrated in Fig. 5(a). When the AP trains generated by RM are fed to VM, a perceptual pattern (also represented by an image) is produced. Let NANN be the number of ANNs in VM; then the resolution of the perceptual pattern is √NANN × √NANN.
Fig. 5. (a) Each ANN reestablishes the light intensity of only one pixel in the perceptual pattern and thus the number of ANNs (NANN) determines the resolution of the perceptual pattern. (b) Topological structure of one ANN in VM.
All the ANNs in VM have the same topological structure (a feed-forward 3-layer network with full connectivity), as illustrated in Fig. 5(b). Let NGC be the number of simulated GCs in RM; then the number of neurons in the input layer is NGC and each neuron receives spikes from one GC's AP train. Thus the input to every ANN is a
common NGC-dimensional vector extracted from the NGC AP trains. The number of neurons in the output layer is 3, so the output is a 3-dimensional vector representing the light intensity level of one pixel in the perceptual pattern. The number of neurons in the hidden layer is empirically set to 50. The weights and biases of these ANNs are initialized to random values between -1 and 1, and the activation function is the sigmoid function. These ANNs are trained one by one using the back-propagation algorithm.
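As an illustration of this decoding stage, here is a minimal sketch of VM as a bank of per-pixel networks, using scikit-learn's MLPRegressor as a stand-in for the hand-trained back-propagation networks described above; the layer sizes follow the text, while data shapes and training settings are assumptions.

```python
# Sketch of VM: one small 3-layer ANN per output pixel, N_GC inputs,
# 50 hidden units, 3 outputs. MLPRegressor stands in for the paper's
# own back-propagation training; shapes and settings are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_vm(spike_features, target_pixels):
    """spike_features: (T, N_GC) vectors extracted from the AP trains;
    target_pixels: (T, N_ANN, 3) per-pixel intensity codes to reproduce."""
    decoders = []
    for p in range(target_pixels.shape[1]):      # one ANN per pixel
        net = MLPRegressor(hidden_layer_sizes=(50,), activation='logistic',
                           solver='sgd', max_iter=500)
        net.fit(spike_features, target_pixels[:, p, :])
        decoders.append(net)
    return decoders

def matching_error(P, Q, n_ann):
    """Eq. (7) below: summed squared difference, normalised by N_ANN."""
    return float(np.sum((P - Q) ** 2) / n_ann)
```

Training NANN such networks independently is what makes the decomposition tractable on a personal computer, at the cost of ignoring correlations between output pixels.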
3 Results

The optical pattern that falls on the retinal photoreceptor layer is produced by a computer program module (Fig. 6(a)). After all of the ANNs have been trained, VM generates an output perceptual pattern according to the AP trains generated by RM. When the ST filters are implemented with the DOG model, VM generates an output perceptual pattern with enhanced edges and attenuated area luminance contrast (Fig. 6(b)-(c)). However, when the ST filters are implemented with the DIS model, VM generates an output perceptual pattern with not only enhanced edges but also preserved area luminance contrast (Fig. 6(d)-(e)). The resolution of the output perceptual pattern can be changed by adjusting NANN. For example, when NANN is 1024, the resolution of the output perceptual pattern is 32×32 (Fig. 6(b) and (d)); when NANN is 4096, the resolution of the output perceptual pattern is 64×64 (Fig. 6(c) and (e)). To assess the encoding efficiency of VM, a matching error is defined as follows:

$$Err = \sum_{t=1}^{T}\sum_{i}^{\sqrt{N_{ANN}}}\sum_{j}^{\sqrt{N_{ANN}}}\left[P(i,j,t) - Q(i,j,t)\right]^2 / N_{ANN} \qquad (7)$$
where P is the optical pattern transferred by the DOG model or the DIS model, Q is the output perceptual pattern decoded by VM, and T is the length of the sequence of input optical patterns. Err reflects the degree of similarity between the theoretically optimal perceptual pattern and the output perceptual pattern generated by VM. When the resolution of the output perceptual pattern is 32×32, Err declines dramatically as the peak firing rate increases, which suggests that AP trains with
Fig. 6. (a) The input optical pattern. (b) The output perceptual pattern of the DOG model, with 32×32 resolution. (c) The output perceptual pattern of the DOG model, with 64×64 resolution. (d) The output perceptual pattern of the DIS model, with 32×32 resolution. (e) The output perceptual pattern of the DIS model, with 64×64 resolution.
more APs carry more information. Furthermore, the Err of the DIS model is always lower than that of the DOG model, which implies that VM can extract more information from AP trains generated by an RM in which the ST filters are implemented with the DIS model (Fig. 7(a)). When the resolution of the output perceptual pattern is changed to 64×64, the same conclusions can be drawn from Fig. 7(b). These results imply that the encoding of the DIS model might be more effective than that of the DOG model.
Fig. 7. (a) Err versus the peak firing rate plotted for the DOG model and the DIS model with 32×32 resolution. (b) Err versus the peak firing rate plotted for the DOG model and the DIS model with 64×64 resolution.
4 Discussions

The simulated visual system discussed in this paper is particularly simple compared with the physiological visual system, because it consists of only two modules (RM and VM). However, it offers a way to assess the encoding efficiency of different RF models of retinal GCs, which may aid the design of encoding algorithms for the development of a retinal prosthesis. Simulation results suggest that when the DOG model is replaced by the DIS model to implement the ST filters in RM, VM always generates a more accurate perceptual pattern. In contrast to the classical DOG model, the DIS model not only depicts the basic input-output property of retinal GCs but also introduces the effect of the extensive disinhibitory region beyond the classical RF. However, owing to the intrinsic transfer properties of the DOG model and the DIS model, the encoding process performed by RM is a lossy one, i.e., some information in the input optical pattern is lost during encoding. Thus, it is impossible for VM to generate a perceptual pattern identical to the original input optical pattern. In the simulated visual system, RM is composed of an ensemble of ST filters, and these ST filters simulate the function of the retinal output neurons (GCs). Other types of neurons in the retina (e.g., photoreceptors, horizontal cells, bipolar cells and amacrine cells) and the corresponding interconnections among them are all neglected. Moreover, when the IPP model is employed to simulate the firing activity of retinal GCs, neither the absolute refractory period nor the relative refractory period is considered, which is inconsistent with the fact that the firing probability of a retinal GC does depend on the history of previous APs [14]. So RM fails to reflect several information processing properties of the physiological retina.
Due to the mutually independent information processing of the ST filters in RM, firing correlations among retinal GCs are not involved in our simulation study. However, it has been reported that retinal GCs often fire in significant patterns of concerted activity that cannot be derived from any single-neuron description [15]. Theoretical arguments also indicate that firing correlations themselves may encode relevant visual scenes and increase the overall information content represented by the retinal GCs' firing activity [16]. Adaptation is another important information processing property of the retina that is not involved in our simulation study. The biological significance of adaptation is that retinal neurons have limited signaling ranges but must encode visual stimulus intensities that vary over much wider ranges. The existence of adaptation thus allows the retina to adjust its sensitivity and activity when facing varying visual scenes [17]. In future work, a new retina model that includes all basic retinal cell types and the corresponding interconnections among them will be constructed according to the identified synaptic interactions and local microcircuits. By modulating the synaptic strengths among the cells in the model, the information processing properties of the physiological retina that have been omitted in the current simulation study may be captured.
Acknowledgements

This work was supported by grants from the National Natural Science Foundation of China (No. 30400088) and the Hi-Tech Research and Development Program of China (2006AA01Z125).
References

1. Zrenner, E.: Will Retinal Implants Restore Vision? Science 295(5557), 1022–1025 (2002)
2. Gerding, H.: A New Approach towards a Minimal Invasive Retina Implant. Journal of Neural Engineering 4, 30–37 (2007)
3. Kenyon, G.T., George, J., Travis, B., Blagoev, K.: Models of the Retina with Application to the Design of a Visual Prosthesis. Los Alamos Science 29, 110–123 (2005)
4. Zaghloul, K.A., Boahen, K.: A Silicon Retina that Reproduces Signals in the Optic Nerve. Journal of Neural Engineering 3, 256–267 (2006)
5. Cottaris, N.P., Elfar, S.D.: How the Retinal Network Reacts to Epiretinal Stimulation to Form the Prosthetic Visual Input to the Cortex. Journal of Neural Engineering 2, 74–90 (2005)
6. Becker, M., Eckmiller, R., Hünermann, R.: Psychophysical Test of a Tunable Retina Encoder for Retina Implants. In: Proceedings of the International Joint Conference on Neural Networks, vol. 1, pp. 192–195 (1999)
7. Eckmiller, R., Hünermann, R., Becker, M.: Exploration of a Dialog-based Tunable Retina Encoder for Retina Implants. Neurocomputing 26(27), 1005–1011 (1999)
8. Eckmiller, R., Neumann, D., Baruth, O.: Tunable Retina Encoders for Retina Implants: Why and How. Journal of Neural Engineering 2, 91–104 (2005)
9. Fleet, D.J., Hallett, P.E., Jepson, A.D.: Spatiotemporal Inseparability in Early Visual Processing. Biological Cybernetics 52(3), 153–164 (1985)
10. Qiu, F.-T., Li, C.-Y.: Mathematical Simulation of Disinhibitory Properties of Concentric Receptive Field. Acta Biophysica Sinica 11(2), 214–220 (1995)
11. Ikeda, H., Wright, M.J.: The Outer Disinhibitory Surround of the Retinal Ganglion Cell Receptive Field. Journal of Physiology 226(2), 511–544 (1972)
12. Croner, L.J., Kaplan, E.: Receptive Fields of P and M Ganglion Cells across the Primate Retina. Vision Research 35(1), 7–24 (1995)
13. Troy, J.B., Shou, T.: The Receptive Fields of Cat Retinal Ganglion Cells in Physiological and Pathological States: Where We Are after Half a Century of Research. Progress in Retinal and Eye Research 21(3), 263–302 (2002)
14. Berry II, M.J., Meister, M.: Refractoriness and Neural Precision. The Journal of Neuroscience 18(6), 2200–2211 (1998)
15. Meister, M., Berry II, M.J.: The Neural Code of the Retina. Neuron 22(3), 435–450 (1999)
16. Kenyon, G.T., Theiler, J., George, J.S., Travis, B.J., Marshak, D.W.: Correlated Firing Improves Stimulus Discrimination in a Retinal Model. Neural Computation 16(11), 2261–2291 (2004)
17. Chen, A.-H., Zhou, Y., Gong, H.-Q., Liang, P.-J.: Luminance Adaptation Increased the Contrast Sensitivity of Retinal Ganglion Cells. NeuroReport 16(4), 371–375 (2005)
Modelling the MAPK Signalling Pathway Using a Two-Stage Identification Algorithm

Padhraig Gormley, Kang Li, and George W. Irwin

School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Stranmillis Road, Belfast BT9 5AH, UK
{pgormley02,k.li,g.irwin}@qub.ac.uk
http://www.ee.qub.ac.uk/isac
Abstract. Signal transduction pathways describe the dynamics of cellular response to input signalling molecules at receptors on the cell membrane. The Mitogen-Activated Protein Kinase (MAPK) cascade is one such pathway and is involved in many important cellular processes including cell growth and proliferation. This paper describes a black-box model of this pathway created using an advanced two-stage identification algorithm. Identification allows us to capture the unique features and dynamics of the pathway and also opens up the possibility of regulatory control design. In the approach described, an optimal model is obtained by performing model subset selection in two stages, where the terms are first determined by a forward selection method and then modified using a backward selection model refinement. The simulation results demonstrate that the model selected using the two-stage algorithm performs better than with the forward selection method alone.

Keywords: Systems biology, system identification, MAPK, signal transduction, structure selection, orthogonal least squares, iterative subset selection.
1 Introduction
Signal transduction pathways describe the response of a cell when it detects the binding of extracellular signalling molecules to receptor proteins at the surface of the cell membrane. The binding process results in conformational changes in the part of the receptor that is below the surface, which in turn triggers the activation of a cascade of intracellular signalling proteins. Finally, at the end of the cascade, the terminal signalling protein activates target proteins which alter the behaviour of the cell, for example, by regulating the expression of certain genes, by altering cell shape (by cytoskeletal proteins) or by changing cell metabolism [1]. An important intracellular signalling pathway that is involved in producing many different cellular responses, including cell growth and proliferation, is the Mitogen-Activated Protein Kinase (MAPK) cascade [2,3]. This is a 3-tiered cascade where the kinase at each level is activated through dual phosphorylation at 2 amino acid sites by the activated kinase of the previous level. The activated
kinase at the terminal level proceeds to activate proteins that elicit a certain response from the cell, such as growth factors or molecules in the nucleus that promote or inhibit gene expression. These complex molecular interactions of the MAPK pathway mean that it is a highly nonlinear process. As such, identification of this system could be important in order to capture the unique features and dynamics of the underlying biological process and also to open up the possibility of regulatory control design. The MAPK pathway has been widely studied in the literature [3,4,5,6,7], whereby models have been derived from chemical rate equations in order to perform analysis of the molecular dynamics. These types of white-box models are perfectly feasible when the number of interacting molecules in the pathway is relatively small (such as in the case of MAPK). However, in other signal transduction networks the number of interactions can become incredibly large, resulting in the model becoming too complex to analyse and even impossible to solve. Therefore, the work described here takes a different approach by investigating black-box identification of the MAPK signal transduction pathway using a linear-in-the-parameters model. This type of model uses a linear combination of model terms or basis functions and is very popular throughout the literature, where it has been applied to modelling a wide range of nonlinear dynamic systems (e.g. [8,9,10,11,12,13]). An important part of building a linear-in-the-parameters model is the subset selection (see [14,15,16,17,12]), where a model structure is generated by selecting the most significant model terms from the entire pool of candidate terms. The number of terms selected is determined by the parsimonious principle [18,19], which selects the smallest possible model structure that can explain the data. Unfortunately, when modelling nonlinear dynamic systems, one of the problems with performing subset selection is that the size of the term pool can be extremely large [20,11,9], and therefore it is too computationally expensive to search for an optimal model [20]. Traditionally, the most common subset approaches are based on forward selection methods [11,21,22], where each new term is selected one by one, depending on which term provides the greatest contribution to minimising the cost function. Undoubtedly the most popular forward selection method is Orthogonal Least Squares (OLS) (see [8,21,23]), which uses modified Gram-Schmidt (MGS) orthogonalization to perform the subset selection. This method allows each term's contribution to the cost function to be computed without explicitly solving the least-squares problem and therefore significantly reduces the computational complexity. However, a major disadvantage of this and other forward selection methods is that the obtained model structure is not optimal [24]. Attempts at solving this problem have involved modifying the OLS algorithm with genetic search procedures in order to search for the optimal model [20]. Unfortunately, the genetic search is not guaranteed to find the global optimum as these algorithms can suffer from slow and premature convergence [25,26]; moreover, the computational complexity of such an approach can be very high.
An alternative solution has been proposed in [12], where an optimal model is achieved through an iterative two-stage approach. At the first stage an initial model is generated using a fast recursive algorithm derived in [11]. At the second stage a backwards selection algorithm refines the model by replacing any insignificant terms with a more significant candidate from the remaining pool of terms. This approach to subset selection has advantages over the previous methods as it not only obtains the optimal model structure, but it also performs the structure selection within one analytic framework leading to increased computational efficiency and model compactness. Thus, the work described in this paper applies the two-stage algorithm outlined in [12] to the identification of the nonlinear dynamics in the MAPK signalling pathway. The main objective is to obtain an optimal model structure that is both effective and efficient with an improved performance. The remaining sections of this paper are organised as follows. Section 2 describes the main method used to select the optimal model structure. Section 3 provides some background to the MAPK signalling pathway and then the iterative subset selection method is applied to the identification of the MAPK cascade using a polynomial NARX structure. Finally, section 4 offers some conclusions.
2 The Method
The aim of the proposed identification method is to find an optimal model for the MAPK pathway that is both simple and computationally efficient, while still effectively capturing the nonlinear dynamics of the system. The approach can be separated into two stages, where an initial model is first generated using a fast recursive forward selection algorithm. This algorithm selects terms one by one, each time selecting the term that produces a maximum reduction in the cost function. The first stage continues selecting terms until a certain stopping criterion has been reached, such as Akaike's information criterion (AIC) [27] or the Minimum Description Length (MDL) [28]. The second stage then takes the model structure generated by forward selection and replaces any insignificant terms with more significant ones from the remaining term pool, resulting in an improved model performance. The two-stage identification algorithm used to perform the subset selection is only briefly described in the following subsections. A more detailed description of the algorithm and its derivation can be found in [11,12].
2.1 Forward Subset Selection
This section briefly outlines the first stage of the identification method where the algorithm uses forward selection to generate an initial model. The model terms are selected one by one from a pool of candidates so that each time the cost function is reduced by the maximum amount. This procedure is iterated until k model terms are selected, where k is determined by the model structure selection criterion. To begin with, consider a general nonlinear dynamic system ([8,11,12])
$$y(t) = f\big(y(t-1), \ldots, y(t-n_y),\, u(t-1), \ldots, u(t-n_u)\big) = f(\mathbf{x}(t)) \qquad (1)$$
where u(t) and y(t) are the system input and output variables at time instant t, nu and ny are the corresponding maximal lags, x(t) represents the model 'input' vector, and f(·) is some unknown nonlinear function. In this case a polynomial NARX model will be used to represent system (1):

$$y(t) = \sum_{i=1}^{M} \theta_i\,\varphi_i(\mathbf{x}(t)) + \varepsilon(t) \qquad (2)$$
where ϕi(·), i = 1, ..., M are all candidate basis functions, and ε is the model residual sequence. N data samples {x(t), y(t)}, t = 1, ..., N are used for model identification. Equation (2) can then be formulated as:

$$\mathbf{y} = \Phi\Theta + \Xi \qquad (3)$$
where $\Phi = [\varphi_1, \ldots, \varphi_M] \in \mathbb{R}^{N \times M}$ with $\varphi_i = [\varphi_i(\mathbf{x}(1)), \ldots, \varphi_i(\mathbf{x}(N))]^T \in \mathbb{R}^N$ for i = 1, ..., M, $\mathbf{y}^T = [y(1), \ldots, y(N)] \in \mathbb{R}^N$, $\Theta = [\theta_1, \ldots, \theta_M]^T \in \mathbb{R}^M$, and $\Xi^T = [\varepsilon(t_1), \ldots, \varepsilon(t_N)] \in \mathbb{R}^N$. The model selection aims to select, say k, regressor terms, denoted as p1, ..., pk, from all the candidates ϕi(·), i = 1, ..., M (M is usually a very large number in nonlinear system identification), resulting in the linear-in-the-parameters model of the form

$$\mathbf{y} = P_k\Theta_k + \mathbf{e} \qquad (4)$$
which best fits the data samples in the least-squares sense, i.e., the sum of squared errors (SSE) is minimised:

$$J(P_k) = \min_{\Phi_k \in \Phi,\ \Theta_k \in \mathbb{R}^k}\{\mathbf{e}^T\mathbf{e}\} = \min_{\Phi_k \in \Phi,\ \Theta_k \in \mathbb{R}^k}\{(\mathbf{y} - \Phi_k\Theta_k)^T(\mathbf{y} - \Phi_k\Theta_k)\} \qquad (5)$$
where Φk is an N × k matrix composed of k columns from Φ, Θk denotes the corresponding regression coefficient vector, and the selected regression matrix is

$$P_k = [p_1, \ldots, p_k] \qquad (6)$$
If the selected regression matrix Pk is of full column rank, the least-squares estimate of the regression coefficients in (4) is given by

$$\Theta_k = (P_k^T P_k)^{-1} P_k^T \mathbf{y} \qquad (7)$$
Having selected k model terms, suppose that one more term, with corresponding regressor pk+1, is added to the model; then the net reduction of the cost function due to adding this term is given by

$$\Delta J_{k+1}(p_{k+1}) = J(P_k) - J(P_{k+1}) \qquad (8)$$
Now if we re-define

$$\Phi = [P_k,\ C_{M-k}], \qquad C_{M-k} = [\phi_{k+1}, \cdots, \phi_M] \qquad (9)$$
Then obviously the first k regressor terms in Φ (i.e. Pk) correspond to the selected k terms, and the remaining M − k terms CM−k = [φk+1, ···, φM] are candidates, forming the candidate pool CM−k. Now using (8), the contribution of all remaining candidate terms in Φ = {φ1, ···, φM} can be calculated, and the one from CM−k which gives the maximum contribution is then selected as the (k+1)th model term. For example, if the index j of the next most significant term is given by

$$j = \arg\max_{k < i \le M}\{\Delta J_{k+1}(\phi_i)\} \qquad (10)$$
then φj is selected as the (k+1)th model term and re-denoted as pk+1 = φj, and the regression matrix of the selected model becomes Pk+1 = [Pk pk+1], while the candidate pool is reduced in size to CM−k−1, and the remaining candidates in CM−k−1 are re-indexed as φk+2, ···, φM. Thus, the full regression matrix becomes Φ = [Pk+1 CM−k−1]. This forward selection procedure repeats itself until it is terminated once the desired number (say k) of model terms has been reached, the cost function is reduced to a given level [29], or some information criterion such as the AIC or MDL begins to increase. Once the initial model has been constructed, it can be refined using a backward approach to replace insignificant model terms in the original structure.
2.2 Backward Model Refinement
The forward selection algorithm described in the previous section selects one new term each time it is iterated and adds this term to the model structure. At each iteration, it selects the term that produces a maximum reduction of the cost function. However, there is usually some correlation between the regressor terms, so terms that are selected at later iterations of the algorithm may affect the contribution of previously selected terms. In other words, a previously selected term may once have provided a large contribution to reducing the cost function, but due to a newly introduced term, its contribution can suddenly become insignificant. This means that the model structure obtained through forward selection methods is not optimal [24]. Therefore, at the second stage of this approach, all the previously selected model terms are reviewed and the model is refined. Any insignificant terms are removed and/or replaced until an optimal model is achieved for a given selection criterion. Assume an initial model has been generated using forward selection with a model size of n regressor terms, and suppose a term, say pi, 1 ≤ i ≤ n, is to be reviewed. Its contribution to the error (SSE) reduction, ΔJn(pi), needs to be compared with that of the candidate term which gives the maximum contribution among the candidate pool. Denote the maximum
candidate contribution as ΔJn(φj); then the significance of a model term pi can be checked by identifying the maximum contribution of all the other candidates from

$$\Delta J_n(\phi_j) = \max\{\Delta J_n(\phi_s),\ s = n+1, \cdots, M\} \qquad (11)$$
If ΔJn(φj) > ΔJn(pi), pi is said to be insignificant and is replaced with φj as the new regressor term, while pi is put back into the candidate pool, taking the position of φj. This exchange of model terms further reduces the SSE by ΔJn(φj) − ΔJn(pi), which means that the model compactness is further improved and an optimal model structure can be obtained.
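To summarise the control flow of the two stages, the following is a naive sketch in which each candidate's contribution is evaluated by a full least-squares refit; the fast recursive updates of [11,12], which make the method efficient in practice, are deliberately omitted for clarity.

```python
# Control-flow sketch of the two-stage selection: stage 1 grows the
# model by the term with the largest SSE reduction; stage 2 reviews
# each selected term and swaps it for a better candidate if one
# exists. Naive refits replace the fast recursive updates of [11,12].
import numpy as np

def sse(P, y):
    theta, *_ = np.linalg.lstsq(P, y, rcond=None)
    r = y - P @ theta
    return float(r @ r)

def two_stage_select(Phi, y, k):
    selected, pool = [], list(range(Phi.shape[1]))
    for _ in range(k):                      # stage 1: forward selection
        costs = [sse(Phi[:, selected + [j]], y) for j in pool]
        selected.append(pool.pop(int(np.argmin(costs))))
    changed = True
    while changed:                          # stage 2: backward refinement
        changed = False
        for i in range(k):
            rest = selected[:i] + selected[i + 1:]
            costs = [sse(Phi[:, rest + [j]], y) for j in pool]
            b = int(np.argmin(costs))
            if costs[b] < sse(Phi[:, selected], y):   # swap improves SSE
                pool.append(selected[i])              # p_i back to the pool
                selected[i] = pool.pop(b)             # phi_j into the model
                changed = True
    return selected
```

In the actual algorithm the SSE contributions are maintained recursively, so candidate evaluation does not require refitting; the sketch trades that efficiency for brevity.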
3 Black-Box Model Identification of the MAPK Cascade
To test the efficacy of the proposed subset selection approach, the two-stage algorithm will be applied to a model of the MAPK intracellular signalling pathway. The MAPK pathway is a 3-tiered protein kinase cascade, where the kinase at each level catalyzes the transfer of a phosphate group onto the protein at the level immediately below (see Fig. 1). Extracellular signalling molecules that bind to cell surface receptors can relay signals to target gene regulatory proteins by activating the terminal member of the cascade: MAPK. MAPK is activated by a MAPK kinase (MKK) that phosphorylates 2 sites of tyrosine and threonine residues on the MAPK amino acid sequence. MKK at the 2nd level of the cascade is activated by a MAPK kinase kinase (MKKK) that performs dual phosphorylation at sites of serine and threonine residues on the MKK molecules. The kinase at the top level of the cascade (MKKK) can be activated by several processes including ligand binding of cell surface receptor proteins. At each level of the cascade, protein phosphatases can inactivate the kinase by removing the phosphate group from either one of the 2 sites on the amino acid sequence [5]. The MAPK signalling pathway has been studied extensively in the literature. The most prevalent method of modelling the pathway has been to derive a white-box model from chemical rate equations and then perform estimation of the model parameters by in vitro/in vivo experiments. These modelling techniques have uncovered some interesting properties of the MAPK cascade. For example, [4] showed that ultrasensitivity is an inherent property of the 3-tiered structure of the cascade, whereby a graded input is transformed into a switch-like output of activated/inactivated MAPK. Furthermore, [5] discovered that the addition of a negative feedback loop from the output to the input layer of the cascade could produce sustained oscillations in the concentrations of molecules at each level of the cascade. Later work [6,7] has revealed other interesting properties such as bistability and the prevention of crosstalk between related signalling pathways through the use of scaffold proteins to bind the three molecule components of
Fig. 1. Kinetic pathway diagram of the MAPK cascade. The single and dual phosphorylation of each molecule is represented by the addition of a ‘-P’ and ‘-PP’ respectively to the name of the kinase, where MAPK-PP represents the output activated form of the kinase. Ras (or MKKKK) is the input protein that triggers the activation of the kinase at the top level of the cascade.
the cascade. Further modelling of this important signalling pathway could yet reveal previously unknown properties of the cascade.
3.1 Simulation of the MAPK Cascade
To create a black-box model of the MAPK cascade, a set of input-output data is required to perform model estimation and validation. In order to obtain a sufficiently large data set, a simulation of the signalling pathway was performed in silico to generate the data. The model used in the simulation was based on the model designed in [5], which was in turn based on an earlier model by [4] with the addition of negative feedback. The model uses Michaelis-Menten enzyme kinetics to derive chemical rate equations for each of the pathway connections in the cascade (see Fig. 1). The rate equations are given in Tables 1 and 2. After setting the initial concentrations of each species and the rate constants, the model can be solved for a particular time series.

Table 1. Kinetic rate equations for the concentrations of each of the 8 types of molecule found in the MAPK cascade [5]

d[MKKK]/dt    = v2 − v1
d[MKKK-P]/dt  = v1 − v2
d[MKK]/dt     = v6 − v3
d[MKK-P]/dt   = v3 + v5 − v4 − v6
d[MKK-PP]/dt  = v4 − v5
d[MAPK]/dt    = v10 − v7
d[MAPK-P]/dt  = v7 + v9 − v8 − v10
d[MAPK-PP]/dt = v8 − v9

Moiety conservation relations:
[MKKK]total = [MKKK] + [MKKK-P] = 100
[MKK]total  = [MKK] + [MKK-P] + [MKK-PP] = 300
[MAPK]total = [MAPK] + [MAPK-P] + [MAPK-PP] = 300

Table 2. Rate equations and parameter values for each of the 10 reactions in the MAPK pathway diagram (Fig. 1). The Michaelis-Menten constants (K1–K10) and molecular concentrations are given in nM. [Ras0] is the initial concentration of the input protein or MKKK kinase. The catalytic rate constants (k1, k3, k4, k7, k8) and the maximal enzyme rates (V2, V5, V6, V9, V10) are given in units of s−1 and nM·s−1 respectively [5].

Reaction  Rate Equation                                               Parameter Values
v1        k1·[Ras0]·[MKKK]/((1+([MAPK-PP]/KI)^n)·(K1+[MKKK]))         k1=0.025; n=1; KI=9; K1=10
v2        V2·[MKKK-P]/(K2+[MKKK-P])                                   V2=0.25; K2=8
v3        k3·[MKKK-P]·[MKK]/(K3+[MKK])                                k3=0.025; K3=15
v4        k4·[MKKK-P]·[MKK-P]/(K4+[MKK-P])                            k4=0.025; K4=15
v5        V5·[MKK-PP]/(K5+[MKK-PP])                                   V5=0.75; K5=15
v6        V6·[MKK-P]/(K6+[MKK-P])                                     V6=0.75; K6=15
v7        k7·[MKK-PP]·[MAPK]/(K7+[MAPK])                              k7=0.025; K7=15
v8        k8·[MKK-PP]·[MAPK-P]/(K8+[MAPK-P])                          k8=0.025; K8=15
v9        V9·[MAPK-PP]/(K9+[MAPK-PP])                                 V9=0.5; K9=15
v10       V10·[MAPK-P]/(K10+[MAPK-P])                                 V10=0.5; K10=15
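A minimal sketch of how this in-silico simulation can be reproduced from Tables 1 and 2, assuming SciPy's odeint; the constant Ras input and the time grid are illustrative (the paper varies the input to generate its data set):

```python
# Sketch: solving the MAPK cascade ODEs of Tables 1-2 with SciPy.
# The constant Ras input and the time grid are illustrative; the
# paper varies the input to generate its 3000-sample data set.
import numpy as np
from scipy.integrate import odeint

def mapk_rates(c, t, ras=1.0):
    MKKK, MKKKp, MKK, MKKp, MKKpp, MAPK, MAPKp, MAPKpp = c
    v1 = 0.025 * ras * MKKK / ((1 + (MAPKpp / 9.0) ** 1) * (10 + MKKK))
    v2 = 0.25 * MKKKp / (8 + MKKKp)
    v3 = 0.025 * MKKKp * MKK / (15 + MKK)
    v4 = 0.025 * MKKKp * MKKp / (15 + MKKp)
    v5 = 0.75 * MKKpp / (15 + MKKpp)
    v6 = 0.75 * MKKp / (15 + MKKp)
    v7 = 0.025 * MKKpp * MAPK / (15 + MAPK)
    v8 = 0.025 * MKKpp * MAPKp / (15 + MAPKp)
    v9 = 0.5 * MAPKpp / (15 + MAPKpp)
    v10 = 0.5 * MAPKp / (15 + MAPKp)
    return [v2 - v1, v1 - v2,              # d[MKKK], d[MKKK-P]
            v6 - v3, v3 + v5 - v4 - v6,    # d[MKK], d[MKK-P]
            v4 - v5,                       # d[MKK-PP]
            v10 - v7, v7 + v9 - v8 - v10,  # d[MAPK], d[MAPK-P]
            v8 - v9]                       # d[MAPK-PP]

c0 = [100, 0, 300, 0, 0, 300, 0, 0]        # moiety totals (Table 1)
t = np.linspace(0, 5000, 1000)             # seconds
mapk_pp = odeint(mapk_rates, c0, t)[:, -1] # output: activated MAPK-PP
```

Driving ras with a varying input signal and sampling mapk_pp then yields input-output pairs of the kind summarised in Table 3 below.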
3.2 Identification of the MAPK Model
A data set of 3000 samples was generated from the above simulation of the MAPK signalling cascade. The first 1000 samples in the set were used for model estimation, and the second 1000 samples were used for model validation. The final 1000 samples were used to test the response of the model to input noise.

Table 3. Statistical analysis of the input-output data sets used for training, validation and with added noise. Ras corresponds to the input data vector (ut) and MAPK-PP corresponds to the output data vector (yt).

                 Training              Validation            Noise
                 Ras      MAPK-PP      Ras      MAPK-PP      Ras      MAPK-PP
Mean             0.5255   0.4104       0.5036   0.4613       0.5118   0.4464
Std. Deviation   0.2871   0.3159       0.2930   0.3114       0.2891   0.3062
Min-Max          0-1      0-1          0-1      0-1          0-1      0-1
Table 4. Parameters and model terms selected by each method. The forward approach selected the following 8 terms from the pool of 83: {4,5,6,2,80,23,81,22}, whereas the two-stage method selected a different set of terms: {79,78,74,77,76,6,5,4}.

Forward
  Selected Terms (Pk)       Parameters (Θk)
  yt−1                       3.38180
  yt−2                      −3.38070
  yt−3                       1.16780
  ut−2                      −0.00798
  yt−2^3                     1.58010
  yt−1 yt−2                  0.80822
  yt−2^2 yt−3               −1.30260
  yt−1^2                    −1.28570

2-Stage
  Selected Terms (Pk)       Parameters (Θk)
  yt−1 yt−3^2                0.54456
  yt−1 yt−2 yt−3            −3.65260
  yt−1^3                    −1.33160
  yt−1 yt−2^2                2.94140
  yt−1^2 yt−3                1.44270
  yt−3                       1.24310
  yt−2                      −3.32110
  yt−1                       3.10530
Table 5. Comparison of results for the models selected by forward selection alone and the 2-Stage algorithm

        Training              Validation            Noise
        Forward   2-Stage     Forward   2-Stage     Forward   2-Stage
SSE     0.7953    0.6970      2.1934    1.8204      2.3715    2.1158
MSE     0.0008    0.0007      0.0022    0.0018      0.0024    0.0021
RMSE    0.0282    0.0264      0.0468    0.0427      0.0487    0.0460
In this case a normally distributed random noise variation of ±0.5% was added to the samples in the set. The sample values were normalised, and the statistics of the corresponding input and output data sets are given in Table 3. A nonlinear polynomial AutoRegressive model with eXogenous inputs (NARX) with polynomial order of up to 3 was used to construct the model. The model input variables Ras (ut) and MAPK-PP (yt), with delays of up to 3 time steps each, were used to construct the full model set, resulting in a candidate pool of 83 terms. The forward subset selection procedure was performed first, with MDL used as the stop criterion. Eventually, 8 terms were selected to model the MAPK cascade. Then the proposed two-stage forward and backward selection method was used to select a new subset of 8 terms. The different subsets of terms and parameters obtained by the two approaches are shown in Table 4. To compare the performance of the two approaches, the results of training and validation for both methods are listed in Table 5. Table 5 also compares the performance of the two approaches when the models are subjected to noisy conditions. It is shown that the proposed two-stage method outperformed the conventional forward selection approach in terms of modelling accuracy, as predicted in the previous sections. Finally, the modelling output of the two-stage method is illustrated for training, validation and noise in Figure 2. It can be seen from the figure that the produced polynomial NARX model matches the original MAPK cascade output well in each case.
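As a cross-check on the pool size quoted above, the candidate terms can be enumerated as all products of the six lagged variables up to order 3; the paper does not publish its term ordering, so the enumeration below is only an assumed illustration.

```python
# Sketch: enumerating the polynomial NARX candidate pool for the six
# lagged variables u(t-1..3), y(t-1..3) up to order 3. The counts are
# 6 linear + 21 quadratic + 56 cubic = 83 terms, matching the text.
# The term ordering is an assumption; the paper does not publish it.
from itertools import combinations_with_replacement

lags = ['u(t-1)', 'u(t-2)', 'u(t-3)', 'y(t-1)', 'y(t-2)', 'y(t-3)']
pool = ['*'.join(c)
        for order in (1, 2, 3)
        for c in combinations_with_replacement(lags, order)]
print(len(pool))  # -> 83
```

Regressing y(t) on these 83 candidate signals gives the full regression matrix Φ of Section 2.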
Fig. 2. Model performance of the proposed method over the training data (top-left), validation data (top-right) and noisy data (bottom)
4 Conclusion
In this paper, a black-box model was created using an advanced two-stage identification algorithm in order to investigate the nonlinear dynamics of the MAPK cascade. An optimal and computationally efficient model was obtained for the pathway using the two-stage forward and backward subset selection approach. The results presented from simulations of the MAPK model confirmed the effectiveness of the two-stage algorithm over traditional forward selection approaches. Black-box identification methods such as this provide a simple and effective method of capturing the underlying features and dynamics of signalling pathways and may also open up the possibility of regulatory control design.
References

1. Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., Walter, P.: Molecular Biology of the Cell, 4th edn. Garland Science (2002)
2. Widmann, C., Gibson, S., Jarpe, M.B., Johnson, G.L.: Mitogen-activated protein kinase: Conservation of a three-kinase module from yeast to human. Physiological Reviews 79(1), 143–180 (1999)
3. Sasagawa, S., Ozaki, Y., Fujita, K., Kuroda, S.: Prediction and validation of the distinct dynamics of transient and sustained ERK activation. Nature Cell Biology 7(4), 365–373 (2005)
4. Huang, C.F., Ferrell, J.E.: Ultrasensitivity in the mitogen-activated protein kinase cascade. Proceedings of the National Academy of Sciences 93, 10078–10083 (1996)
5. Kholodenko, B.N.: Negative feedback and ultrasensitivity can bring about oscillations in the mitogen-activated protein kinase cascades. European Journal of Biochemistry 267, 1583–1588 (2000)
6. Levchenko, A., Bruck, J., Sternberg, P.W.: Scaffold proteins may biphasically affect the levels of mitogen-activated protein kinase signaling and reduce its threshold properties. Proceedings of the National Academy of Sciences 97(11), 5818–5823 (2000)
7. Markevich, N.I., Hoek, J.B., Kholodenko, B.N.: Signaling switches and bistability arising from multisite phosphorylation in protein kinase cascades. The Journal of Cell Biology 164(3), 353–359 (2004)
8. Chen, S., Billings, S.A., Luo, W.: Orthogonal least squares methods and their application to non-linear system identification. International Journal of Control 50(5), 1873–1896 (1989)
9. Haber, R., Unbehauen, H.: Structure identification of nonlinear dynamic systems - a survey on input/output approaches. Automatica 26, 651–667 (1990)
10. Sjöberg, J., Zhang, Q., Ljung, L., Benveniste, A., Delyon, B., Glorennec, P., Hjalmarsson, H., Juditsky, A.: Nonlinear black-box modeling in system identification: a unified overview. Automatica 31(12), 1691–1724 (1995)
11. Li, K., Peng, J., Irwin, G.W.: A fast nonlinear model identification method. IEEE Transactions on Automatic Control 50(8), 1211–1216 (2005)
12. Li, K., Peng, J., Bai, E.W.: A two-stage algorithm for identification of nonlinear dynamic systems. Automatica 42(7), 1189–1197 (2006)
13. Peng, J., Li, K., Huang, D.S.: A hybrid forward algorithm for RBF neural network construction. IEEE Transactions on Neural Networks 17, 1439–1451 (2006)
14. Draper, N.R., Smith, H.J.: Applied Regression Analysis, 2nd edn. John Wiley and Sons, Inc., USA (1981)
15. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York (2001)
16. Lawson, C.L., Hanson, R.J.: Solving Least Squares Problems. Prentice-Hall, Englewood Cliffs, NJ (1974)
17. Miller, A.J.: Subset Selection in Regression. Chapman & Hall, Sydney, Australia (1990)
18. Ljung, L.: System Identification: Theory for the User. Prentice Hall, Englewood Cliffs, NJ (1987)
19. Söderström, T., Stoica, P.: System Identification. Prentice-Hall, Englewood Cliffs, NJ (1989)
20. Mao, K.Z., Billings, S.A.: Algorithms for minimal model structure detection in nonlinear dynamic system identification. International Journal of Control 68(2), 311–330 (1997)
21. Chen, S., Wigger, J.: Fast orthogonal least squares algorithm for efficient subset model selection. IEEE Transactions on Signal Processing 43(7), 1713–1715 (1995)
22. Korenberg, M.J.: Identifying nonlinear difference equation and functional expansion representations: the fast orthogonal algorithm. Annals of Biomedical Engineering 16, 123–142 (1988)
23. Zhu, Q.M., Billings, S.A.: Fast orthogonal identification of nonlinear stochastic models and radial basis function neural networks. International Journal of Control 64(5), 871–886 (1996)
24. Sherstinsky, A., Picard, R.W.: On the efficiency of the orthogonal least squares training method for radial basis function networks. IEEE Transactions on Neural Networks 7(1), 195–200 (1996)
25. Andre, J., Siarry, P., Dognon, T.: An improvement of the standard genetic algorithm fighting premature convergence in continuous optimization. Advances in Engineering Software 32, 49–60 (2001)
26. Peng, J., Li, K., Thompson, S.: A combined adaptive bounding and adaptive mutation technique for genetic algorithms. In: Proceedings of the 5th World Congress on Intelligent Control and Automation, Hangzhou, China (2004)
27. Akaike, H.: A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19(6), 716–723 (1974)
28. Gustafsson, F., Hjalmarsson, H.: Twenty-one ML estimators for model selection. Automatica 31(10), 1377–1392 (1995)
29. Chen, S., Billings, S.A.: Neural networks for nonlinear dynamic system modelling and identification. International Journal of Control 56, 319–346 (1992)
Design and Path Planning for a Remote-Brained Service Robot*

Shigang Cui1, Xuelian Xu2, Zhengguang Lian2, Li Zhao2, and Zhigang Bing2

1 Institute of Semiconductors, CAS, 100083 Beijing, China
{Shigang Cui} [email protected]
2 Tianjin University of Technology and Education, 300222 Tianjin, China
{Xuelian Xu, Zhengguang Lian, Li Zhao, Zhigang Bing} [email protected]
Abstract. This article introduces an effective robot design method, called remote-brained, in which the brain and body are separated. The brain is left in the mother environment, by which we mean the environment in which the brain's software is developed, and talks with its body over wireless links. The article also presents a real robot, TUT06-B, based on this method, which has human-machine interaction, vision systems, a manipulator, etc. It then discusses the path planning method for the robot based on the ant colony algorithm in detail, especially the Ant-cycle model, and analyzes the parameters of the algorithm that affect its convergence. Finally, it gives the program flow chart of this algorithm.
1 Introduction

Along with technological progress, research results have been applied to people's daily life rapidly and with great practicability. The service robot emerged as the times required, when the relevant subjects such as robotics, artificial intelligence and communication were undergoing rapid development. It can provide support for ordinary families, doing housework and caring for the elderly and disabled, as well as for public areas such as offices, hospitals and commercial facilities. TUT06-B, which will be introduced in detail in this paper, is specially developed for support work in hospitals.

Traditionally, the brain and body of a robot are designed as a whole. This means that all of the signal processing, the programming and decision of tasks, and the assignment of commands are completed by a computer in the body of the robot. However, with the development of hardware, the computer in the body of the robot must be updated quickly. On the other hand, in order to accomplish complex tasks, depending purely on one computer is insufficient for the massive data analysis and processing required. So the traditional design method is inconvenient for updating and extending the functions of the robot. *
This work was supported by the key project of the Chinese Ministry of Education under Grant 205010.
A remote-brained robot does not carry its own brain within its body [1]. It means that the remote-brained robot treats the software and hardware of the whole system as two independent subsystems, and isolates their respective development from each other. Since there is no need to embed a brain into the robot, the process of constructing the robot's physical body and debugging the brain software can be simplified. Thus, a developer can focus on the problems of one subsystem and combine the part he is responsible for with those developed by others so as to conduct an evaluation [2-3]. Therefore, a robot based on the remote-brained approach has been brought forward in order to reduce robot design cost, shorten the development cycle and simplify the design procedure.

Path planning, which is used to find the best collision-free route from the starting point to the ending point, is an important research direction of robotics. Many scholars from all over the world have done a great deal of research in this domain and have proposed many kinds of methods, such as the artificial potential field method, genetic algorithms, fuzzy logic algorithms and so on. But these methods also have some shortcomings; for example, the artificial potential field method can become trapped in local optima, and genetic algorithms converge more slowly than required. In 1991, an Italian scholar, inspired by the behavior of natural ants, put forward a brand-new general-purpose heuristic algorithm which can be used to solve different combinatorial optimization problems: the ant colony algorithm. This new heuristic has some desirable characteristics: it is versatile, robust and population based [5]. So far, this algorithm has successfully solved different combinatorial optimization problems, such as the traveling salesman problem (TSP), the quadratic assignment problem (QAP) and the job-shop scheduling problem (JSP). Many efforts have been made to solve the robot path planning problem with the ant colony algorithm [6-7]. In this paper, the ant colony algorithm is applied to the path planning of the TUT06-B service robot.
2 TUT06-B Remote-Brained Mobile Service Robot

We developed a novel service robot using the remote-brained approach and modular design. There is a brain, which is separated from the body, and a cerebellum in the body. They communicate with each other over wireless links, which can take many forms. The brain is in charge of complex signal processing and decision making, while the cerebellum controls the motion of the body. This framework allows us to carry out different kinds of robotics research in an environment that can be shared and inherited over generations [11]. The robot accepts the user's orders through the human-machine interface and then transmits them to the brain. The human-machine interface includes several kinds of input: a keyboard and mouse, with which we can type orders describing what we want the robot to do, and speech recognition and synthesis, with which we can talk to the robot directly rather than typing orders. After analysis and processing in the brain, a decision controlling the robot's motion is made, and through the wireless communication module the command, including the direction, speed
and the operation mode of the robot's manipulator, is transmitted to the cerebellum in the body of the robot. Through the cerebellum's dispatching and execution, the robot completes each kind of task. The structure of the system is shown in Fig. 1.
Fig. 1. Structure figure of the system
Based on this idea, we developed the service robot "TUT06-B", which is shown in Fig. 2. It can move agilely and grasp objects smoothly.
Fig. 2. TUT06-B remote-brain service robot
The robot relies mainly on a global sensor, the global camera, which gathers global environment information, and on a local signal collecting module embedded in the body of the robot, to obtain real-time information including the positions of the robot and the obstacles. The global camera is installed in the work environment. To avoid the limitations of a single sensor, the local signal collecting module is a ring structure made up of infrared sensors and sonar sensors, so that it can receive external signals more efficiently. The sensor ring is shown in Fig. 3.
Fig. 3. Structure of sensor ring
Fig. 4. Control System Structure
The host computer, as "the brain" of this system, then processes the collected data; in this way it can provide a basis for the next task. First, the user assigns an order to the robot through the human-machine interface, and the brain analyzes the order and translates it into a language the robot can understand. Afterwards, through the wireless communication module, the body responds to the received orders. Through the cerebellum's processing, it can drive the actuators' movement; at the same time, we can obtain information on the motors' motion and use it as feedback with which we can monitor the motors and adjust them when they work abnormally. On the other hand, the robot can talk to the user through a loudspeaker, because an expert system is adopted in the brain. When the robot is assigned a mission or finishes a task, it can tell the user, so the robot is very human-friendly. As shown in the picture, a display is embedded in the body; it provides a friendly human-machine interaction interface, so we can debug and operate the robot conveniently without the "brain". The system is divided into the user layer, the decision and control layer, the sensor/bottom decision layer and the execution layer. The layers communicate with each other through standard physical and protocol layers, so each layer has very strong extensibility. Equipment and control procedures can be added very conveniently provided they comply with the standard protocol, as shown in Fig. 4.
3 The Ant Colony Algorithm

The ant colony algorithm is based on the behavior of real ants searching for food. A single ant's structure and behavior are very simple and limited, but an ant colony composed of thousands of ants is highly socialized and can accomplish very complex missions. Real ants communicate with each other using an aromatic substance called pheromone, which they leave on the paths they traverse [10]. In the absence of pheromone trails, ants more or less perform a random walk. In this algorithm, the artificial ants differ somewhat from natural ants: artificial ants have some memory, they are not completely blind, and they live in an environment where time is discrete [5]. The principal characteristics of this algorithm are positive feedback, distributed computation and the use of a constructive greedy heuristic. Positive feedback accounts for rapid discovery of good solutions, distributed computation avoids premature convergence, and the greedy heuristic helps find acceptable solutions in the early stages of the search process [4], [5]. For the path planning of TUT06-B, we adopt the ant colony algorithm presented in reference [4]. According to reference [5], let bi(t) (i = 1, ..., n) be the number of ants at point i at time t, and let m = ∑_{i=1}^{n} bi(t) be the total number of ants. Let τij(t) be the intensity of trail on edge (i, j) at time t. Then at time t+1, the intensity of trail on edge (i, j) is given by formula (1):
$$\tau_{ij}(t+1) = \rho\,\tau_{ij}(t) + \Delta\tau_{ij}(t,\,t+1) \qquad (1)$$

where ρ (ρ < 1) is the residual coefficient, so that 1 − ρ is the evaporation coefficient, and

$$\Delta\tau_{ij}(t,\,t+1) = \sum_{k=1}^{m} \Delta\tau_{ij}^{k}(t,\,t+1) \qquad (2)$$
where Δτ^k_ij(t, t+1) is the quantity per unit of length of trail substance laid by the k-th ant between time t and t+1.
We define the quantity ηij = 1/dij, where dij is the Euclidean distance between points i and j, and define the transition probability from point i to point j as

$$p_{ij}(t) = \frac{\left[\tau_{ij}(t)\right]^{\alpha}\cdot\left[\eta_{ij}\right]^{\beta}}{\sum_{j=1}^{n}\left[\tau_{ij}(t)\right]^{\alpha}\left[\eta_{ij}\right]^{\beta}} \qquad (3)$$
where α and β are parameters that allow user control of the relative importance of trail versus visibility. The transition probability is therefore a trade-off between trail intensity and visibility, which says that close points should be chosen with high probability. In order to satisfy the constraint that an ant should visit different cities, we introduce a tabu list in which the visited cities are registered for each ant. When one search is completed, the tabu list is updated by adding the cities visited in that search. Different choices for computing Δτ^k_ij(t, t+1) and updating τij(t) give different instantiations of the ant algorithm, such as the Ant-quantity, Ant-density and Ant-cycle models. The former two use local pheromone updating, while the latter uses global pheromone updating. Extensive experiments have shown that the Ant-cycle model is superior to the other two, so we select the Ant-cycle model here.
$$\Delta\tau_{ij}^{k}(t,\,t+n) = \begin{cases}\dfrac{Q}{L_k} & \text{if the $k$-th ant passed edge $(i,j)$ in its tour}\\[6pt] 0 & \text{otherwise}\end{cases} \qquad (4)$$
where Q is a constant quantity of pheromone deposited every time an ant goes from i to j, and Lk is the tour length of the k-th ant. Since the ant colony algorithm was first proposed in 1991, there has been a great deal of research on the model and its applications. However, research on the convergence of the algorithm, which focuses on how to find the best path more quickly and more accurately, has started only in recent years. This paper has introduced the Ant-cycle model. In this model, because global pheromone updating is used, the algorithm achieves better convergence than the Ant-quantity and Ant-density models, which use local pheromone
updating. The Ant-cycle model can also avoid premature convergence, which our experiments confirm. The transition probability formula and the pheromone updating formula were given above. The parameters considered here are those that affect convergence. $\alpha$ relates to the importance of the pheromone: when $\alpha < 1$, small values lead to slow convergence (the smaller, the slower), while for $\alpha > 2$ an early convergence to suboptimal solutions is observed. $\beta$ relates to the importance of the visibility: when $\beta = 0$ there is no convergence, and progressively higher values lead to progressively quicker convergence. In the pheromone updating formula, the parameter $\rho$ also affects convergence: when $\rho < 0.5$ or $\rho > 0.8$, convergence is much slower. The influence of $Q$ proved unimportant in many experiments. Future study of the ant colony algorithm should therefore strengthen the research on convergence and set the parameters reasonably; this will improve the solutions obtained.

[Flow chart: Start; Initialize; Nc = Nc + 1; s = s + 1; ant number k = k + 1; choose the town j to move to with probability p_ij(t) given by formula (3) and move the k-th ant to j; update tabu(s); k >= m?; s >= n?; update the pheromone tau_ij(t + n) with formulas (1), (2), (4); Nc >= Nmax?; print the best path; End]
Fig. 5. Program flow chart of Ant-cycle algorithm
4 Specific Implementing Procedure
Having introduced the mathematical model of the ant colony algorithm, we now present the implementing procedure; Fig. 5 shows the program flow chart [8], [9]. As a complementary explanation of the diagram, the initialization step sets $t = 0$, $N_c = 0$, $v = C$, $\Delta\tau_{ij} = 0$, $tabu_i = \Phi$, $k = 1$, $s = 1$, and assigns the parameters $\eta_{ij}$, $\alpha$, $\beta$, $\rho$ and $Q$.
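As an illustration of the procedure in Fig. 5, the sketch below strings formulas (1)-(4) together into a minimal Ant-cycle loop in Python; the distance matrix, the default parameter values and the helper choose_next from the earlier fragment are our own assumptions, not code from the paper.

```python
import math
import random

def ant_cycle(dist, n_ants, n_iter, alpha=1.0, beta=2.0, rho=0.7, Q=100.0):
    """Minimal Ant-cycle sketch: dist is an n-by-n distance matrix."""
    n = len(dist)
    eta = [[0.0 if i == j else 1.0 / dist[i][j] for j in range(n)] for i in range(n)]
    tau = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, math.inf
    for _ in range(n_iter):
        delta = [[0.0] * n for _ in range(n)]
        for _k in range(n_ants):
            tour = [random.randrange(n)]
            while len(tour) < n:
                tour.append(choose_next(tour[-1], tau, eta, set(tour), alpha, beta))
            L = sum(dist[tour[s]][tour[(s + 1) % n]] for s in range(n))
            if L < best_len:
                best_tour, best_len = tour[:], L
            for s in range(n):            # formula (4): deposit Q/L_k on tour edges
                i, j = tour[s], tour[(s + 1) % n]
                delta[i][j] += Q / L
                delta[j][i] += Q / L
        for i in range(n):                # formulas (1)-(2): evaporation plus deposit
            for j in range(n):
                tau[i][j] = rho * tau[i][j] + delta[i][j]
    return best_tour, best_len
```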
5 Conclusions
This article introduced an effective robot design method based on the remote-brained approach. With this method, it is very convenient to extend functionality, because the brain is separated from the body [13], [14]. A real robot, TUT06-B, was described together with its structure and working principles. The paper also presented in detail the ant colony algorithm adopted for path planning. Past experiments show that this design method is suitable and effective. As a next step, we need to research the algorithm in greater depth and carry out massive experiments. We should also refer to other related algorithms; fusing and overlapping these methods is necessary in order to improve the algorithm and complete the path planning more quickly, accurately and stably. In a word, much work remains to be done in the future.
References 1. Inaba, M.: Remote-Brained Robots. In: Proceedings of Fifteenth International Joint Conference on Artificial Intelligence, pp.1593–1606 (1997) 2. Inaba, M., Kagami, S., Kanehiro, F., Hoshino, Y., Inoue, H.: A Platform for Robotics Research Based on the Remote-Brained Robot Approach. International Journal of Robotics Research 19(10), 933–954 (2000) 3. Inaba, M.: Remote-brained humanoid project. Advanced Robotics 11(6), 605–620 (1997) 4. Colorni, A., Dorigo, M., Maniezzo, V.: Distributed optimization by ant colonies. In: Proceedings of the 1st European Conference on Artificial Life, pp. 134–142 (1991) 5. Dorigo, M., Maniezzo, V.: Ant system: optimization by a colony of cooperating agents. IEEE Transactions on System, Man, and Cybernetics-Part B 26(1), 29–41 (1996) 6. Li, G., Liu, S.: Path planning of mobile robot based on improved ant colony algorithm. Control Engineering of China 12(5), 473–485 (2005) 7. Zhu, Q.: Ants predictive algorithm for path planning of robot in a complex dynamic environment. Chinese Journal of Computers 28(11), 1898–1906 (2005) 8. Duan, H.-B.: Ant Colony Algorithms: Theory and Applications, pp. 33–39, 72–75. Science Press, Beijing, China (2005) 9. Li, S.-Y., Chen, Y.-Q., Li, Y.: Ant Colony Algorithms with Applications, pp. 22–33. HIT Press, Haerbin, China (2004)
10. Dorigo, M., Gambardella, L.M.: Ant Colonies for the Traveling Salesman Problem. BioSystems 43, 73–81 (1997) 11. Nakamura, T., Oohara, M., Ogasawara, T., Ishiguro, H.: Fast self-localization method for mobile robots using multiple omnidirectional vision sensors. Machine Vision and Applications 14(2), 129–138 (2003) 12. Randall, M.: A Parallel Implementation of Ant Colony Optimization. Journal of Parallel and Distributed Computing 62, 1421–1432 (2002) 13. Brown, R.G., Donald, B.R.: Mobile Robot Self-Localization without Explicit Landmarks. Algorithmica 26(3-4), 515–559 (2000) 14. Korayem, M.H., Basu, A.: An Educational Autonomous Mobile Robot Measurement of Accuracy. Advanced Manufacturing Technology 20(3), 236–240 (2002)
Adaptive Fuzzy Sliding Mode Control of the Model of Aneurysms of the Circle of Willis Peijun Ju, Guocai Liu, Li Tian, and Wei Zhang Taishan University, Department of Mathematics and System Science Shandong Taian 371021, China
[email protected]
Abstract. An adaptive fuzzy sliding mode controller for a highly nonlinear biological system is proposed by considering the model of aneurysms of the circle of Willis. The fuzzy logic system approximates and adaptively cancels an unknown plant nonlinearity using the state variables. A control law and adaptive laws for the unknown parameters and bounding constant are established so that the whole closed-loop system is stable in the sense of Lyapunov. The paper also discusses the nonlinear dynamic behavior and the control of blood speed in the aneurysm. Simulation of the aneurysm model demonstrates that the proposed controller provides good tracking and estimation performance.
1 Introduction
The aneurysms of the circle of Willis are common in the general population, and their rupture is a catastrophic event. Austin first reported a model of aneurysms of the circle of Willis [1]:

$$\ddot{x} + \alpha x - \beta x^2 + \gamma x^3 = F\cos\omega t \qquad (1)$$

Nieto studied the periodic solutions of the general system [2-4]:

$$\ddot{x} + p\dot{x} + \alpha x - \beta x^2 + \gamma x^3 = F\cos\omega t \qquad (2)$$
Fuzzy logic systems (FLS) have been successfully applied to many control problems because they do not need an accurate mathematical model of the system under control and they can cooperate with human expert knowledge. Wang [5] proved that the FLS is a universal approximator and that the output of the system can be represented by a linear combination of the fuzzy basis functions. Based on this property, an adaptive fuzzy sliding mode control design for nonlinear systems was presented in [6]. In the direct adaptive fuzzy control scheme, the FLS is used to estimate the plant dynamics, and these estimates are used to generate controls that achieve asymptotic tracking of a reference input [7]. To this end, the adjustable fuzzy parameters are updated online by an adaptive law based on the Lyapunov approach. We describe the adaptive fuzzy sliding mode control of the feedback-linearized system (2), mainly inspired by [6]. The update laws for the parameters and the sliding mode provide Lyapunov stability for the closed-loop
system, and guarantee that the tracking error as well as all the signals involved are uniformly ultimately bounded.
2 Controller Design
Consider the following nonlinear SISO system:

$$\ddot{x} = f(x, t) + g(x, t)u \qquad (3)$$

where $x \in \mathbb{R}^n$ is the state vector and $f$ and $g$ are unknown nonlinear functions. Given the desired output $x_d$, we define the tracking error $e = x - x_d = [e, \dot{e}, \ldots, e^{(n-1)}]^{\tau}$ and the sliding mode $s(x, t) = ce$, where $c = [c_1, c_2, \ldots, c_{n-1}, 1]$. Let the sliding mode control input be

$$u = \frac{R - f(x, t)}{g(x, t)} \qquad (4)$$

with $\xi_1(x, t) = \ddot{x}_d - c_1\dot{e}$ and $R = \xi_1(x, t) - k\,\mathrm{sgn}(s)$, $k > 0$. Two FLSs $\hat{f}(x, t)$ and $\hat{g}(x, t)$ are employed to approximate the unknown nonlinear functions $f(x, t)$ and $g(x, t)$, respectively. An adaptive FLS is a fuzzy system equipped with a learning algorithm: the FLS is constructed from a set of fuzzy IF-THEN rules using fuzzy logic principles, and the learning algorithm adjusts the parameters of the FLS based on the training information. We consider a rule base consisting of multi-input single-output (MISO) rules of the form

$R^j$: IF $X_1$ is $A_1^j$ and $X_2$ is $A_2^j$ and ... and $X_M$ is $A_M^j$ THEN $y^j$ is $B^j$,

with $j_i = 1, 2, \ldots, N_i$ for $i = 1, \ldots, M$, where $X_j$, $j = 1, 2, \ldots, M$, are the input variables of the FLS, $y$ is the output variable, and the fuzzy sets $A_i^j \in U_j$ and $B^j \in V$ are linguistic terms characterized by the fuzzy membership functions $A_i^j(X_j)$ and $B^j(y^j)$, respectively. The fuzzy membership functions used in this paper have a Gaussian shape,

$$A_i^j(X_j, p_{i_j}, q_{i_j}) = \exp\!\left(-\frac{(X_j - p_{i_j})^2}{2q_{i_j}^2}\right) \qquad (5)$$

with centroid $p_{i_j}$ and width $q_{i_j}$. Using the singleton fuzzifier, the product inference engine, and the center-average defuzzifier, the output of the FLS can be expressed as

$$y(x) = \frac{\sum_{j=1}^{m} y^j \left(\prod_{i=1}^{n} \mu_{A_i^j}(x_i)\right)}{\sum_{j=1}^{m} \left(\prod_{i=1}^{n} \mu_{A_i^j}(x_i)\right)} \qquad (6)$$

where $\mu_{A_i^j}(x_i)$ are the fuzzy membership functions of $x_i$. Let $\theta = [y^1, y^2, \ldots, y^m]^{\tau}$, $\xi(x) = [\xi^1(x), \xi^2(x), \ldots, \xi^m(x)]^{\tau}$, with

$$\xi^j(x) = \frac{\prod_{i=1}^{n} \mu_{A_i^j}(x_i)}{\sum_{j=1}^{m} \left(\prod_{i=1}^{n} \mu_{A_i^j}(x_i)\right)};$$
then the FLS (6) is equivalent to

$$y(x) = \theta^{\tau}\xi(x) \qquad (7)$$
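As a concrete reading of (6)-(7), the following Python sketch evaluates the FLS output for Gaussian membership functions; the array shapes and names are our assumptions, not code from the paper.

```python
import numpy as np

def fls_output(x, theta, centers, widths):
    """Evaluate y(x) = theta^T xi(x) for a Gaussian FLS.

    theta:   (m,) consequent parameters y^j
    centers: (m, n) Gaussian centroids p_ij
    widths:  (m, n) Gaussian widths q_ij
    """
    # Product inference: prod_i mu_{A_i^j}(x_i) for each rule j
    mu = np.exp(-((x - centers) ** 2) / (2.0 * widths ** 2))  # (m, n)
    w = mu.prod(axis=1)          # rule firing strengths
    xi = w / w.sum()             # fuzzy basis functions
    return float(theta @ xi)
```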
The aim of this paper is to design an adaptive fuzzy sliding mode controller for system (3) under plant uncertainties which guarantees boundedness of all the estimated variables of the closed-loop system and forces the system output to track a given desired output. This is equivalent to stabilizing the system

$$\ddot{x} = -p\dot{x} - \alpha x + \beta x^2 - \gamma x^3 + F\cos\omega t + u \qquad (8)$$
Theorem 1. Consider the system (8) with the control input

$$u(t) = \frac{R - \hat{f}(x, t)}{\hat{g}(x, t)} \qquad (9)$$

where $\hat{f}(x \mid \theta_f) = \hat{\theta}_f^{\tau}\xi(x)$ and $\hat{g}(x \mid \theta_g) = \hat{\theta}_g^{\tau}\xi(x)$, and $\hat{\theta}_f$ and $\hat{\theta}_g$ are the estimates of the optimal parameter vectors. Let the update laws for $\hat{\theta}_f$ and $\hat{\theta}_g$ be

$$\dot{\hat{\theta}}_f = -r_1 s\,\xi(x), \qquad \dot{\hat{\theta}}_g = -r_2 s\,\xi(x)u \qquad (10)$$
where $r_1$, $r_2$ are positive design constants. Then the closed-loop system is Lyapunov stable, and the tracking error as well as all the signals involved are uniformly ultimately bounded.

Proof. Let $\theta_f^*$ and $\theta_g^*$ be the optimal parameter vectors satisfying

$$\theta_f^* = \arg\min_{\theta_f}\ \sup_{x \in \mathbb{R}^n} \left|\hat{f}(x \mid \theta_f) - f(x, t)\right| \qquad (11)$$

$$\theta_g^* = \arg\min_{\theta_g}\ \sup_{x \in \mathbb{R}^n} \left|\hat{g}(x \mid \theta_g) - g(x, t)\right| \qquad (12)$$

We define the approximation error

$$\omega = \hat{f}(x \mid \theta_f) - f(x, t) + \left(\hat{g}(x \mid \theta_g) - g(x, t)\right)u \qquad (13)$$

For the second-order system,

$$\dot{s} = c_1\dot{e} + \ddot{e} = c_1\dot{e} + f(x, t) + g(x, t)u - \ddot{x}_d = f(x, t) + g(x, t)u - \xi_1(x, t)$$

Then with (9) and (13) we obtain

$$\dot{s} = [f(x, t) - \hat{f}(x, t)] + [g(x, t) - \hat{g}(x, t)]u - k\,\mathrm{sgn}(s) = [\hat{f}^*(x, t) - \hat{f}(x, t)] + [\hat{g}^*(x, t) - \hat{g}(x, t)]u - k\,\mathrm{sgn}(s) + \omega = \tilde{\theta}_f^{\tau}\xi(x) + \tilde{\theta}_g^{\tau}\xi(x)u(t) - k\,\mathrm{sgn}(s) + \omega \qquad (14)$$

where $\tilde{\theta}_f = \hat{\theta}_f - \theta_f^*$, $\tilde{\theta}_g = \hat{\theta}_g - \theta_g^*$, $\hat{f}^*(x, t) = \hat{f}(x \mid \theta_f^*)$, $\hat{g}^*(x, t) = \hat{g}(x \mid \theta_g^*)$.
Consider the following Lyapunov function:

$$V = \frac{1}{2}s^2 + \frac{1}{2r_1}\tilde{\theta}_f^{\tau}\tilde{\theta}_f + \frac{1}{2r_2}\tilde{\theta}_g^{\tau}\tilde{\theta}_g \qquad (15)$$

where $r_1$, $r_2$ are positive constants. Differentiating (15) with respect to time,

$$\dot{V} = s\dot{s} + \frac{1}{r_1}\tilde{\theta}_f^{\tau}\dot{\tilde{\theta}}_f + \frac{1}{r_2}\tilde{\theta}_g^{\tau}\dot{\tilde{\theta}}_g = s\left(\tilde{\theta}_f^{\tau}\xi(x) + \tilde{\theta}_g^{\tau}\xi(x)u(t) - k\,\mathrm{sgn}(s) + \omega\right) + \frac{1}{r_1}\tilde{\theta}_f^{\tau}\dot{\tilde{\theta}}_f + \frac{1}{r_2}\tilde{\theta}_g^{\tau}\dot{\tilde{\theta}}_g$$

$$= \frac{1}{r_1}\tilde{\theta}_f^{\tau}\left(r_1 s\,\xi(x) + \dot{\tilde{\theta}}_f\right) + \frac{1}{r_2}\tilde{\theta}_g^{\tau}\left(r_2 s\,\xi(x)u(t) + \dot{\tilde{\theta}}_g\right) - k|s| + s\omega$$

From the update laws (10) we derive

$$\dot{V} = -k|s| + s\omega \qquad (16)$$

According to the universal approximation theorem for the FLS, the approximation error $\omega$ can be made very small, so we obtain $\dot{V} \le 0$. In order to reduce chattering, we use the continuous function $S_\delta$ instead of $\mathrm{sgn}(s)$:

$$S_\delta = \frac{s}{|s| + \delta}, \qquad \delta = \delta_0 + \delta_1 e \qquad (17)$$
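The following Python sketch ties the pieces together: it Euler-integrates system (8) under the control law (9), the update laws (10) and the smoothed switching function (17), using the parameter values quoted in Section 3. The 5x5 grid of Gaussian rule centers, the step size, and the floor that keeps the estimate of g away from zero are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def simulate(T=20.0, h=1e-3, p=0.4, alpha=0.889, beta=3.0, gamma=2.0,
             F=0.01, omega=1.8, c1=5.0, k=5.0, r1=5.0, r2=1.0,
             d0=0.03, d1=5.0):
    """Euler-integration sketch of system (8) under controller (9)-(10)."""
    m = 25
    g1, g2 = np.meshgrid(np.linspace(-np.pi/6, np.pi/6, 5),
                         np.linspace(-np.pi/6, np.pi/6, 5))
    centers = np.stack([g1.ravel(), g2.ravel()], axis=1)   # (25, 2) rule grid
    widths = np.full((m, 2), np.pi / 24)
    th_f = np.zeros(m)
    th_g = np.ones(m)            # keeps the estimate of g away from zero
    x = np.array([-np.pi / 30, 0.0])
    for step in range(int(T / h)):
        t = step * h
        xd = 0.1 * np.sin(np.pi * t)
        xd_dot = 0.1 * np.pi * np.cos(np.pi * t)
        xd_ddot = -0.1 * np.pi ** 2 * np.sin(np.pi * t)
        e, e_dot = x[0] - xd, x[1] - xd_dot
        s = c1 * e + e_dot
        mu = np.exp(-((x - centers) ** 2) / (2 * widths ** 2))
        xi = mu.prod(axis=1)
        xi = xi / xi.sum()                                 # fuzzy basis functions
        f_hat, g_hat = th_f @ xi, max(th_g @ xi, 1e-3)
        s_delta = s / (abs(s) + d0 + d1 * abs(e))          # smoothed sgn, Eq. (17)
        R = xd_ddot - c1 * e_dot - k * s_delta
        u = (R - f_hat) / g_hat                            # control law (9)
        th_f += -r1 * s * xi * h                           # update laws (10)
        th_g += -r2 * s * xi * u * h
        x_ddot = (-p * x[1] - alpha * x[0] + beta * x[0] ** 2
                  - gamma * x[0] ** 3 + F * np.cos(omega * t) + u)
        x = x + h * np.array([x[1], x_ddot])
    return x
```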
3 Simulation
To illustrate the control procedure and performance, we apply our adaptive fuzzy sliding mode controller to the model of aneurysms of the circle of Willis
Fig. 1. Membership functions for Xi and Control input
Fig. 2. Position tracking and Position tracking error
(8). The main parameters of the system are $p = 0.4$, $\alpha = 0.889$, $\beta = 3$, $\gamma = 2$, $F = 0.01$, $\omega = 1.8$. The tracking objective is to make the output $x(t)$ follow the desired reference $x_d(t) = 0.1\sin(\pi t)$. The initial values are $x_0 = [-\pi/30, 0]$, $r_1 = 5.0$, $r_2 = 1$, $c = 5$, $\delta_0 = 0.03$, $\delta_1 = 5$, $k = 5$, $\hat{\theta}_f = \hat{\theta}_g = 0$. Five fuzzy membership functions are used: $\mu_{NM}(x_i) = \exp[-((x_i + \pi/6)/(\pi/24))^2]$, $\mu_{NS}(x_i) = \exp[-((x_i + \pi/12)/(\pi/24))^2]$, $\mu_{ZO}(x_i) = \exp[-((x_i)/(\pi/24))^2]$, $\mu_{PS}(x_i) = \exp[-((x_i - \pi/12)/(\pi/24))^2]$, $\mu_{PM}(x_i) = \exp[-((x_i - \pi/6)/(\pi/24))^2]$. The total number of fuzzy rules is 25. Figures 1-2 depict the simulation results. Figure 1 shows the membership function degrees and the control input. The trajectories of the system output and the desired output, together with the tracking error, are shown in Figure 2.
4 Conclusions
The main contribution has been to propose an adaptive sliding mode controller, based on a fuzzy system, for an uncertain nonlinear biological system: the model of aneurysms of the circle of Willis. Inspired by the adaptive sliding mode control proposed in [6], a simpler controller was designed. The adaptive laws and the control input are established to stabilize the closed-loop system in the Lyapunov sense and to guarantee that the system is stable. Simulation of the system demonstrates that the proposed controller provides good tracking and estimation performance.
Acknowledgments. This work is supported by the Project of Science and Technology of the Education Department of Shandong Province (J04A64), and the Science Foundation of Taishan University (Y06-2-04).
References 1. Austin, G.: Biomathematical model of the circle of Willis I: the Duffing equation and some approximate solutions. Math. Biosci. 11, 163–172 (1971) 2. Nieto, J.J., Torres, A.: A mathematical model of aneurysm of the circle of Willis. J. Bio. Syst. 3, 653–659 (1995) 3. Nieto, J.J., Torres, A.: A nonlinear biomathematical model for the study of intracranial aneurysms. J. Neuro. Sci. 177, 18–23 (2000) 4. Nieto, J.J., Torres, A.: Approximation of solutions for nonlinear problems with an application to the study of aneurysms of the circle of Willis. Nonlinear Analysis 40, 512–521 (2000) 5. Wang, L.X.: Stable adaptive fuzzy controllers with application to inverted pendulum tracking. IEEE Trans. Fuzzy Syst. 26, 677–691 (1996) 6. Wang, J., Rad, A.B., Chan, P.T.: Indirect adaptive fuzzy sliding mode control: Part I: fuzzy switching. Fuzzy Sets and Systems 122, 21–30 (2001) 7. Park, J.H., Park, G.T.: Robust adaptive fuzzy controller for nonlinear system using estimation of bounds for approximation errors. Fuzzy Syst. 133, 19–36 (2003)
Particle Swarm Optimization Applied to Image Vector Quantization Xubing Zhang1, Zequn Guan1, and Tianhong Gan2 1
Remote Sensing Information Engineering School of Wuhan University, Wuhan, China, 430079
[email protected] 2 Geodesy and Geomatics School of Wuhan University, Wuhan, China, 430079
[email protected]
Abstract. Codebook design for VQ (Vector Quantization) is a global optimization problem. The LBG algorithm depends on the initial codebook and is prone to converge to a local optimum. To solve this problem, we adopt PSO (Particle Swarm Optimization) to design the optimal codebook for image vector quantization and present the PSO-VQ (PSO Vector Quantization) algorithm. In PSO-VQ, a particle represents a codebook, and the optimal codebook is obtained by iterating the initial codebooks through particle evolution. To ensure that the solution converges to the global optimal codebook, we present the PCO (Particle Coherent Operation), by which the code vectors of each initial codebook are sorted in ascending order of the average gray value of the pixels in the code vector, so that the inner structures of all particles are essentially identical. The experimental results show that the PSO-VQ algorithm is feasible and effective, and they extend the range of applications of PSO.
1 Introduction
VQ is an efficient method of data compression coding. The performance of VQ is determined by the codebook, which therefore plays the key role in any application of VQ. However, there is no universal method for designing a globally optimal codebook. Because of its dependence on the initial codebook, the classical LBG [1] codebook design algorithm is prone to converge to a local optimum, so that serious blocking effects and distortion of edges and details appear in the decoded images. To overcome this disadvantage of LBG, various improved algorithms have been proposed, such as the SOFM (Self Organizing Feature Maps) algorithm [2], the FVQ (Fuzzy Vector Quantization) algorithm [3], genetic algorithms for codebook design [4], and combinations of LBG with other algorithms [5]. Differing from the algorithms above, we apply the PSO algorithm to design the optimal codebook for image VQ; this is a new improved algorithm based on the global parallel search principle of PSO. The Particle Swarm Optimization (PSO) algorithm [6], developed in recent years, is a kind of multiobjective (MO) evolutionary technique. Based on the metaphor of
social interaction, this algorithm searches a space by adjusting the trajectories of individual vectors, called particles, which are conceptualized as moving points in a multidimensional space [7]. It requires only primitive mathematical operators and is computationally inexpensive in terms of both memory requirements and speed [6]. The PSO algorithm has been widely applied to continuous-space optimization problems [8], network training [9], fuzzy system control [10], and combinatorial optimization problems [11], [12], but there are few reports on PSO applied to image compression. The global parallel search character of PSO is naturally suited to overcoming LBG's dependence on the initial codebook, so we present the PSO-VQ algorithm and the PCO. In PSO-VQ, the optimal codebook is obtained exclusively through iterative evolution from a number of initial codebooks. The experimental results prove that the PSO-VQ algorithm and the PCO are feasible and effective.
2 Vector Quantization and the LBG Algorithm
Codebook design for VQ is a global optimization problem, whose aim is to minimize the distortion of the VQ system. VQ is a map from the k-dimensional Euclidean space to a finite subset C, which can be described as $Q: R^k \to C$, where $C = \{C_1, C_2, \ldots, C_N \mid C_i \in R^k\}$ is the codebook and each $C_i$ is called a codeword or code vector. The map satisfies $Q(V) = C_i$ for $V \in R^k$, $V = (v_1, v_2, \ldots, v_k)$, where $C_i = (c_{i1}, c_{i2}, \ldots, c_{ik})$ is a code vector of the codebook and $d(V, C_i) = \min_{1 \le j \le N} d(V, C_j)$, with $d(V, C_j)$ the distortion measure between the vector $V$ and the codeword $C_j$. The MSE (Mean Square Error) is usually adopted as the distortion measure, defined as

$$d(V, C_j) = \sum_{i=1}^{k} (v_i - c_{ij})^2 \qquad (1)$$

The process of codebook design is to search for the subset C that minimizes the distortion of the VQ system:

$$J(C) = \min_{C}\left\{\sum_{V \in R^k} \min_{C_i \in C} d(V, C_i),\ C_i \in R^k\right\} \qquad (2)$$
The LBG algorithm presented by Linde, Buzo and Gray is essentially the k-means clustering method applied to VQ. In LBG, the initial codebook is chosen from the training vectors, and a locally optimal codebook is obtained by quantization, clustering and iteration of the initial codebook. Details of LBG can be found in [1].
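For reference, a compact NumPy sketch of the LBG iteration just described is given below; the stopping test on the relative drop in average distortion mirrors the 0.05 threshold used later in Section 5, and all variable names are ours.

```python
import numpy as np

def lbg(training, n_codewords, eps=0.05, rng=np.random.default_rng(0)):
    """LBG codebook design: k-means-style iteration until the relative
    drop in average distortion falls below eps."""
    codebook = training[rng.choice(len(training), n_codewords,
                                   replace=False)].astype(float)
    prev = np.inf
    while True:
        # Assign each training vector to its nearest codeword (squared distance)
        d2 = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        distortion = d2[np.arange(len(training)), nearest].mean()
        if (prev - distortion) / distortion < eps:
            return codebook
        prev = distortion
        # Move each codeword to the centroid of its cluster
        for i in range(n_codewords):
            members = training[nearest == i]
            if len(members):
                codebook[i] = members.mean(axis=0)
```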
3 Particle Swarm Optimization (PSO) Algorithm
PSO is a population-based, self-adaptive, stochastic optimization technique [6]. The basic idea of PSO is the mathematical modeling and simulation of the food-searching activities of a swarm of birds (particles). In the multidimensional space
where the optimal solution is sought, each particle in the swarm is moved toward the optimal point by adding a velocity to its position. The velocity of a particle is influenced by three components, namely the inertial operator, the cognitive operator, and the social operator. The inertial operator simulates the inertial behavior of the bird flying in its previous direction. The cognitive operator models the memory of the bird about its own previous best position, and the social operator models the memory of the bird about the best position among the particles (interaction inside the swarm). The particles move around the multidimensional search space until they find the food (optimal solution). Based on the above discussion, the mathematical model of PSO is as follows. The velocity update equation is

$$v_{i,j}(t+1) = w\,v_{i,j}(t) + c_1 r_1 \left(x^{\#}_{i,j}(t) - x_{i,j}(t)\right) + c_2 r_2 \left(x^{*}_{j}(t) - x_{i,j}(t)\right) \qquad (3)$$

$i = 1, 2, 3, \ldots, n$; $j = 1, 2, 3, \ldots, m$. Here $w\,v_{i,j}(t)$ is the inertial operator, $c_1 r_1 (x^{\#}_{i,j}(t) - x_{i,j}(t))$ is the cognitive operator, and $c_2 r_2 (x^{*}_{j}(t) - x_{i,j}(t))$ is the social operator. The position update equation is

$$x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1) \qquad (4)$$

$i = 1, 2, 3, \ldots, n$; $j = 1, 2, 3, \ldots, m$, where:
t: iteration count;
$v_{i,j}(t+1)$: dimension j of the velocity of particle i at iteration t+1;
$x_{i,j}(t+1)$: dimension j of the position of particle i at iteration t+1;
w: inertia weight;
$c_1$, $c_2$: acceleration coefficients;
$x^{\#}_{i,j}(t)$: $pBest_{i,j}(t)$, dimension j of the own best position of particle i up to iteration t;
$x^{*}_{j}(t)$: $gBest_j(t)$, dimension j of the best particle in the swarm at iteration t;
m: dimension of the optimization problem (particle);
n: number of particles in the swarm;
$r_1$, $r_2$: two separately generated, uniformly distributed random numbers in the range [0, 1].
Based on Eq. (3) and (4), all particles search the solution space iteratively for an optimal solution. They track the two best positions, pBest and gBest, until a criterion is met. The criterion is usually the maximum number of iterations, an acceptable fitness of the gBest, or tolerable convergence of all particles.
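A minimal NumPy sketch of one synchronous step of Eq. (3) and (4) follows; the default coefficients anticipate the constriction values used in Section 5, and everything else is an illustrative assumption.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.729, c1=1.49445, c2=1.49445,
             rng=np.random.default_rng()):
    """One synchronous PSO step implementing Eq. (3) and (4).

    x, v, pbest: arrays of shape (n, m); gbest: array of shape (m,).
    """
    r1 = rng.random(x.shape)    # separate uniform random numbers per dimension
    r2 = rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq. (3)
    x = x + v                                                    # Eq. (4)
    return x, v
```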
4 Particle Swarm Optimization Vector Quantization (PSO-VQ) Algorithm
The idea of applying PSO to VQ codebook design is as follows. As mentioned above, VQ codebook design is a global optimization problem, and the objective function is the distortion of the VQ system. Suppose there are m code vectors in the codebook; codebook design then consists in finding the m code vectors that minimize the system distortion. PSO, on the other hand, is an effective method for global optimization: the global optimal solution is sought in parallel in the multidimensional space, and the particle with the largest fitness is the global optimal solution. Therefore, if we regard a codebook as a particle and the code vectors of the codebook as the components of the particle, and make the particle fitness inversely proportional to the VQ distortion, we can adopt PSO to design the optimal codebook as follows.
4.1 Particle Coding
The PSO-VQ algorithm is adopted here to design the optimal codebook. Each particle represents one codebook, and the particle swarm represents a set of codebooks. Suppose there are n particles (codebooks) in the particle swarm X, $X = \{X_i \mid i = 1, 2, 3, \ldots, n\}$, every codebook contains m code vectors, and the dimensionality of the vectors is d. The position and velocity of a particle are coded as follows.
$$X_i = \begin{bmatrix} X_{i,11} & X_{i,12} & \cdots & X_{i,1d} \\ X_{i,21} & X_{i,22} & \cdots & X_{i,2d} \\ \vdots & \vdots & & \vdots \\ X_{i,m1} & X_{i,m2} & \cdots & X_{i,md} \end{bmatrix}, \qquad V_i = \begin{bmatrix} V_{i,11} & V_{i,12} & \cdots & V_{i,1d} \\ V_{i,21} & V_{i,22} & \cdots & V_{i,2d} \\ \vdots & \vdots & & \vdots \\ V_{i,m1} & V_{i,m2} & \cdots & V_{i,md} \end{bmatrix}$$

$X_i$ is the position of particle i, $V_i$ is its velocity, and m is the dimensionality of the particle. We introduce here the concept of a component particle: it represents a 1-dimensional component of the particle, i.e., a code vector. The component particle plays an important role in this paper.
4.2 Fitness
R
training
vectors,
and
they
compose
a
vector
set
Y,
Y = { yr / r = 1,2,3,..., R} . If codebook Xi is used to do vector quantization, the average distortion of the vector set is
Di =
1 R ∑ min d ( yr , X i ) R r =1
(5)
The squared Euclidean distance is adopted here as the distortion measure; $\min d(y_r, X_i)$ is the minimal squared Euclidean distance from the training vector to codebook $X_i$. The particle fitness is inversely proportional to the average distortion:

$$f(X_i) = k / D_i \qquad (6)$$
where k is a scale constant. The bigger the fitness value, the better the performance of the codebook.
4.3 Algorithm Description
1) Initializing the swarm. According to the pixel gray values of the sub-images, a training vector set Y is generated. Then n × m vectors are selected randomly from the vector set to compose n particles, where a particle represents one codebook and contains m vectors, each vector contains d pixels, and the pixel gray values give the initial positions of the particles. Initially, the velocity of every particle is set to $0_{m \times d}$, as are the initial pBest of each particle and the initial gBest of the swarm, and all fitness values are set to 0 as well. Thus we obtain the initial particle swarm.
2) Particle Coherent Operation. Based on the average gray value of the d pixels in each code vector, the m code vectors of each codebook are sorted in ascending order. This process is called the Particle Coherent Operation (PCO). Through PCO, the inner structures of all particles become essentially identical, and a "coherent" relationship between different particles is established. PCO ensures that the particles advance toward the optimal solution. It is so important in PSO-VQ that a detailed description is given in a later section of this paper.
3) Calculating and storing the fitness value of each particle. Each particle's fitness is compared with the fitness of its pBest; if the current fitness is better, pBest is reset to the particle's current location and the fitness value is stored. The best of all particles' pBest values is then found; if its fitness is better than gBest's, this value is stored and gBest is reset to the location of the corresponding pBest.
4) Updating each particle's velocity V and position X according to Eq. (3) and (4).
5) Repeating steps 3) and 4) until a criterion is met. The criterion is usually the maximum number of iterations, an acceptable fitness of the gBest, or tolerable convergence of all particles.
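The fitness evaluation of Eq. (5) and (6) can be sketched in NumPy as follows, treating the (m, d) position matrix of a particle as the codebook; the scale constant k = 1 is an arbitrary choice.

```python
import numpy as np

def average_distortion(codebook, training):
    """Average distortion D_i of Eq. (5): mean of the minimal squared
    Euclidean distance from each training vector to the codebook."""
    d2 = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def fitness(codebook, training, k=1.0):
    """Particle fitness of Eq. (6), inversely proportional to distortion."""
    return k / average_distortion(codebook, training)
```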
4.4 Particle Coherent Operation (PCO)
The term "coherent" here indicates that the components are ordered within the particle, so that the inner structures of different particles are essentially identical. In a general PSO problem, each component of a particle is defined individually and has its own meaning and position in the particle, so all the particles of the swarm are naturally "coherent". The optimal codebook design for image VQ with PSO differs from the general PSO problem: the code vectors of the initial codebooks are chosen randomly from the training vectors, and the order of the vectors within the codebooks is also random, so the inner structures of different codebooks are
not essentially identical. That is to say, the particles of the swarm are "un-coherent" in the codebook design optimization problem. The basic idea of PSO is that the swarm of birds iteratively searches the region around the bird nearest the food and finally finds the food. During the iterative process, particles move towards the region around pBest and gBest in order to arrive at the optimal point. Since each particle contains several component particles, the movement of a particle essentially consists of its component particles moving towards the positions of the corresponding component particles of the pBest and gBest particles. It is therefore obvious that if no correspondence exists between the component particles of different particles, the evolution process will be disordered and will not converge to the optimal solution. From the discussion above, we conclude that establishing a coherent relationship between the inner structures of all particles is essential. For an image, the gray means and variances of pixels are essential characteristics. In image VQ, every codebook contains m code vectors, and each code vector has d pixels. The gray variances of the code vectors are small, because the d pixels in a single code vector are spatially contiguous and the dimensionality of the vector is low. Therefore, the gray value of a code vector can be represented by the gray mean of its d pixels. Based on the average gray value of the d pixels in each code vector, the m code vectors of each initial codebook are sorted. The inner structures of different codebooks then become essentially identical, and a "coherent" relationship between different particles is established. We call this process the "Particle Coherent Operation"; through it, all particles advance towards the global optimal solution. The PCO is computationally inexpensive, effective and simple to implement, as the following experiments demonstrate.
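A minimal sketch of the PCO itself: sort the code vectors of a codebook by their mean gray value, as described above. The (m, d) array layout follows the particle coding of Section 4.1.

```python
import numpy as np

def particle_coherent_operation(codebook):
    """PCO: sort the m code vectors of a codebook (an (m, d) array) in
    ascending order of their average pixel gray value, so that all
    particles share an essentially identical inner structure."""
    order = codebook.mean(axis=1).argsort()
    return codebook[order]
```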
5 Experimental Results and Analysis
Four test images, lenna, couple, flowers and peppers, are used here. The codebook sizes were 64×4 and 64×16. This paper compares PSO-VQ with the LBG and SOFM algorithms. To demonstrate the effect of the PCO, we also compare PSO-VQ with a PSO-VQ (no PCO) algorithm, which differs from PSO-VQ only in omitting the PCO step. The relative distortion threshold of LBG was set to 0.05 [1]. The swarm size of PSO-VQ was 20, and the number of iterations was 10. The parameters in Eq. (3) and (4) strongly influence the convergence rate of PSO. Clerc [7] presented a constriction coefficient model in order to improve the convergence rate; he proved mathematically that it converges more quickly than the original PSO, and this is also supported by our experiments. Following this model, w = 0.729 and c1 = c2 = 1.49445 were adopted in our experiments. The PSNR values of the decoded images are shown in Table 1. Owing to limited space, Fig. 1 and 2 show only the original images of lenna and couple and the decoded images of the PSO-VQ and LBG algorithms, with codebook size 64×4.
Table 1. The PSNR of the decoded images
Image     Codebook size   LBG       SOFM      PSO-VQ (no PCO)   PSO-VQ
lenna     64×4            29.6943   30.2102   27.5039           30.5722
lenna     64×16           25.5411   25.8242   24.2164           25.9634
couple    64×4            31.0323   31.9026   30.0608           32.1557
couple    64×16           26.9078   27.7033   25.8372           27.6992
flowers   64×4            26.1317   26.9749   25.2693           27.1563
flowers   64×16           22.3329   22.6328   20.8362           22.5087
peppers   64×4            30.0839   30.6387   28.3694           31.1764
peppers   64×16           26.9717   27.2392   25.7349           27.4269

(PSNR in dB.)
As can be seen from Table 1, the PSNR values of the PSO-VQ decoded images are larger than those of the other algorithms. For example, with codebook size 64×4, the average PSNR of the PSO-VQ decoded images exceeds that of the LBG, SOFM and PSO-VQ (no PCO) algorithms by 1.03 dB, 0.33 dB and 2.46 dB, respectively. From Fig. 1 and 2 we can see that the visual quality of the PSO-VQ decoded images is much better than that of LBG. Blocking effects and distortions on the lady's cap, forehead and arm are obvious in the LBG decoded image of lenna, and similar distortions appear in the LBG decoded image of couple, as indicated by the red rectangles highlighted on the images; these artifacts do not appear in the PSO-VQ decoded images. In fact, the visual quality of the PSO-VQ decoded image is much better than LBG's even when the PSNR of the LBG decoded image is slightly larger than that of PSO-VQ. Fig. 3 compares the LBG decoded image with an interim result of PSO-VQ (the result of the fifth iteration); the visual quality of the interim PSO-VQ result is clearly better.
a) Original image  b) LBG  c) PSO-VQ

Fig. 1. Fig. 1 a) is the original image of lenna, Fig. 1 b) is the decoded image of the LBG algorithm, and Fig. 1 c) is the decoded image of PSO-VQ. The codebook size is 64×4.
a) Original image  b) LBG  c) PSO-VQ

Fig. 2. Fig. 2 a) is the original image of couple, Fig. 2 b) is the decoded image of the LBG algorithm, and Fig. 2 c) is the decoded image of PSO-VQ. The codebook size is 64×4.
a) LBG (PSNR: 29.6943 dB)  b) PSO-VQ interim result (PSNR: 29.6122 dB)

Fig. 3. Fig. 3 a) is the decoded image of LBG (PSNR 29.6943 dB); Fig. 3 b) is the decoded image of an interim result of the PSO-VQ algorithm (PSNR 29.6122 dB). The codebook size is 64×4.
In a word, PSO-VQ is superior to LBG not only on the objective criterion PSNR, but also in the subjective visual quality of the decoded images. PSO-VQ is a global optimization algorithm, and it solves the dependence on the initial codebook that cannot be avoided with the LBG algorithm.
6 Conclusions
PSO is a rising optimization algorithm. The individuals of the particle swarm inherit information from themselves and from the best individual in the swarm, and adjust their own positions by random velocities in the solution space. Although PSO is relatively new, it has been shown to offer higher performance for global optimization than canonical MO evolutionary algorithms [13]. Codebook design for VQ is a global optimization problem, and designing the
optimal VQ codebook with PSO is a good solution. PSO was adopted here to implement image VQ compression, and the optimal codebook was obtained through iterative evolution from the initial codebooks. To ensure that the solution converges to the global optimal codebook, the PCO was presented in this paper as well. The experimental results showed that the PCO is effective and that PSO-VQ solves the strong dependence of the LBG algorithm on the initial codebook, improving the PSNR of the decoded images. In particular, a remarkable improvement in the visual quality of the decoded images was achieved in the experiments. As a new attempt, this work extends the application of PSO evolutionary theory to image VQ.
Acknowledgement. The authors would like to acknowledge the support of the project "Geospatial Information Science and Technology IRT 0438".
References 1. Linde, Y., Buzo, A., Gray, R.: An Algorithm for Vector Quantizer Design. IEEE Transactions on Communications 28(1), 84–95 (1980) 2. Lancini, R., Tubaro, S.: Adaptive Vector Quantization for Picture Coding Using Neural Networks. IEEE Transactions on Communications 43( 234), 534–544 (1995) 3. Karayiannis, N.B., Pai, P.-I.: Fuzzy Vector Quantization Algorithms and Their Application in Image Compression. IEEE Transactions on Image Processing 4(9), 1193–1201 (1995) 4. Goldberg, D.E.: Genetic Algorithm in Search, Optimization and Machine Learning, 1st edn. Addison-Wesley, New York (1989) 5. Wu, Y., Coll, D.C.: BTC-VQ-DCT hybrid coding of digital images. IEEE Transactions on Communications 39(9), 1283–1287 (1991) 6. Kennedy, J., Eberhart, R.: Particle Swarm Optimization. Neural Networks, Proceedings. In: IEEE International Conference, Perth, WA, vol. 4, pp. 1942–1948. IEEE press, Los Alamitos (1995) 7. Clerc, M., Kennedy, J.: The Particle Swarm-Explosion, Stability, and Convergence in a Multidimensional Complex Space. IEEE Transactions on Evolutionary Computation 6, 58–73 (2002) 8. Kennedy, J.: The Particle Swarm: Social Adaptation of Knowledge. In: IEEE International Conference on Evolutionary Computation, pp. 303–308. IEEE press, Indianapolis (1997) 9. Van Den Bergh, F., Engelbrecht, A.P.: Training Product Unit Networks Using Cooperative Particle Swarm Optimisers. In: Proceedings of International Joint Conference on Neural Networks (IJCNN’01), vol. 1, pp. 126–132. IEEE press, Washington, DC (2001) 10. Feng, H.-M.: Particle Swarm Optimization Learning Fuzzy Systems Design. In: ICITA 2005. Third International Conference on information technology and applications, vol. 1, pp. 363–366. IEEE press, Los Alamitos (2005) 11. Kennedy, J., Eberhart, R.C.: A Discrete Binary Version of the Particle Swarm Algorithm. In: IEEE International Conference on Systems, Man, and Cybernetics, vol. 5, pp. 4104–4108. IEEE press, Orlando, FL (1997) 12. Abido, M.A.: Optimal Power Flow Using Particle Swarm Optimization. International Journal of Electrical Power & Energy Systems 24(7), 563–571 (2002) 13. Liu, D., Tan, K.C., Goh, C.K., Ho, W.K.: A Multiobjective Memetic Algorithm Based on Particle Swarm Optimization. IEEE Transactions on Systems, Man and Cybernetics, Part B 37(1), 42–50 (1996)
Face Detection Based on BPNN and Wavelet Invariant Moment in Video Surveillance Hongji Lin and Zhengchun Ye College of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350002, Fujian, China {Lhj057,Yezhengchun526}@163.com
Abstract. A multi-view face detection method for video surveillance is proposed in this paper. It provides the ability to extract high-level features in terms of human activities rather than low-level features like color, texture and shape. The method is capable of locating human faces over a broad range of views in color image sequences or videos with complex scenes. Firstly, an improved frame difference is used to acquire promising regions of the image. Then the presence of skin-tone pixels is used to locate faces. Finally, an improved method based on wavelet invariant moments and a BPNN is used to verify the candidate face regions. The experimental results show that the proposed algorithm has high speed and a low error-detection rate, so it can be used in real-time video surveillance systems. The main distinguishing contribution of this work is the ability to detect faces irrespective of their poses by using the wavelet invariant moments as the input of the BPNN, whereas contemporary systems deal with frontal-view faces only. The other novel aspect of the work lies in the accuracy with which the candidate area is acquired to segment objects from the background with the help of motion and skin information.
1 Introduction
Face detection is becoming an important research topic, due to its wide range of possible applications, like security access control, content-based video indexing, or advanced human-computer interaction. The definition of face detection is: given an arbitrary image, determine whether or not there are any faces in the image and, if present, return the image location of each face [1]. The features contained in a facial image are quite abundant, but which of these features are most useful, and how to make use of them, is an important issue for face detection. Two critical characteristics of face features are skin color and gray features [2]. Skin color is independent of the details of the face and is robust to rotation and facial expression, so it can be used for fast face detection. Face detection methods using gray features can be assigned to two categories: feature-based methods and classification-based methods. The former search for different facial features and then use their spatial relationship to locate faces. The latter detect faces by classifying all possible sub-images of a given image as face or non-face sub-images. Neural networks [3], Bayesian methods [4], support vector machines [5] and AdaBoost
[6] have all been used to construct classifiers to locate faces. However, little attention has been paid to face detection in video or image sequences, where motion information can help reduce the search area for locating faces.
Fig. 1. The image shows the flowchart of multi-view face detection in video surveillance
In this work, we designed a multi-view face detection method to detect faces of arbitrary poses in video frames. The system first eliminates the background from the image by performing motion detection. An improved motion detection algorithm is proposed in this paper, which can extract not only more complete moving objects but also objects with only slight movement between frames. The purpose of this step is to improve the efficiency of the algorithm. Skin detection is then used to detect faces in the moving region. The subsequent face verification consists of a certifier based on the wavelet invariant moments and a BPNN. The remainder of the paper is organized as follows: an improved motion detection method is presented next, followed by the use of skin information to detect faces in moving regions. In Section 4 we present the multi-view face verification. Finally, conclusions are drawn from experiments and analysis.
2 Motion Region Abstraction
Many motion detection methods, such as symmetrical frame difference and background subtraction, are used to detect moving objects in video frames. These two methods are complementary, so we can combine their advantages to put forward an improved algorithm.
2.1
Traditional Motion Detection Methods
The symmetrical frame difference is an important method for detecting moving objects in intensity images. First, the current frame is subtracted from its neighboring frame to obtain the residual image. Second, for each pixel of the residual image, if its value is bigger than a threshold the pixel is considered moving; otherwise it is static.
The output of this stage is a binary image in which the white pixels denote moving regions of the original image. The procedure can be expressed as follows:

$$M_d(x, y, t) = \begin{cases} 1 & |f(x, y, t) - f(x, y, t - \Delta t)| > \tau \\ 0 & |f(x, y, t) - f(x, y, t - \Delta t)| \le \tau \end{cases} \qquad (1)$$
Here t is the detection time, (x, y) are the pixel coordinates, $f(x, y, t)$ is the gray value at time t, and $M_d(x, y, t)$ is the output binary image. The threshold $\tau$ can be calculated by an adaptive threshold method [7]. An illustration of the symmetrical frame difference is given in Fig. 2, using example images that will also serve for illustration throughout the system description.
Fig. 2. The first image is the current frame, the second one is its neighboring frame, the third one is the residual image, and the last one is the binary image obtained by the symmetrical frame difference motion detection method.
Motion detection using the symmetrical frame difference has several advantages. First of all, relative motion between frames is easily detected. It is also robust to illumination variations, because the time interval between frames is very short. Moreover, it is computationally fast, so it can be used in real-time systems. As can be seen from Fig. 2, however, the algorithm detects only the edges of a moving object, and, more importantly, it fails to detect objects that are relatively static between frames. Background subtraction is similar to the symmetrical frame difference, but it relies heavily on the background, so its difficulty lies in background updating. The procedure is expressed as follows:
(2)
Here $f(x, y, 0)$ is the background image, and the other quantities have the same meanings as in the former method. Background subtraction can extract the moving object completely, but it is sensitive to noise and prone to false detections when the background or the illumination changes.
2.2 An Improved Method of Motion Detection
From the analysis above, we put forward an improved algorithm: a modified background subtraction with a dynamic background
model. Experimental results are given below to demonstrate its validity and practicality. The algorithm is as follows:
Step 1: Color images are converted into intensity images.
Step 2: Before detection, one frame is saved as the background $f(x, y, 0)$. Moving objects may be included in it, so the following steps update it.
Step 3: The current frame $f(x, y, t)$ subtracts the background $f(x, y, 0)$, and a binary image $M_d(x, y, t)$ is obtained using a threshold $\tau$.
Step 4: A dynamic, real-time update strategy for the background image $f(x, y, 0)$ is put forward; the foreground and background are updated with different strategies. For every pixel of the current frame $f(x, y, t)$, if the corresponding value in the binary image $M_d(x, y, t)$ is 0, the point is considered static background, so a weight A is used to update the background image $f(x, y, 0)$. To account for objects permanently entering or leaving the scene, a cumulative counter is kept for each pixel of the background image. If the counter exceeds the threshold $T_\tau$, the object is considered to have moved out of the image, so the corresponding pixel of the current frame $f(x, y, t)$ is used to update the background pixel; otherwise the pixel value remains unchanged and the counter is incremented by 1. Assume the image has N*M pixels, where N is the width and M is the height; for all i in N*M, $f(x_i, y_i, t)$ denotes the current gray value of the i-th pixel. The dynamic, real-time background updating strategy can be expressed as below:

if (Md(xi, yi, t) == 0) {      // the pixel is background
    T[i] = 0;                  // reset the counter
    // dynamic background update; the quantity A is the update weight
    f(xi, yi, 0) = A * f(xi, yi, t) + (1 - A) * f(xi, yi, 0);
} else {
    T[i] += dt;
    if (T[i] >= T_tau) {
        T[i] = 0;              // reset the counter
        // update the background using the current frame
        f(xi, yi, 0) = f(xi, yi, t);
    }
}

Step 5: Repeat steps 3 and 4 until all pixels have been processed.
Fig. 3 shows the residual image and the binary image produced by the improved algorithm. Compared with motion detection using the symmetrical frame difference, the improved algorithm has clear advantages: it extracts a more complete moving-object region and detects moving objects that the former method misses. For instance, the man standing on the right side of the image with only minor motion is not detected by the symmetrical frame difference.
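For illustration, the per-pixel update of Step 4 can be vectorized as in the following NumPy sketch; the weight A, the counter threshold and the float background buffer are assumptions for the example.

```python
import numpy as np

def update_background(frame, background, counter, moving,
                      A=0.05, T_tau=100, dt=1):
    """Vectorized version of the Step-4 update above.

    moving: the binary image M_d as a bool array; background must be a
    float array so the weighted blend does not truncate.
    """
    static = ~moving
    counter[static] = 0
    background[static] = A * frame[static] + (1 - A) * background[static]
    counter[moving] += dt
    expired = moving & (counter >= T_tau)    # object considered gone
    background[expired] = frame[expired]
    counter[expired] = 0
    return background, counter
```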
Fig. 3. The residual image and the binary image obtained by our method
3 Skin Detection
The purpose of this stage is to eliminate from the moving region all image pixels where a human face is unlikely. Skin detection can be defined as the process of selecting which pixels of a given image correspond to human skin. Research results [8] show that skin color clusters in specific color spaces. Among these color spaces, YCbCr has been widely used, since skin pixels form a compact cluster in the Cb-Cr plane.
Fig. 4. The distribution of skin-tone pixels in the Cr-Cb plane
Pixel classification is based on a manually labeled training set of images taken from different sources. The output of this stage is a binary image in which the white pixels denote skin-tone pixels of the original image. This stage yields only the coarsest estimate of face locations, because only color information is considered, and many false alarms may be generated by background and non-face objects with skin-like tone.
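A minimal sketch of such a Cb-Cr skin classifier is shown below; the rectangular bounds are commonly cited values for the skin cluster, not the ones trained from this paper's marked image set.

```python
import numpy as np

def skin_mask(ycbcr, cb_range=(77, 127), cr_range=(133, 173)):
    """Mark pixels whose (Cb, Cr) fall inside a rectangular skin cluster.

    ycbcr: (H, W, 3) array in YCbCr channel order.
    """
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
```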
4 Candidate Face Verification
After motion detection and skin detection, only moving faces and moving skin remain in the candidate regions. In the subsequent face verification, a major step is the extraction of features from the candidate regions as the basis for verification. The wavelet invariant moments are robust to rotation, so we use them as the input of the BPNN to classify faces of various poses.
4.1 Candidate Face Regions
To minimize the effect of noise, shading and illumination variations, a certain amount of cleaning and preprocessing is performed prior to face verification. Mathematical morphology operations such as dilation and erosion can be used as preprocessing. Erosion removes regions irrelevant to the structure of the image, for example small isolated regions. In contrast, small regions near an object can be absorbed by dilation, so dilation can be used to fill holes in the object. All candidate regions are then marked with red bounding boxes, as can be seen below.
Fig. 5. The first image is the binary image obtained by our method. The red box in the second one shows the extent of the candidate region in the binary image, and the last one shows the corresponding region in the intensity image.
4.2 Facial Feature Extraction
To verify faces anywhere in the candidate region, the detector is applied at every location in that region (see [3] for details). First, each input window is normalized to the same size, and then the shape of each region is used to distinguish faces from other skin regions. Shape is a difficult concept; a shape parameter is a function whose value is independent of geometric transformations such as translation, rotation, scaling and reflection. Invariant moments are therefore good characters for expressing the features of the input windows. Since the concept of invariant moments emerged, many new moments have been put forward to improve efficiency. The orthogonality of the Zernike moments, which eliminates redundancy and simplifies image recognition, makes the suggested feature selection approach practical. However, these moments act on the global image and are more prone to erroneous conclusions: since they are designed to capture global information about the image, they are not suitable for classifying similar objects corrupted by a significant amount of random noise. In contrast, wavelet moment invariants [9] were presented for capturing both global and local information from the objects of interest, together with a method of selecting discriminative features based on a set of discriminative measures defined for the features. To obtain rotation-invariant moments, the following generalized expression is typically used:
$$F_{pq} = \iint f(r, \theta)\, g_p(r)\, e^{jq\theta}\, r\, dr\, d\theta \qquad (3)$$

where $F_{pq}$ is the pq-order moment, $g_p(r)$ is a function of the radial variable r, and p, q are integer parameters. The wavelet transform is a method for accomplishing localized
analysis; unlike traditional methods, the wavelet transform provides both time and frequency localization. This characteristic makes it particularly suitable for extracting local discriminative features. $g_p(r)$ in Eq. (3) can be treated as a wavelet basis function; consider the family

$$\psi_{m,n}(r) = \frac{1}{\sqrt{m}}\,\psi\!\left(\frac{r - n}{m}\right) \qquad (4)$$
where m is a dilation parameter and n is a shifting parameter; from now on, the basis functions $\{g_p(r)\}$ can be replaced by the wavelet basis functions $\{\psi_{m,n}(r)\}$. We use the cubic B-spline wavelet, which is optimally localized in space-frequency and close in form to Zernike's polynomial moments. The mother wavelet of the cubic B-spline in the Gaussian approximation form is

$$\psi(r) = \frac{4a^{n+1}}{\sqrt{2\pi(n+1)}}\,\delta_w \cos\!\left(2\pi f_0 (2r - 1)\right) \cdot \exp\!\left(-\frac{(2r - 1)^2}{2\delta_w^2 (n + 1)}\right) \qquad (5)$$

where
$f_0 = 0.409177$, $\delta_w^2 = 0.561145$, $a = 0.697066$ and $n = 3$. The wavelet moments are then

$$\left\|F^{wavelet}_{m,n,q}\right\| = \left\| \int_0^{2\pi}\!\!\int_0^1 F(r, \theta)\,\Psi_{mn}(r)\, e^{jq\theta}\, r\, dr\, d\theta \right\| \qquad (6)$$

where $m = 0, 1, 2, 3, \ldots$; $n = 0, 1, \ldots, 2^{m+1}$; $q = 0, 1, 2, 3$. It is well known that the selection of discriminative features is a crucial step in any face verification, since the next stage sees only these features and acts upon them. The wavelet moments are well suited to expressing the facial features, so we use them.
4.3 Face Verification
In order to robustly verify highly variable face patterns of arbitrary pose in complex real-world images, we present a face verification approach based on a neural network architecture. The most common network of this type is the Back Propagation neural network (BPNN). We discuss below how to establish the face verification model. First of all, we need to decide how many layers, and how many neurons per layer, the network uses. By trial, we find that a three-layer BPNN is the best solution. The details of the BPNN are shown in Fig. 6.
Fig. 6. The first image is the structure of the BPNN, and the next one demonstrates the processing element of the BPNN.
The first layer merely supplies the input patterns to the network. In this article, the best four wavelet invariant moments are selected as the input (x1, x2, x3, x4). Each neuron of the second layer receives a signal from the previous neurons, and each signal is processed by a processing element, as also shown in Fig. 6. Each input of the processing element is multiplied by a separate weight; the weighted inputs are summed and passed through an incentive function that scales the output to a fixed range of values. The output of the incentive function is then broadcast to all neurons of the next layer. The final output of the BPNN is a vector of three elements (y1, y2, y3), corresponding to frontal face (y1), non-face (y2) and profile face (y3). If the value of y1 is 1, the input image is considered to contain a frontal face; otherwise it is not. The processing element is defined as

$$y = f\!\left(\sum_{i=1}^{n} w_i x_i - \theta\right) \qquad (7)$$
where $w_i$ is the weight of the input $x_i$, $\theta$ is the threshold, and f is the incentive function. The following sigmoid (S-type) incentive function is used to scale the output:

$$f(x) = \frac{1}{1 + e^{-\lambda x}} \qquad (8)$$
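For concreteness, a forward pass of the three-layer BPNN with the incentive function (8) can be sketched as follows; the layer sizes and weight shapes are our assumptions.

```python
import numpy as np

def forward(x, W1, b1, W2, b2, lam=1.0):
    """Forward pass of the three-layer BPNN with the sigmoid of Eq. (8).

    x: the four wavelet invariant moments; output: (y1, y2, y3) for
    frontal face, non-face and profile face.
    """
    f = lambda z: 1.0 / (1.0 + np.exp(-lam * z))
    h = f(W1 @ x - b1)      # hidden layer, Eq. (7) applied per neuron
    return f(W2 @ h - b2)   # output layer
```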
The BPNN learns by example: we must provide a learning set that consists of input examples and the known-correct output for each case. These input-output examples show the network what behavior is expected, and the BPNN adapts accordingly. More discussion of the theoretical foundation of the BPNN can be found in [10]. The BPNN training process works in small iterative steps: one of the example cases (frontal face, profile face or non-face) is applied to the network, and the network produces an output based on the current state of its synaptic weights (initially, the output is random). This output is compared with the known-good output, and a mean-squared error signal is calculated. The error value is then propagated backwards through the network, and small changes are made to the weights in each layer, calculated so as to reduce the error signal for the case in question. The whole process is repeated for each example case, then back to the first case again, and so on, until the overall error value drops below a pre-determined threshold. At this point we say that the network has learned the problem "well enough": the network never learns the ideal function exactly, but approaches it asymptotically. To train a neural network for face detection, we collect profile and frontal images from the CMU image set to train the BPNN. The selection of non-faces is more challenging, because prototypical non-face images are difficult to characterize. We present a straightforward procedure for aligning positive face examples for training. To collect negative examples, we use a bootstrap algorithm, which adds false detections to the training set as training progresses. Practically any image can serve as a non-face example, because the space of non-face images is much larger than the space of face images. However, collecting a
"representative" set of non-faces is difficult. Instead of collecting the images before training starts, they are collected during training, in the following manner:
Step 1. Create an initial set of non-face images by generating 1000 random images. Apply the preprocessing steps to collect their wavelet invariant moments.
Step 2. Train a neural network to produce an output of 1 for the face examples and 0 for non-faces. The training algorithm is the standard BPNN algorithm. On the first iteration of this loop, the network's weights are initialized randomly; afterwards, the weights computed by training in the previous iteration are used as the starting point.
Step 3. Run the system on an image of scenery that contains no faces. Collect sub-images in which the network incorrectly identifies a face (output activation > 0).
Step 4. Select up to 250 of these images at random, apply the preprocessing steps, and add them to the training set as negative examples. Go to Step 2.
Although the BPNN is widely used, two major problems of the algorithm are slow convergence and local minima. Some modifications can be made to accelerate convergence of the sigmoid incentive function. Owing to the nonlinearity of the function, the output approaches 0 or 1 only when the input is larger or smaller than certain values, which slows the correction process to some extent. Specifically, when the actual output of the sigmoid function is less than 0.01 or greater than 0.99, the output is taken as 0.01 or 0.99 directly. In addition, a variable inertia coefficient correction can be used to improve the method.
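The bootstrap collection of negative examples (Steps 1-4 above) can be sketched as the following loop; train_fn and detect_fn stand for the training and scanning routines and are hypothetical helpers, as are the 4-dimensional random feature vectors.

```python
import numpy as np

def bootstrap_train(train_fn, detect_fn, faces, n_random=1000, n_add=250,
                    rounds=10, rng=np.random.default_rng(0)):
    """Bootstrap negative-example collection (Steps 1-4).

    train_fn(pos, neg) trains the BPNN and returns it; detect_fn(net)
    scans face-free scenery and returns feature vectors of the false
    detections. Both are assumed helpers, not part of the paper.
    """
    negatives = list(rng.random((n_random, 4)))   # Step 1: random non-faces
    net = None
    for _ in range(rounds):
        net = train_fn(faces, np.array(negatives))        # Step 2
        false_hits = detect_fn(net)                       # Step 3
        if len(false_hits) == 0:
            break
        picks = rng.choice(len(false_hits),
                           size=min(n_add, len(false_hits)), replace=False)
        negatives.extend(false_hits[i] for i in picks)    # Step 4
    return net
```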
5 Experimental Results and Summary
Little attention has been paid to multi-view face detection in video, so we cannot compare our method with others precisely. Below are some results of our method compared with other methods.
Fig. 7. Some detection results from our experiments

Table 1. Face detection rate and speed of our method compared to others

Methods | Velocity (frames/second) | Frontal face detection rate (%) | Multi-pose face detection rate (%) | Image size (pixels)
H. Rowley [3] | 1 | 86.0 | * | 320×240
P. Viola [6] | 15 | 92.1 | * | 384×288
Our method | 20 | 90.3 | 83.4 | 320×240
The methods of Rowley [3] and Viola [6] were both designed to detect frontal faces in static images. In contrast, we use motion and skin information to narrow the search scope in the image, and then use the wavelet invariant moments as input to verify faces in the candidate regions. Because the wavelet invariant moments are robust to rotation, multi-view faces at any degree of in-plane rotation can be detected. It can be seen that the proposed method is very efficient and has significant practical value.
References
1. Yang, M.H., Kriegman, D., Ahuja, N.: Detecting faces in images: A survey. IEEE Trans. Pattern Analysis and Machine Intelligence 24, 34–58 (2002)
2. Lu-Hong, L., Hai-Zhou, A., Guang-You, X., Bo, Z.: A Survey of Human Face Detection. Chinese J. Computers 25, 450–458 (2002)
3. Rowley, H.A., Baluja, S., Kanade, T.: Neural network-based face detection. IEEE Trans. Pattern Analysis and Machine Intelligence 20, 23–38 (1998)
4. Schneiderman, H., Kanade, T.: A statistical method for 3D object detection applied to faces and cars. In: IEEE Conf. Computer Vision and Pattern Recognition, Hilton Head Island, South Carolina (2000)
5. Osuna, E., Freund, R., Girosi, F.: Training support vector machines: an application to face detection. In: Proceedings of Computer Vision and Pattern Recognition, Puerto Rico, pp. 130–136 (1997)
6. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: IEEE Conf. Computer Vision and Pattern Recognition, Kauai, Hawaii, USA (2001)
7. Liang, Y.F., Wilder, J.: Real-time face tracking. SPIE (1998)
8. Chai, D., Ngan, K.N.: Face segmentation using skin-color map in videophone applications. IEEE Trans. CSVT 9, 551–564 (1999)
9. Shen, D., Ip, H.H.S.: Discriminative Wavelet Shape Descriptors for Recognition of 2-D Patterns. Pattern Recognition 32, 151–165 (1999)
10. Specht, D.F.: Probabilistic neural networks. Neural Networks 3, 109–118 (1990)
Efficient Topological Reconstruction for Medical Model Based on Mesh Simplification Chunxiang Dai, Ying Jiang, Qingxi Hu, Yuan Yao, and Hongfei Yang Rapid Manufacturing Engineering Center, Shanghai University, 200444 Shanghai, P.R. China
[email protected]
Abstract. Because the reconstructed medical model is complex and time-consuming to process, a novel method is proposed to rebuild an explicit and complete topological structure, which is a prerequisite for mesh simplification of an STL model. First, the sequence of rebuilding the topology information for the various geometric elements is determined. Then, a red-black-tree structure is used to achieve high efficiency in both vertex equivalence testing and element searching while deleting a large number of redundant vertexes; this structure also handles disordered and unsystematic data well. Finally, the relation between vertexes and faces is fully exploited to build the edge records used in mesh simplification. The method makes the mesh simplification procedure more effective.
1 Introduction
Medical software such as treatment planning systems and virtual surgery systems often needs to reconstruct models from various modalities of medical images so that the patient's information can be displayed visually to the doctors. The reconstructed medical 3D model is usually represented as a triangle mesh consisting of a geometric and a topological component. Because the medical image data are huge, and because it is necessary to interpolate some data into an isotropic data field, large numbers of triangles are produced in the reconstructed model. Rendering such complex meshes at an interactive rate is time consuming. The generation of a surface model typically produces tens to hundreds of thousands of triangles, and sometimes millions. This restricts practical application, because the 3D rendering can take a significant amount of time to compute. Real-time manipulation is often impossible, because a massive number of triangle patches must be rendered on today's medical workstations. For example, rendering a medical 3D model with over a million triangles can take several minutes. This may be tolerable if the rendering is performed before surgery, but some 3D rendering manipulations, such as refreshing the view, should cost only a few seconds [1, 2]. Longer delays cannot be sustained, because such operations are often performed during surgery, especially when the surgery imposes strict time limits. On the other hand, the huge mass of triangles takes up too much memory. This creates greater difficulties for model analysis (e.g., collision detection, animation
deformation, surface analysis, etc.) [10, 11, 12, 13], and also increases the pressure on network transmission. For these reasons, the triangular facets should be reduced to the order of a few hundred thousand or even tens of thousands, and the rendering time should be reduced to one-tenth of a second or less. This has led to a great deal of interest in reducing the number of mesh elements in order to speed up the rendering process. However, the STL format contains only a set of unstructured triangles (vertex positions and optional face normal vectors) without their adjacency information [3]. For efficient processing of a mesh, topology construction is essential before many downstream applications, so that the data of the STL file can be organized efficiently to meet the time requirements. For a long time, the commonly used methods [4] for establishing topology required O(3nk²) time, where n is the total number of faces of the model and k is the average number of faces incident to each vertex. Since 3D scan data are large, the number of triangular faces can be as high as tens or even hundreds of thousands, which leads to a tremendous amount of computation. Using the half-edge structure [16], O(n) time complexity can be achieved under ideal circumstances. However, that method can only retrieve the indexes of a face's component edges and its adjacent vertices and faces when the face is known. In this paper, a simple and memory-efficient algorithm is presented for the topological construction of triangle meshes. It is useful for STL file processing, especially for simplification of meshes with large numbers of triangles [4, 5, 6]. Section 2 defines the flow used to construct the STL model's topological information. The overall procedure and detailed steps are explained in Section 3. Section 4 shows the implementation results, followed by conclusions in the last section.
2 Flow to Construct STL Model Topological Information
Among the various representations of geometric models, the triangular mesh model has been increasingly used, mainly due to its simplicity, in many fields such as graphics, CAD/CAM/CAE, RE (Reverse Engineering), and RP (Rapid Prototyping) [7, 8, 9, 11]. STL is the most popular format for exchanging triangular mesh models and is supported by many commercial geometric modeling applications. However, the STL format contains only a set of unstructured triangles (vertex positions and optional face normal vectors) without their adjacency information [10]. For the efficient processing of a mesh, topological construction is essential to many downstream applications such as mesh simplification, mesh compression, multi-resolution mesh hierarchy creation, subdivision surface generation, tool path generation, collision checking, etc. [10, 11, 12, 13]. It is essential to build the explicit topological information rapidly to meet the requirements of mesh simplification. Making full use of the relation between vertexes and triangles, this paper completes the topological construction of a medical STL model rapidly through an effective search strategy. Fig. 1 shows the flow of the topological reconstruction.
Fig. 1. Flow of topological reconstruction for STL: reading file → record of vertex and face → foundation of edge → vertex merging / edge merging → adjacent relation → mesh simplification
2.1 Topology of Vertexes
An STL file records the coordinates of the three vertices of every triangular face of the mesh model, so there is a huge amount of redundant data. For example, one mesh model with 13100 triangles lists 39300 vertexes, but only 6556 of them are distinct. Consequently, the repeatability of the vertexes must be judged for every triangle read from the STL file. Only a vertex that is not repeated is added to the vertex records as a new record. Meanwhile, a new record of the triangle is built using the indexes of its vertices. In this way, not only the records of the faces and vertexes themselves (if the vertex is not yet registered in the vertex storage) but also the links between vertexes and triangles are built. Because the medical models dealt with in this paper contain huge numbers of triangles, the adjacency relationships of the vertexes and the links between vertexes and triangles are not established while reading the file; they are set up when the edges are built, which quickens rendering when opening files. By reading all the triangles of the STL file, the records of vertexes and faces are built at the same time.
2.2 Topology of Edges and Faces
The repeatability of the edges must be judged in order to establish the storage structure and topological information from the existing geometric and topological information. Three edges are obtained from each triangle. By iterating over all the faces, only the edges without repetition are added to the edge records as new records. Each edge is registered by its two vertex indexes. Meanwhile, the adjacent triangles of the edge can be accessed by the index of the triangle. By iterating over all the faces and edges, the adjacency relation of the elements can be obtained from the links of the vertexes, edges and faces once the edge records are complete.
3 Data Structure and Algorithm
A good data structure and algorithm should ensure efficient search, because the basic operation is the comparison of geometric elements. The information of vertexes, edges and faces and their mutual visits and traversals should be designed with mesh simplification in mind. The following visits constitute the major topology queries:
1) If the face is known, its three vertices can be obtained;
2) If the face is known, its three edges can be obtained;
3) If the face is known, all its adjacent triangles can be obtained;
4) If the vertex is known, all its adjacent faces can be obtained;
5) If the vertex is known, all its adjacent vertexes can be obtained;
6) If the vertex is known, all its adjacent edges can be obtained;
7) If the edge is known, its two vertexes can be obtained;
8) If the edge is known, its neighboring triangles can be obtained.
3.1 Data Structure of the Geometric Elements
During the process of simplification, the mesh topology is always changing. Moreover, most updating operations require searching the different geometric elements, so the storage structure is the key factor for the various elements: they must be accessed and searched conveniently. Here is the structure of the vertex:
class suVert {
public:
    Vec3 m_pos;              // position of the vertex (x, y, z)
    Vec3 m_pos2d;            // 2D position of the vertex (x, y)
    Vec3 m_norm;             // normal of the vertex
    varray<int> m_adjV;      // indexes of adjacent vertexes
    varray<int> m_adjEdge;   // indexes of adjacent edges
    varray<int> m_adjFace;   // indexes of adjacent suFace records
};

(The element type of the varray adjacency containers is not explicit in the original listing; integer indexes are assumed here, consistent with the index-based access described below.) The structures of the edge and the face are designed as follows:

class suEdge {
public:
    int m_idx[2];    // the two vertexes of the edge
    int m_fidx[2];   // adjacent suFace records
};

class suFace {
public:
    UINT m_idx[3];   // indexes of the three vertexes
    UINT m_nidx[3];  // index of the normal
    int m_eref[3];   // indexes of the three edges
    int m_adj[3];    // adjacent faces
};

Besides these, an indispensable mesh class saves the records of all the elements in containers, so that the corresponding information can be accessed conveniently by index.
The mesh class is defined as follows:

class suMesh {
public:
    varray<suVert> m_vert;   // all the vertexes
    varray<suEdge> m_edge;   // all the edges
    varray<suFace> m_face;   // all the triangles
    suVert& GetV(int i);     // get the vertex
    suEdge& GetE(int i);     // get the edge
    suFace& GetF(int i);     // get the face
};
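As a usage illustration, the following minimal sketch shows how the queries of Section 3 reduce to plain index lookups with these records; the function name is hypothetical, and varray is assumed to behave like std::vector.

// Answering the topology queries by direct index lookup,
// e.g. for a known face fi and a known edge ei.
void topologyQueries(suMesh &mesh, int fi, int ei) {
    suFace &f = mesh.GetF(fi);
    int v0 = (int)f.m_idx[0];   // query 1): a vertex of the face
    int e0 = f.m_eref[0];       // query 2): an edge of the face
    int f0 = f.m_adj[0];        // query 3): an adjacent face
    suEdge &e = mesh.GetE(ei);
    int a  = e.m_idx[0];        // query 7): a vertex of the edge
    int fa = e.m_fidx[0];       // query 8): a neighboring face of the edge
    // Queries 4)-6) use the m_adjFace / m_adjV / m_adjEdge records of suVert.
    (void)v0; (void)e0; (void)f0; (void)a; (void)fa;  // silence unused warnings in this sketch
}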
With these structure definitions, the basic elements of vertex, edge and face are in place. Fig. 2(a) shows a sketch of the topological relations used in this paper: every face has three vertexes and three edges, and two triangles share a common edge. All the topological relations are represented explicitly, so they can be visited conveniently by index. The well-known software package OpenMesh [16] uses the half-edge structure to establish its topology; its main elements are half-edges, through which all the elements are visited. Fig. 2(b) shows a sketch of the topological relation using the half-edge structure. Compared with the half-edge structure of OpenMesh, the method in this paper is more efficient because it saves the time spent searching and iterating. Consider accessing the one-ring structure of a given vertex. For the typical one-ring structure in the mesh, OpenMesh can only obtain the elements by enumeration, which is complex and inconvenient: to access the one-ring, the half-edge adjacent to the current vertex must be found first, then the end vertex of that half-edge, and finally the adjacent half-edge of its opposite. In this paper, we can access the elements directly by index, as the sketch after Fig. 2 shows.
Fig. 2. (a) Topological relation of vertex, edge and face; (b) half-edge structure
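A minimal sketch of the index-based one-ring traversal follows, assuming the adjacency records have been filled as described in Section 3.4 below; the function name is hypothetical.

// Enumerate the one-ring neighbors of vertex vi by direct index
// lookup, with no half-edge walking required.
void visitOneRing(suMesh &mesh, int vi) {
    suVert &v = mesh.GetV(vi);
    for (int k = 0; k < (int)v.m_adjV.size(); ++k) {
        suVert &neighbor = mesh.GetV(v.m_adjV[k]);
        // ... process neighbor here
        (void)neighbor;
    }
}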
3.2 Vertex Merging
When the STL file is read, three vertexes are obtained from each triangle. Only a vertex without repetition is added to the vertex records as a new record. The confirmation process is to find identical points in the vertex records. If a simple list were chosen to store all the vertexes, the entire chain
would have to be searched each time, and this search must be performed three times for each triangle. Consequently, a list is not a suitable choice for the vertexes. Because of the huge amount of data, the geometric locations of adjacent vertices are very close; if we create hash functions using only the coordinates of the vertexes, conflicts are unavoidable. A plain binary search tree may degenerate to its worst case during creation, in which case search performance is almost the same as for a linear list [14, 15]. AVL trees are easier to implement than red-black trees because there are fewer cases, and AVL trees require O(1) rotations on an insertion, whereas red-black trees may require O(log n) rebalancing work. In practice, the relative speed of AVL trees versus red-black trees depends on the data being inserted: if the data are well distributed (i.e., roughly in random order), an unbalanced binary tree would generally be acceptable, but red-black trees are faster on bad cases because they do less unnecessary rebalancing of already acceptable data [14, 15]. In this paper, a red-black tree is employed to store the vertexes and judge repetition. Fig. 3 illustrates a red-red violation where the uncle is colored black.
Fig. 3. Insertion - Red Parent, Black Uncle
The map class in the STL (Standard Template Library) is a typical application of the red-black tree [14, 15]. Both the vertex data (of type suVert) and the index of the vertex are stored in the map container. As suVert is a user-defined class, a custom comparison operation is required in order to use the find function of the map class. Because the geometric locations of adjacent vertices are very close, a threshold is chosen for the comparison: a vertex whose error is within the given threshold is designated as a repetitive vertex; otherwise it is added to the vertex records as a new element. The vertex is also saved, by index, as one point of the current triangle. This operation can be regarded as a primary simplification of the mesh model, as sketched below.
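A minimal sketch of the map-based merging follows. The key type, function name, and the grid quantization of coordinates are our own illustration, not taken from the paper (quantizing by the threshold eps approximates the threshold comparison while keeping a valid strict weak ordering for std::map); Vec3 is assumed to expose x, y, z, and varray to offer std::vector-like push_back and size.

#include <map>
#include <cmath>

// Coordinates quantized by the merge threshold eps: nearly coincident
// vertexes fall into the same cell and map to the same key. std::map
// is a red-black tree, so insertion and lookup are O(log n).
struct VertKey {
    long x, y, z;
    bool operator<(const VertKey &o) const {
        if (x != o.x) return x < o.x;
        if (y != o.y) return y < o.y;
        return z < o.z;
    }
};

int mergeVertex(std::map<VertKey, int> &table, varray<suVert> &verts,
                const Vec3 &p, double eps) {
    VertKey key = { (long)std::floor(p.x / eps),
                    (long)std::floor(p.y / eps),
                    (long)std::floor(p.z / eps) };
    std::map<VertKey, int>::iterator it = table.find(key);
    if (it != table.end())
        return it->second;        // repetitive vertex: reuse its index
    suVert v;
    v.m_pos = p;
    verts.push_back(v);
    int idx = (int)verts.size() - 1;
    table[key] = idx;             // register the new vertex
    return idx;
}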
3.3 Foundation of the Edge
Three edges are obtained from each triangle, and each interior edge is shared by two triangles. As with the vertexes, repeated edges must be detected: only an edge without repetition is added to the edge records as a new record. Likewise, the map class is employed to build the structure, and both the edge data (of type suEdge) and the index of the edge are stored in the map container. During the process of judging edge repetition, the current triangle is added to the neighboring faces of the edge. A comparison operation is also defined for the edge class: the edge with vertexes v1 and v2 must be deemed the same as the edge with v2 and v1, which the key construction sketched below guarantees.
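A minimal sketch of an order-insensitive edge key, assuming vertex indexes from the merging step; the typedef and function name are hypothetical.

#include <utility>

// (v1, v2) and (v2, v1) produce the same key, so the same map-based
// merging used for vertexes applies unchanged to edges.
typedef std::pair<int, int> EdgeKey;

EdgeKey makeEdgeKey(int v1, int v2) {
    return (v1 < v2) ? EdgeKey(v1, v2) : EdgeKey(v2, v1);
}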
3.4 Adjacent Relation
The records of vertexes, edges and faces have now all been built. During mesh simplification, element and adjacency searches are performed frequently, so establishing a complete adjacency relation from the existing records is necessary for the follow-up steps. By traversing the faces and edges, the adjacency relation can be built easily (see the sketch after this list):
1) Adjacent edges and vertexes of a vertex. The neighboring edges and vertexes of each vertex can be built quickly from the existing edge records: when traversing the edge records, the current edge is an adjacent edge of both its vertexes, and each of its vertexes is an adjacent vertex of the other.
2) Adjacent faces of a vertex. When traversing the face records, the current face is an adjacent face of its three vertexes and of its three edges.
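A minimal sketch of this traversal, assuming the records above with varray behaving like std::vector; the function name is hypothetical.

// Build the adjacency records of Section 3.4 with one pass over the
// edge records and one pass over the face records (O(n) overall).
void buildAdjacency(suMesh &mesh) {
    for (int e = 0; e < (int)mesh.m_edge.size(); ++e) {
        suEdge &edge = mesh.GetE(e);
        int a = edge.m_idx[0], b = edge.m_idx[1];
        mesh.GetV(a).m_adjEdge.push_back(e);  // edge adjacent to both ends
        mesh.GetV(b).m_adjEdge.push_back(e);
        mesh.GetV(a).m_adjV.push_back(b);     // each end neighbors the other
        mesh.GetV(b).m_adjV.push_back(a);
    }
    for (int f = 0; f < (int)mesh.m_face.size(); ++f) {
        suFace &face = mesh.GetF(f);
        for (int k = 0; k < 3; ++k)
            mesh.GetV((int)face.m_idx[k]).m_adjFace.push_back(f);
    }
}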
So far we have built all the topological relations of the three element types; all elements and their adjacent elements can be accessed conveniently by index. When mesh simplification is performed, the weight of every edge is first calculated by iterating over the edges; edge collapses are then carried out in order of edge weight. After an edge collapse, many elements must be deleted and inserted [5, 6], so time-consuming searching and iterating over elements would otherwise be a serious problem. In this paper, the classical Quadric Error Metric [6] is used to calculate the edge weights for the simplification operation. In the algorithm proposed by Garland, the error of a vertex is defined as the sum of squared distances to its associated planes. The computation is as follows. In three-dimensional space, a plane can be defined by the equation ax + by + cz + d = 0 with a² + b² + c² = 1; it can also be expressed as pᵀv = 0, where p = [a b c d]ᵀ contains the unit normal n = [a b c]ᵀ and the constant d, and v = [x y z 1]ᵀ. The squared distance of any vertex v in space to this plane is

D²(v) = (ax + by + cz + d)² / (a² + b² + c²) = (ax + by + cz + d)² = (pᵀv)² = (vᵀp)²   (1)

The quadric error of any vertex of the mesh model is defined as the sum of squared distances to its associated planes:

Δ(v) = Σ_{p ∈ planes(v)} (pᵀv)²   (2)
The formula above can be rewritten in quadratic form:

Δ(v) = Σ_{p ∈ planes(v)} vᵀ(ppᵀ)v = vᵀ ( Σ_{p ∈ planes(v)} K_p ) v   (3)

where

K_p = ppᵀ = [ a²  ab  ac  ad
              ab  b²  bc  bd
              ac  bc  c²  cd
              ad  bd  cd  d² ]

is the quadric error metric of the plane.
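A minimal C++ sketch of formulas (1)–(3), accumulating K_p over a vertex's planes and evaluating the quadric error; the struct and member names are hypothetical.

// K accumulates K_p = p p^T for each plane p = [a b c d]^T with
// a^2 + b^2 + c^2 = 1; the error of v = [x y z 1]^T is then v^T K v.
struct Quadric {
    double K[4][4];
    Quadric() {
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j) K[i][j] = 0.0;
    }
    void addPlane(double a, double b, double c, double d) {
        double p[4] = { a, b, c, d };
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                K[i][j] += p[i] * p[j];          // K += p p^T, formula (3)
    }
    double error(double x, double y, double z) const {
        double v[4] = { x, y, z, 1.0 }, s = 0.0;
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                s += v[i] * K[i][j] * v[j];      // v^T K v, formulas (1)-(2)
        return s;  // sum of squared distances to the accumulated planes
    }
};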
4 Implementation Results and Discussion
The proposed algorithm was implemented in VC++ 7.1 on the Windows XP platform and tested on a PC with an Intel Pentium 4 processor (2.6 GHz) and 512 MB of RAM. We tested the method on numerous practical data sets from medical images. For example, 55 CT images were used to generate a skull 3D model with the MIMICS software (from the Belgian company Materialise); 122992 triangles and 61186 vertexes were obtained after 3D reconstruction. This model has too many triangles to be rendered effectively. Fig. 4(a) shows the skull model with 122992 triangles, and Fig. 4(b) shows the skull model with 37090 triangles.
Fig. 4. (a) Mesh with 122992 triangles; (b) mesh with 37090 triangles
A good data structure and algorithm should ensure efficient search, because the basic operation is the comparison of geometric elements. Thanks to the red-black-tree structure, vertexes and edges can be merged in O(log n) time, and the adjacency topology can be built in O(n) time by taking advantage of the vertex, face and edge records, where n is the number of corresponding faces or edges. Because the red-black tree is used as the vertex structure, the time complexity of searching for a vertex is O(log n), with n the number of vertexes; an edge can likewise be searched in O(log n) time, since the edges also adopt the red-black tree. Table 1 compares several commonly used topological algorithms.
Table 1. Comparison of topological algorithms

Algorithm | Supports complex index | Vertex search | Edge search
half-edge structure | no | -- | O(n)
this paper | yes | O(log n) | O(log n)
Table 2 shows several selected examples of various data sizes after simplification using the Quadric Error Metric [6]. The time of mesh simplification is the time needed to collapse the original model to a model with fewer vertexes. As can be seen from the results, the quality of the resulting simplified meshes is very satisfactory in our experiments, while the display quality is retained. The 3D model with fewer triangles becomes more suitable for further processing. The algorithm preserves the topology of the original model, which makes it more robust.

Table 2. Results of implementation after collapse using the Quadric Error Metric

Model | Triangles number | Vertexes number | Time of mesh generation (s) | Time of mesh simplification (s)
before | 122992 | 61186 | 34.89 | --
after | 46852 | 23115 | 14.73 | 410
after | 37090 | 18235 | 12.58 | 491
The smaller model created in this way can be rendered much more quickly by the imaging workstation. The topology saves search and iteration time and optimizes the algorithm. The proposed method is simple and memory efficient. The 3D model after collapse allows real-time 3D display, and even cutting operations are greatly accelerated.
5 Conclusion and Future Work
Complex polygonal mesh models with huge amounts of data are generated by various applications, such as scientific and medical visualization systems, mechanical CAD systems, and automatic modeling systems. STL is the most popular format for exchanging triangular mesh models and is supported by many commercial geometric modeling applications. However, the STL format contains only a set of unstructured triangles without their adjacency information. Efficiently reconstructing the topological structure from a large amount of data is therefore a bottleneck problem. Based on an effective storage structure, this paper completes the topological construction of medical STL models rapidly through an effective search strategy. Experimental results demonstrate that, with the proposed methods, the pre-processing for mesh simplification is fast and memory efficient. However, there is plenty of room for further improvement. For example, smoothing is also necessary after collapsing in
order to meet the visual requirements. How to resolve this issue is a challenging task for further study.
Acknowledgements. The authors thank the Shanghai Municipal Education Committee Development Fund (No. 06AZ029) for its financial support of this research.
References
1. Qin, X.J.: Three-dimensional reconstruction of medical imaging and visualization technique. Dalian University of Technology, Liaoning (2001)
2. Louis Karat, Picker International, Nuclear Medicine Division: Simplification of Triangle Meshes for Fast Surface Rendering of Tomography Data, pp. 1438–1442
3. Hattangady, N.V.: A fast topology manipulation algorithm for compaction of mesh/faceted models. Computer-Aided Design 30, 835–843 (1998)
4. Rock, S.J., Wozny, M.J.: Generating topological information from a bucket of facets. In: Marcus, H.L. (ed.) Solid Freeform Fabrication Symposium Proceedings, The University of Texas at Austin, Austin, TX, USA, pp. 251–259 (1992)
5. Hoppe, H., DeRose, T., Duchamp, T., et al.: Mesh optimization. In: Proc. SIGGRAPH '93, Anaheim, pp. 19–26. Addison-Wesley, Reading (1993)
6. Garland, M., Heckbert, P.S.: Surface Simplification using Quadric Error Metrics. In: Proc. SIGGRAPH '97, pp. 209–215 (1997)
7. Béchet, E., Cuillière, J.C., Trochu, F.: Generation of a finite element MESH from stereolithography (STL) files. Computer-Aided Design 34, 1–17 (2002)
8. de Berg, M., van Kreveld, M., Overmars, M., Schwarzkopf, O.: Computational Geometry (Algorithms and Applications), 2nd edn. Springer, Heidelberg (1998)
9. Choi, B.K., Jerard, R.B.: Sculptured Surface Machining: Theory and Applications. Kluwer (1998)
10. Jun, C.S., Kim, D.S., Park, S.: A new curve-based approach to polyhedral machining. Computer-Aided Design 34, 379–389 (2002)
11. Jun, C.S., Kim, D.S., Park, S.: A new curve-based approach to polyhedral machining. Computer-Aided Design 34, 379–389 (2002)
12. Levin, A.: Combined Subdivision Schemes. PhD Thesis, School of Mathematical Science, Tel-Aviv Univ. (2000)
13. El-Sana, J., Varshney, A.: Topology Simplification for Polygonal Virtual Environments. IEEE Trans. Visualization and Computer Graphics 4, 133–144 (1998)
14. Weimin, Y., Weimin, W.: Data Structure (C language edition). Tsinghua University Press, Beijing (1998)
15. Essential C++. Tsinghua University Press, Beijing (2001)
16. Botsch, M., Steinberg, S., Bischoff, S., Kobbelt, L.: OpenMesh – a generic and efficient polygon mesh data structure, http://www.cgal.org
Repetitive Motion Planning of Redundant Robots Based on LVI-Based Primal-Dual Neural Network and PUMA560 Example Yunong Zhang, Xuanjiao Lv, Zhonghua Li, and Zhi Yang Department of Electronics and Communication Engineering Sun Yat-Sen University, Guangzhou 510275, China [email protected]
Abstract. A primal-dual neural network based on linear variational inequalities (LVI) is presented in this paper and used to solve the repetitive motion planning of redundant robots. To this end, a drift-free criterion is exploited, and physical constraints such as joint limits and joint velocity limits are incorporated into the problem formulation of the scheme. The scheme is finally reformulated as a quadratic programming (QP) problem and resolved at the velocity level. In contrast to other computational strategies for inverse kinematics, the LVI-based primal-dual neural network is designed on the basis of the QP-LVI conversion and the Karush-Kuhn-Tucker (KKT) conditions. With simple piecewise-linear dynamics and global (exponential) convergence to optimal solutions, it can handle general QP and linear programming (LP) problems in the same inverse-free manner. The repetitive motion planning scheme and the LVI-based primal-dual neural network are simulated on the PUMA560 robot manipulator, and their effectiveness is demonstrated.
1 Introduction

A manipulator is said to be functionally redundant when more degrees of freedom (DOF) are available than the minimum number required to perform a specific end-effector primary task [1]. The human arm, the elephant trunk and the snake are also such redundant systems [2]. The potential application capacity of a robot manipulator is determined by its number of DOF: the end-effector motion cannot be performed accurately, or even fulfilled at all, if a manipulator does not have sufficient degrees of freedom. In recent years, robotic researchers have focused on solving a variety of tasks requiring sophisticated motion in complex environments, for instance, working in hazardous or rough-and-tumble environments, carrying radioactive materials or heavy objects, doing repetitive work, and exploring unpredictable regions such as blue water and outer space. Redundant manipulators have a wider operational space and extra DOF to meet a number of functional constraints. One fundamental issue in operating such redundant manipulator systems is the inverse-kinematics problem [1]. The general description of this robotic problem
is that, given the desired Cartesian trajectory r(t) ∈ Rᵐ of the manipulator end-effector, how can we generate the corresponding joint trajectory θ(t) ∈ Rⁿ in real time t? Note that m < n, and thus multiple solutions to the inverse-kinematics problem exist. Conversely, given the joint variable θ(t), the location of the end-effector r(t) is determined uniquely and directly; this is the so-called forward kinematics [1]. By resolving the redundancy properly, robots can avoid obstacles and joint physical limits, as well as optimize various secondary criteria [1][3][4]. However, for a redundant manipulator performing a specified end-effector task, multiple solutions (or even an infinite number of solutions) exist. In this sense, the redundancy of joint motion significantly complicates the manipulator control problem, in addition to the kinematic and dynamic nonlinearities. To make proper use of the redundancy, various computational schemes have been developed. The conventional solution of the inverse-kinematics problem is the pseudoinverse-based formulation, i.e., a minimum-norm particular solution plus a homogeneous solution [5][6]. Research over the past ten years shows that the redundancy-resolution problem can be solved in a more favorable manner via online optimization techniques [1][4]. When the end-effector traces a closed path in its workspace, the joint angles may not return to their initial values after the end-effector task is completed; in other words, the path obtained in joint space may not be closed. This is the so-called joint-angle drift phenomenon, or repeatability problem. A dual neural network has been applied to this problem, and computer-simulation results based on the PA10 manipulator were presented in [7]. However, the dual neural network requires online matrix inversion and thus can only handle strictly convex quadratic programs [8]. The LVI-based primal-dual neural network presented in this paper, by contrast, can handle general QP and LP problems in the same inverse-free manner, as it does not entail online matrix inversion. Applying the drift-free criterion to the PUMA560 robot manipulator is investigated for the first time in this paper. As shown via computer simulations, the PUMA560 joint-angle drift problem is solved by the LVI-based primal-dual neural network. The remainder of this paper is organized in four sections. Section 2 discusses the background information and problem formulation of drift-free inverse kinematics for physically constrained robot manipulators. Section 3 develops an LVI-based primal-dual neural network for online drift-free redundancy resolution. Section 4 presents computer-simulation results based on the PUMA560 robot manipulator. Section 5 concludes the paper with final remarks.
2 Preliminaries and Problem Formulation

Given the forward-kinematics equation

r = f(θ),   (1)

where r ∈ Rᵐ is the position-and-orientation vector of the end-effector in the Cartesian space, θ ∈ Rⁿ is the joint variable vector, and f(·) is a continuous
nonlinear mapping function with a known structure and parameters for a given manipulator. For the 6-DOF PUMA560 robot manipulator (n = 6), if we consider only the positioning of the end-effector, then m = 3 and the degree of redundancy is n − m = 3. The nonlinearity increases the difficulty of resolving the kinematic redundancy [9]. By differentiating (1), we obtain the linear relationship between the Cartesian velocity ṙ and the joint velocity θ̇:

J(θ)θ̇ = ṙ,   (2)

where J(θ) ∈ Rᵐˣⁿ is the Jacobian matrix defined as J(θ) = ∂f(θ)/∂θ, whose elements are functions of the joint configuration. In redundant manipulators, since m < n, equations (1) and (2) are under-determined and admit an infinite number of solutions. This, however, can be exploited to avoid obstacles, joint physical limits and singularity points, as well as to perform repetitive movement.

2.1 Drift-Free Inverse Kinematics
It is shown in [10] that the pseudoinverse-type solutions are generally not repeatable; in other words, a closed path of the end-effector does not yield a closed path in the joint space. Such joint-angle drift is undesirable for cyclic motion control. Certainly, we could readjust the manipulator's configuration via self-motion (i.e., change the robot arm from one joint configuration to another expected one without affecting the end-effector), but this would be inefficient. To make the kinematic control repeatable, the minimization of the joint displacement between the current state and the initial state was investigated in [7][11]. In that problem formulation, the performance index is

(1/2)(θ̇ + c)ᵀ(θ̇ + c)   with   c = λ(θ − θ(0)),

where λ is a positive design parameter used to scale the magnitude of the manipulator's response to such a joint displacement. Thus, we have the following basic problem formulation for repetitive motion planning of robot manipulators:

minimize   (1/2)(θ̇ + c)ᵀ(θ̇ + c),
subject to J(θ)θ̇ = ṙ,   c = λ(θ − θ(0)).

It is worth mentioning that, when more subtask criteria and physical constraints are considered, the inverse-kinematics problem becomes very time-consuming, whether by computing the pseudoinverse-type solution or by numerically solving QP problems. The real-time computational requirement of sensor-based high-DOF robotic systems motivates the development of much more efficient parallel schemes to replace numerical algorithms for the online control of redundant manipulators.
2.2 Joint Limits Conversion

Because almost all manipulators are physically constrained by their joint limits and joint velocity limits, it is realistic and useful to consider the following problem formulation (which incorporates the drift-free criterion as well):

minimize   (1/2)(θ̇ + c)ᵀ(θ̇ + c),
subject to J(θ)θ̇ = ṙ,   c = λ(θ − θ(0)),
           θ⁻ ≤ θ ≤ θ⁺,   (3)
           θ̇⁻ ≤ θ̇ ≤ θ̇⁺,   (4)

where superscripts + and − denote the upper and lower limits of a joint variable vector, respectively. As the redundancy is resolved at the velocity level, the limited joint range [θ⁻, θ⁺] may be converted into an expression based on the joint velocity θ̇ as follows:

μp(θ⁻ − θ) ≤ θ̇ ≤ μp(θ⁺ − θ),   (5)

where the intensity coefficient μp > 0 is used to scale the feasible region of θ̇. The coefficient μp should be chosen so that the feasible region of θ̇ converted from joint limits (3) is no smaller than the original one defined by joint velocity limits (4); that is, μp is selected no less than max_{1≤i≤n} {(θ̇ᵢ⁺ − θ̇ᵢ⁻)/(θᵢ⁺ − θᵢ⁻)}. Note that large values of μp may cause quick joint deceleration when the manipulator approaches its limits. For computer simulations based on the 6-DOF PUMA560 robot manipulator, the intensity coefficient μp is usually selected as 20. Equations (4) and (5) can thus be combined into the bound constraint ξ⁻ ≤ θ̇ ≤ ξ⁺, where the ith elements of ξ⁻ and ξ⁺ are defined respectively as ξᵢ⁻ = max{θ̇ᵢ⁻, μp(θᵢ⁻ − θᵢ)} and ξᵢ⁺ = min{θ̇ᵢ⁺, μp(θᵢ⁺ − θᵢ)}.

2.3 QP Problem Reformulation

The above physically-constrained drift-free redundancy-resolution problem of robot manipulators can therefore be reformulated as

minimize   (1/2)θ̇ᵀWθ̇ + cᵀθ̇,
subject to Jθ̇ = b,   ξ⁻ ≤ θ̇ ≤ ξ⁺,
where the coefficients W := I, c := λ(θ − θ(0)), b := ṙ, and ξ± are as defined above. To keep all joint variables within their mechanical ranges, it is straightforward and concise to use bound constraints.
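A minimal C++ sketch of assembling these coefficients from the formulas of Section 2 follows; the function name and array layout are hypothetical.

#include <algorithm>

const int N = 6;  // PUMA560 joints

// c_i = lambda*(theta_i - theta0_i); the combined velocity bounds are
// xi_i^- = max{ thdot_i^-, mu_p*(th_i^- - theta_i) } and
// xi_i^+ = min{ thdot_i^+, mu_p*(th_i^+ - theta_i) }.
void assembleCoefficients(const double theta[N], const double theta0[N],
                          const double thLo[N], const double thHi[N],
                          const double vLo[N], const double vHi[N],
                          double lambda, double muP,
                          double c[N], double xiLo[N], double xiHi[N]) {
    for (int i = 0; i < N; ++i) {
        c[i]    = lambda * (theta[i] - theta0[i]);
        xiLo[i] = std::max(vLo[i], muP * (thLo[i] - theta[i]));
        xiHi[i] = std::min(vHi[i], muP * (thHi[i] - theta[i]));
    }
}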
Furthermore, by defining the decision variable vector x = θ̇ ∈ Rⁿ, the physically-constrained drift-free redundancy-resolution problem can finally be expressed as the following quadratic program:

minimize   (1/2)xᵀWx + cᵀx,   (6)
subject to Jx = b,   (7)
           ξ⁻ ≤ x ≤ ξ⁺,   (8)

where the coefficients have been defined previously.
An LVI-Based Primal-Dual Neural Network
As we know, the dual neural network approach entails online inversion of W . To overcome this weakness and to improve the efficacy of the online solution of QP problem (6)-(8), in this section, we present a primal-dual neural network as the QP real-time solver, which is designed based on linear variational inequalities (LVI) [12]-[15]. By the dual theory [16], for the primal QP problem (6)-(8), its dual QP problem can be derived with the assistance of dual decision variables. The dual decision variable is usually defined as the Lagrangian multiplier for each constraint such as (7) and (8). Furthermore, in order to reduce the QP-solver complexity, we use an elegant treatment to cancel the dual decision variables for bound constraint (8). That is, we only need to define the corresponding dual decision vector y ∈ Rm for equality constraint (7). Then, the primal-dual decision variable u and its bounds u± are defined respectively as + − ξ ξ x (9) u= , u+ = , u− = ∈ Rn+m , 1v −1v y where 1v := [1, · · · , 1]T denotes an appropriately-dimensioned vector composed of ones, and 0 is sufficiently large to represent +∞ for simulation and/or implementation purposes. The convex set Ω is defined as Ω = {u ∈ Rn+m |u− u u+ }. By defining coefficient matrix M and vector q respectively as W −J T c M= , q= ∈ Rn+m , J 0 −b we have the following theorem (with proof omitted due to space limitation). Theorem 1. Quadratic program (6)-(8) can be converted to an LVI problem; i.e., to find a primal-dual equilibrium vector u∗ ∈ Ω such that (u − u∗ )T (M u∗ + q) 0, ∀u ∈ Ω. It is known that the above LVI problem is equivalent to the following system of piecewise-linear equations [7][13]-[18]: PΩ (u − (M u + q)) − u = 0,
(10)
Fig. 1. PUMA560 does not return to its initial state after the end-effector completes a circular-path task; this is the so-called joint-angle drift problem
where PΩ(·) is the projection operator onto Ω, defined componentwise as PΩ(u) = [PΩ(u₁), PΩ(u₂), · · · , PΩ(u_{n+m})]ᵀ with

PΩ(uᵢ) = uᵢ⁻ if uᵢ < uᵢ⁻;   uᵢ if uᵢ⁻ ≤ uᵢ ≤ uᵢ⁺;   uᵢ⁺ if uᵢ > uᵢ⁺,   ∀i ∈ {1, 2, · · · , n + m}.

To solve (10), guided by design experience with dynamic-system solvers [7][14][15][19], we adopt the following dynamics for the LVI-based primal-dual neural network (LVI-PDNN), which is the real-time solver for QP (6)-(8):

u̇ = γ(I + Mᵀ){PΩ(u − (Mu + q)) − u},   (11)

where γ is a positive design parameter used to scale the convergence rate of the neural network. Note that, by the definition of u in (9), the first n elements of u̇ in (11) also explicitly generate the joint acceleration θ̈ for joint torque control [4]. Furthermore, we have the following theoretical result on the global exponential convergence of neural network (11).

Theorem 2. Provided that at least one optimal solution x* to QP (6)-(8) exists, then, starting from any initial state u(0), the state vector u(t) of the LVI-based primal-dual neural network (11) converges to an equilibrium point u*, of which the first n elements constitute the optimal solution x* to QP problem (6)-(8). Moreover, if there exists a constant ρ > 0 such that ‖u − PΩ(u − (Mu + q))‖₂² ≥ ρ‖u − u*‖₂², then exponential convergence is achieved for (11), with a convergence rate proportional to γρ.
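As a sketch of how dynamics (11) might be simulated numerically, the following C++ fragment performs one explicit Euler step; the function name, the dense-matrix representation and the step size dt are our own assumptions, not the paper's implementation.

#include <vector>

typedef std::vector<double> Vec;

// One Euler step of udot = gamma * (I + M^T) * (P_Omega(u - (M u + q)) - u),
// where P_Omega is the component-wise clamp onto [uLo, uHi].
void pdnnStep(const std::vector<Vec> &M, const Vec &q,
              const Vec &uLo, const Vec &uHi,
              double gamma, double dt, Vec &u) {
    size_t n = u.size();
    Vec r(n), e(n);
    for (size_t i = 0; i < n; ++i) {           // r = M u + q
        double s = q[i];
        for (size_t j = 0; j < n; ++j) s += M[i][j] * u[j];
        r[i] = s;
    }
    for (size_t i = 0; i < n; ++i) {           // e = P_Omega(u - r) - u
        double p = u[i] - r[i];
        if (p < uLo[i]) p = uLo[i];
        if (p > uHi[i]) p = uHi[i];
        e[i] = p - u[i];
    }
    for (size_t i = 0; i < n; ++i) {           // u += dt * gamma * (I + M^T) e
        double s = e[i];
        for (size_t j = 0; j < n; ++j) s += M[j][i] * e[j];
        u[i] += dt * gamma * s;
    }
}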
4 Simulation Studies

Computer simulations based on the PA10 robot manipulator and the dual neural network were reported in [7].
Fig. 2. PUMA560 motion trajectories, joint-variable and joint-velocity transients, and positioning errors (eX, eY, eZ) as its end-effector moves along a circle, synthesized by the LVI-based primal-dual neural network (11)
In this paper, the repetitive motion-planning scheme is simulated based on the PUMA560 robot manipulator and the LVI-based primal-dual neural network (11). In the simulations of this paper, the following limited joint ranges of the PUMA560 robot manipulator are used: [−2.984, 2.984], [−3.378, 0.807], [−0.974, 3.378], [−2.064, 3.190], [−1.877, 0.038], [−3.378, 3.378] in radians, respectively, for θ1 through θ6. In addition, the limited range of the joint velocity θ̇ is [−1.5, 1.5]⁶ in radians per second.

4.1 Circular-Motion Example
In this subsection, we first show the inverse-kinematics results for the PUMA560 robot arm without considering drift-free criterion (6). That is, with the drift-free coefficient λ := 0, the presented LVI-based primal-dual neural network (11) is applied to the control of the PUMA560 robot arm, with design parameter γ := 10⁸. The desired motion of the PUMA560 end-effector is a circular path with radius 0.1 m and a revolute angle about the x-axis of π/6 rad. The motion task time is 10 seconds, and the initial joint vector is θ(0) = [0, 0, 0, 0, 0, 0]ᵀ rad. Fig. 1 illustrates the motion trajectories and joint-variable transients of the PUMA560 manipulator as its end-effector moves along a circle in the 3-dimensional workspace. No joint variable exceeds its mechanical range, because all the physical constraints have been incorporated.
Fig. 3. PUMA560 motion trajectories and transients as its end-effector moves forwards and then backwards along a straight-line path, synthesized by drift-free criterion (6) and the LVI-based primal-dual neural network (11)
However, the solution is not repeatable, in the sense that the final state of the PUMA560 manipulator does not coincide with its initial state: mathematically, θ3(10) ≠ θ3(0), θ4(10) ≠ θ4(0) and θ5(10) ≠ θ5(0). Hence an additional self-motion readjustment would be needed if cyclic motion control were pursued, which would certainly be inefficient and unwanted in repetitive path-following tasks. For comparison, in the second computer simulation, drift-free criterion (6) with λ := 4 is exploited. For the same circular-path following task, the LVI-based primal-dual neural network (11) is applied again to the control of the PUMA560 robot arm. As seen from Fig. 2, all joint variables are kept within their mechanical ranges, and the solution is repeatable in the sense that the initial and final states of the PUMA560 manipulator are equal. Fig. 2 also shows that the maximal positioning error is less than 4 × 10⁻⁵ mm. Here, eX, eY and eZ denote the components of the positioning error e(t) := r(t) − f(θ(t)) along the X, Y and Z axes of the robot base frame, respectively. The circular-path following experiments demonstrate the capability of the LVI-based primal-dual neural network (11) for online resolution of the drift-free redundancy problem of the physically constrained PUMA560 robot manipulator.
Straight-Line Example
In this subsection, the PUMA560 manipulator is expected to move forwards and then backwards along a straight line, e.g., in a repetitive spot-welding task. The straight line of length 2.25m, at every motion cycle, starts from the PAUMA560 initial state θ(0) = [0, 0, 0, 0, 0, 0]T rad, and shall finally return back to the initial state. Angles of the desired straight line making with XY, YZ and ZX planes are respectively 0rad, π/4rad and π/4rad. The duration of the path-following task at every motion cycle is 20 seconds. The drift-free criterion (6) and LVI-based primal-dual neural network (11) are then applied to the control of physically constrained PUMA560 manipulator. In this straight-line example, the design parameters of LVI-based primal-dual
neural network (11) are selected the same as those of the circular-path example. As shown in Fig. 3, the final state of the PUMA560 manipulator coincides with its initial state. In addition, all the joint variables remain within their limited ranges. Like the circular-path following example, this straight-line path following example also demonstrates the efficacy of the drift-free problem formulation (6)-(8) and its online LVI-PDNN solution (11) for the real-time redundancy resolution of physically constrained robot manipulators.
5 Concluding Remarks

In this paper, as illustrated via PUMA560 examples, we have solved the joint-angle drift problem of physically constrained redundant manipulators. The solution is based on a quadratic-programming problem formulation and an LVI-based primal-dual neural network as the real-time solver. As bound constraint (8) has been elegantly cast into Ω, the LVI-based primal-dual network is composed of only n + m neurons; its architecture and computational complexity are thus simpler than those of other recurrent neural networks, also in view of its piecewise-linear dynamics. Moreover, the neural network is shown to be globally exponentially convergent to optimal solutions. Simulation results based on the PUMA560 robot manipulator have demonstrated the efficacy of the drift-free problem formulation (6)-(8) and its neural-network solver (11) for the real-time inverse-kinematics control of joint-constrained redundant manipulators. Future research may lie in circuit/hardware implementation or discrete-time models of the LVI-based primal-dual neural network (11).

Acknowledgements. This work is funded by the National Science Foundation of China under Grant 60643004 and by the Science and Technology Office of Sun Yat-Sen University. Before joining Sun Yat-Sen University in 2006, the corresponding author, Yunong Zhang, had been with the National University of Ireland, the University of Strathclyde, the National University of Singapore, and the Chinese University of Hong Kong since 1999. He has continued this line of research, supported by various research fellowships/assistantships. His web page is available at http://www.ee.sysu.edu.cn/teacher/detail.asp?sn=129.
References
1. Zhang, Y.: Analysis and Design of Recurrent Neural Networks and Their Applications to Control and Robotic Systems. Ph.D. Thesis, Chinese University of Hong Kong (2002)
2. Latash, M.L.: Control of Human Movement. Human Kinetics Publishers, Chicago (1993)
3. Zhang, Y., Wang, J., Xu, Y.: A Dual Neural Network for Bi-Criteria Kinematic Control of Redundant Manipulators. IEEE Transactions on Robotics and Automation 18, 923–931 (2002)
4. Zhang, Y., Ge, S.S., Lee, T.H.: A Unified Quadratic-Programming-Based Dynamical System Approach to Joint Torque Optimization of Physically Constrained Redundant Manipulators. IEEE Transactions on Systems, Man, and Cybernetics 34, 2126–2132 (2004)
5. Liegeois, A.: Automatic Supervisory Control of the Configuration and Behavior of Multibody Mechanisms. IEEE Transactions on Systems, Man, and Cybernetics 7, 868–871 (1977)
6. Khatib, O., Bowling, A.: Optimization of the Inertial and Acceleration Characteristics of Manipulators. In: Proceedings of the IEEE International Conference on Robotics and Automation, vol. 4, pp. 2883–2889 (1996)
7. Zhang, Y., Wang, J., Xia, Y.: A Dual Neural Network for Redundancy Resolution of Kinematically Redundant Manipulators Subject to Joint Limits and Joint Velocity Limits. IEEE Transactions on Neural Networks 14, 658–667 (2003)
8. Zhang, Y., Wang, J.: A Dual Neural Network for Convex Quadratic Programming Subject to Linear Equality and Inequality Constraints. Physics Letters A 298, 271–278 (2002)
9. De Luca, A., Lanari, L., Oriolo, G.: Control of Redundant Robots on Cyclic Trajectories. In: Proceedings of the IEEE International Conference on Robotics and Automation, vol. 1, pp. 500–506 (1992)
10. Klein, C., Huang, C.: Review of Pseudoinverse Control for Use with Kinematically Redundant Manipulators. IEEE Transactions on Systems, Man, and Cybernetics 13, 245–250 (1983)
11. Cheng, F.T., Chen, T.H., Sun, Y.Y.: Resolving Manipulator Redundancy Under Inequality Constraints. IEEE Journal of Robotics and Automation 10, 65–71 (1994)
12. Kinderlehrer, D., Stampacchia, G.: An Introduction to Variational Inequalities and Their Applications. Academic Press, New York (1980)
13. Ferris, M.C., Pang, J.S.: Complementarity and Variational Problems: State of the Art. SIAM, Philadelphia, Pennsylvania (1997)
14. Xia, Y., Wang, J.: A Recurrent Neural Network for Solving Linear Projection Equations. Neural Networks 13, 337–350 (2000)
15. He, B., Yang, H.: A Neural Network Model for Monotone Linear Asymmetric Variational Inequalities. IEEE Transactions on Neural Networks 11, 3–16 (2000)
16. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming – Theory and Algorithms. Wiley, New York (1993)
17. Li, W., Swetits, J.: A New Algorithm for Solving Strictly Convex Quadratic Programs. SIAM Journal on Optimization 7, 595–619 (1997)
18. Zhang, Y., Wang, J.: A Dual Neural Network for Constrained Joint Torque Optimization of Kinematically Redundant Manipulators. IEEE Transactions on Systems, Man, and Cybernetics 32, 654–662 (2002)
19. Bertsekas, D.P.: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Englewood Cliffs, New Jersey (1989)
Tensile Test to Ensure a Safety of Cannula Connection in Clinical Ventricular Assist Device (VAD) Takashi Tanaka, Tomohiro Shima, Masateru Furusato, Yuma Kokuzawa, Kazuhiko Ito, Kiyotaka Iwasaki, Yi Qian, and Mitsuo Umezu Integrative Bioscience and Biomedical Engineering, Waseda University Graduate School 58-322, 3-4-1 Okubo, Shinjuku, Tokyo, 169-8555, Japan [email protected]
Abstract. A pneumatic ventricular assist device (VAD) is designed not only for short-term usage as a "bridge to recovery" (BTR) while cardiac function recovers, but also as a "bridge to transplantation" (BTT) for cardiac transplant candidates. In the latter case, however, the VAD must be exchanged before its guarantee period expires. In this research, the authors investigated the connection strength between connector and cannula, and the safety of the VAD exchange.
1 Introduction
As a lifesaving tool, the VAD is widely applied to patients with serious heart failure. It has been reported that there were approximately 700 VAD cases in Japan from 1980 to 2004 (Fig. 1) [1].

Fig. 1. Number of VAD cases in Japan (Toyobo: 411, ZEMEX: 158, BVS5000: 61, HeartMate: 25, Novacor: 22, Thoratec: 20)
As shown in Fig. 2, almost 60% of the VADs used in Japanese hospitals are Toyobo VADs. Since the ZEMEX VAD was discontinued by its manufacturer, the Toyobo VAD has become the only Japanese-made pneumatic driven VAD [2].
Fig. 2. VADs used in Japan: acute — (a) Toyobo, (b) BVS5000; chronic — (c) EVAHEART, (d) Novacor, (e) Jarvik 2000
As shown in Fig. 3, the inlet and outlet of the Toyobo VAD are arranged in parallel. The tip edge of the connector is made of stainless steel. The cannula is made of polyvinyl chloride, and the inner cannula surface, through which blood passes, is coated with a biocompatible material, segmented polyurethane (TM-3).

Fig. 3. The Toyobo VAD (cannula, cable tie, stainless steel connector)
Considering the shortage of organ donations in Japan, the VAD is increasingly being applied, beyond short-term recovery of heart function (BTR), in long-term treatment (BTT) while patients wait for heart transplantation (Fig. 4) [3][4].
Fig. 4. Number of pumps for each duration of usage (Toyobo type)
The Toyobo VAD is provided with a one-month critical safety guarantee period, and there is a critical danger of thrombus formation when a VAD is used beyond it. Therefore, it is necessary to exchange the VAD periodically. During 2003–2005, the Toyobo VAD was swapped 150 times among 100 clinical cases, indicating that, on average, 1.5 exchanges per patient were carried out. Fig. 5 summarizes the numbers of VAD applications, patients and VAD swaps.
Fig. 5. Number of Toyobo VAD uses per year during 2003–2005 (new, exchange, total)
The VAD exchange process proceeds as follows:
(1) Clamp the cannula.
(2) Cut the cable tie by which the cannula was fixed.
(3) Detach the VAD from the connectors.
(4) Cut the cannula at the tip edge.
(5) Insert a new VAD connector into the cannula.
(6) Fix the cannula with a cable tie.
In the above exchange process, the original inlet and outlet cannulae remain with the patient; only the VAD and connectors are renewed. Therefore, the new contact surfaces between the existing cannula and the new connectors are one of the most important factors for risk analysis. In this study, the authors performed a series of tension tests to measure the load at which the cannula is torn away from the connector.
2 Method
The tension testing instrument "Autograph" (Shimazu AG-25TB) (Fig. 6) was used in the tension tests. The maximum strain force was measured at a constant speed of 25 mm/min under static load. Fig. 7 shows the components used in the experiment. The test tube (Fig. 8) was 45 mm long and was cut from a clinical cannula.
Fig. 6. Testing instrument

Fig. 7. Materials used in the tensile test: (a) connector, (b) cannula (1170 mm and 2412 mm), (c) cable tie, (d) cable tie tool

Fig. 8. Test tube (45 mm length, 17.5 mm O.D.)
The stainless steel connector of the VAD was set into the upper and lower sides of a chuck, and the chuck was mounted on an adapter that allows the pulling angle to be changed (Fig. 9).
On the upper side of the chuck the connector was fixed with stainless-steel wire, and on the lower side it was fastened with a cable tie, as in the clinical condition (Fig. 10). The test parameters were as follows:
a) Pulling angle: 0°, 15°, 30°, 45°, 60°
b) Clamping force — NON: no cable tie; MIN: 6 kgf; INT: 8 kgf; STD: 13.5 kgf
c) Condition of the connector–cannula surface: no water (dry); connected with water (wet)
Fig. 9. Chucking parts (upper side and under side; 30 mm)

Fig. 10. Tensile test (tension applied at the pulling angle; measurement positions A and B; 45 mm)
3 Results
A typical detached-load curve is shown in Fig. 11. (1) At first, the tube stretched linearly. (2) The maximum detached load was measured at position [A] in Fig. 10, when the first cable tie pulled out. (3) With only the second cable tie left, the tube pulled out from the connector (position [B] in Fig. 10). Under every condition, as the tensile force approached its maximum load, the cable tie fixed at the tip of the connector started to separate from the connector; right after that, the second cable tie pulled out (B), and the tube tended to come off more rapidly than at the first release.
Fig. 11. Three steps of the tensile test (0 degrees): tensile load (kgf) versus displacement (mm) for the dry STD, dry NON and wet STD conditions
Fig. 12 compares the cannula detached-load curves under various pulling angles. The detached tensile load is observed to increase with the pulling angle (except at 0 and 5 degrees). The results indicate that the cannula is pulled out more easily by a straining force in the axial direction or at a small angle than at a large angle.
Fig. 12. Comparison of detachment load (kgf) at pulling angles of 0°, 5°, 15°, 30°, 45°, and 60°
Figure 13 gives the detachment forces for the NON, MIN, INT, and STD clamping conditions under dry and wet surface conditions. The detachment forces under dry conditions are consistently higher than those under wet conditions, but in both cases the detachment force increases with the clamping force. Fig. 13 shows that the detachment strength depends on the condition of the contact surface: the detachment strength in the 'wet' condition is about 50% lower than that obtained in the 'dry' condition.
Fig. 13. Maximum tensile loads for the different tightening strengths (NON, MIN, INT, STD) under dry and wet connector-cannula conditions
The results indicate that great care must be taken when exchanging the VAD while the cannula is filled with saline.
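As a rough back-of-the-envelope check of what the 50% reduction means in practice (the dry-condition load below is read approximately from Fig. 13, and the required holding force is a purely hypothetical placeholder, not a value from the paper):

DRY_STD_LOAD_KGF = 45.0   # approx. maximum load, dry surface, STD tie (Fig. 13)
WET_FACTOR = 0.5          # 'wet' strength is about half of 'dry'
REQUIRED_KGF = 10.0       # hypothetical minimum holding force for safe use

wet_load = DRY_STD_LOAD_KGF * WET_FACTOR
print(f"estimated wet-condition holding force: {wet_load:.1f} kgf")
if wet_load < REQUIRED_KGF:
    print("warning: saline-wetted connection may not hold safely")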
4 Conclusions
To establish a critical safe detachment load for VAD exchange, the authors investigated the tensile load at which the connector and cannula separate. The results showed that water adheres to the surfaces of the connector and cannula immediately after the VAD is renewed, and that the contact strength of a 'wet' surface is about 50% lower than that of a 'dry' surface. The authors therefore advise both nursing staff and patients to take sufficient care during VAD renewal.
Acknowledgments. This research was organized by Biomedical Engineering Research on Advanced Medical Treatment, Advanced Research Institute for Science and Engineering, Waseda University (05P29), and was financially supported by Health Science Research Grants from the Ministry of Health, Labor and Welfare, Japan (H17-F-003).
A Reproduction of Inflow Restriction in the Mock Circulatory System to Evaluate a Hydrodynamic Performance of a Ventricular Assist Device in Practical Conditions

Masateru Furusato1, Tomohiro Shima1, Yuma Kokuzawa1, Kazuhiko Ito1, Takashi Tanaka1, Kiyotaka Iwasaki1, Yi Qian1, Mitsuo Umezu1, ZhiKun Yan2, and Ling Zhu2

1 Major in Integrative Bioscience and Biomedical Engineering, Graduate School of Waseda University, #58-322, 3-4-1 Okubo, Shinjuku, Tokyo 169-8555, Japan
[email protected]
2 Zhejiang Provincial Hospital, 158 Shangtang Road, Hangzhou, 310014, P.R. China

Abstract. A novel in vitro mock circulatory system that can reproduce an inflow restriction, simulating blood volume pooling due to heart failure, was developed to evaluate the hydrodynamic performance of a pulsatile ventricular assist device (VAD) under practical conditions. The development was motivated by the difference in inflow restriction between in vitro and in vivo environments. The central idea of this study is to reproduce the inflow restriction with a centrifugal pump placed on the inflow side of a left ventricular model, in place of the constant-head reservoir of a conventional circuit. In the novel circuit, the maximum flow rate was obtained at a lower systolic fraction than in a conventional circuit, and the same tendency was observed in an acute animal experiment in a sheep. This result suggests that the new mock circuit is effective for confirming a practical drive strategy for a VAD under various diseased conditions.
1 Introduction
Animal experiments are a necessary step toward the clinical application of artificial hearts, but their number must be reduced for both ethical and cost reasons. The aim of this research is therefore to develop a novel in vitro mock circulatory system that can be used to confirm an effective drive condition for a ventricular assist device (VAD) prior to animal experiments.
2 Animal Experiment
2.1 Aim of the Animal Experiment and Experimental Conditions
An acute animal experiment was performed to compare the hydrodynamic performance of the VAD between in vitro and in vivo settings. The test animal was a sheep
(40 kg in weight), and the experiment was performed at Zhejiang Provincial Hospital, Hangzhou, China. The Spiral Vortex Pump (SV Pump), shown in Fig. 1, was employed as the LVAD. The SV Pump, originally invented in Australia, was redesigned for mass production at Waseda University [1][2]. The inlet cannula shown in Fig. 2(a) was used for the acute experiment, but partway through the experiment the cannula was shortened by 190 mm, from 450 mm to 260 mm, as shown in Fig. 2(b), and the effect of this cut on the hydrodynamics was examined. The other end of the inlet cannula was inserted into the apex, as shown in Fig. 3.
Fig. 1. Spiral Vortex Pump
Fig. 2. Two types of inlet cannulae: (a) original inlet cannula, L = 450 mm (φ10 mm cuff, φ12 mm); (b) cutting portion (190 mm) removed to shorten the cannula
Fig. 3. Implantation of the VAD in the animal experiment: (a) with the original long inlet cannula (L = 450 mm); (b) with the short cannula (L = 260 mm)
2.2 Results of the Animal Experiments
One result of the acute animal experiment is shown in Fig. 4 as the relationship between systolic fraction and flow rate for the two lengths of inlet cannula. The data in Fig. 4 are summarized as follows:
1) As the drive pressure was increased, with the diastolic vacuum pressure kept constant, a higher pump flow was ejected. This is a general tendency of the hydrodynamic characteristics of a pneumatically driven pulsatile pump with a moving diaphragm.
2) The maximum pump flow was recorded at a systolic fraction of around 25% when the original long inlet cannula was used.
3) With the short inlet cannula, on the other hand, the pump flow increased markedly, by about 50%, and the systolic fraction at the maximum-flow condition shifted from 25% to 35%. This too is a general tendency: under low inlet impedance, a higher pump flow is produced at a higher systolic fraction. A minimal filling-limited pump model illustrating this trade-off is sketched after Fig. 4.
Fig. 4. Relationship between systolic fraction (%) and mean flow rate of the SV Pump (L/min) for the original and short inlet cannulae under different drive conditions (approximately 160/-40 to 240/-40 mmHg)
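A minimal filling-limited/ejection-limited pump model reproduces the qualitative behaviour of Fig. 4 (and of Fig. 6 below): during diastole the pump fills at a rate bounded by the inflow restriction, and during systole it ejects at a bounded rate, so an optimum systolic fraction emerges and shifts lower as the inflow is restricted. All parameter values below are illustrative assumptions, not measured data:

import numpy as np

T = 60 / 80             # cycle period at an assumed 80 bpm, s
SV_MAX = 70e-3          # assumed pump stroke volume, L
Q_EJECT = 6.0 / 60      # assumed maximum systolic ejection rate, L/s

def mean_flow_lpm(sf, q_in_lpm):
    # Diastolic filling is limited by the available inflow;
    # systolic ejection is limited by the ejection-rate bound.
    q_in = q_in_lpm / 60
    fill = np.minimum(SV_MAX, q_in * (1 - sf) * T)
    eject = np.minimum(fill, Q_EJECT * sf * T)
    return eject / T * 60

sf = np.linspace(0.05, 0.7, 131)
for q in (2.5, 3.75, 5.0):          # inflow restrictions as in Fig. 6
    best = sf[np.argmax(mean_flow_lpm(sf, q))]
    print(f"Q_in = {q} L/min -> optimum systolic fraction ~ {best:.0%}")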
3 Development of a Novel in Vitro Mock Circulatory System
3.1 Aim of the Development and Special Features
A novel in vitro mock circulatory system capable of reproducing the hydrodynamics of the acute animal experiment was designed and developed. The concept of the system is to simulate blood volume pooling due to heart failure. The circuit is composed mainly of three silicone models: a left atrium (LA), a left ventricle (LV), and an aorta, as shown in Fig. 5. The inflow restriction from the pulmonary circulation was reproduced by a centrifugal pump placed on the inflow side of the left ventricle model. The capacities of the left ventricle and left atrium models are 110 ml and 50 ml, respectively, volumes determined from human anatomical data [3]. The relationship between the rotational speed and the flow of the centrifugal pump is shown
in Table 1, although in practical use in the mock system the pump flow is determined by the pressure difference between the venous reservoir and the artificial LA. The SV Pump, as a left VAD, is installed between the apex of the artificial LV and the artificial aorta, simulating the installation used in the animal experiment. Three electromagnetic flow meter probes measure BF (bypass flow) at the inlet of the SV Pump, CO (cardiac output) at the outlet of the LV model, and TF (total flow: CO + BF) at the mitral position between the LA and LV. A sketch of a simple consistency check on these three signals is given below.
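Because TF is measured independently of CO and BF, the redundancy TF = CO + BF provides a simple consistency check across the three probes; a minimal sketch, in which the function name and tolerance are our assumptions:

import numpy as np

def check_flow_balance(tf, co, bf, tol_lpm=0.3):
    # TF at the mitral position should equal CO + BF; flag samples
    # where the electromagnetic flow probes disagree beyond tol_lpm.
    tf, co, bf = map(np.asarray, (tf, co, bf))
    residual = tf - (co + bf)
    return residual, np.flatnonzero(np.abs(residual) > tol_lpm)

# residual, bad_idx = check_flow_balance(tf_log, co_log, bf_log)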
Fig. 5. In vitro test circuit (venous reservoir, centrifugal pump, left atrium, left ventricle, aortic compliance, peripheral resistance, and SV Pump with VCT-50 drive console); the centrifugal pump was installed between the venous reservoir and the artificial left atrium

Table 1. Relationship between rotational speed and flow of the centrifugal pump
Speed of centrifugal pump (rpm)    Restricted flow (L/min)
1200                               1
1420                               2
1650                               3
1920                               4
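Table 1 can be linearly interpolated to choose a starting pump speed for a desired restricted flow; a minimal sketch (bearing in mind, per the text above, that the delivered flow also depends on the reservoir-to-atrium pressure difference, so the result is only a first guess):

import numpy as np

FLOW_LPM = np.array([1.0, 2.0, 3.0, 4.0])         # restricted flow, Table 1
RPM = np.array([1200.0, 1420.0, 1650.0, 1920.0])  # pump speed, Table 1

def rpm_for_flow(target_lpm):
    # Valid only within the 1-4 L/min range covered by Table 1.
    return float(np.interp(target_lpm, FLOW_LPM, RPM))

print(rpm_for_flow(2.5))   # ~1535 rpm as a starting point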
3.2 Test Results and Discussion
Typical hydrodynamics of the SV Pump for three inflow rate conditions are shown in Fig. 6. The results exhibited the same tendency as observed in the acute animal experiment:
the maximum flow rate shifted to a lower systolic fraction, from 40% to 30%, as the inflow rate decreased. Fig. 7 is a photograph of the normal (left) and collapsed (right) LV taken under the insufficient-inflow condition, when a high stroke volume (50 ml) was preset on the IABP drive console. Owing to an excessive diastolic vacuum pressure, oscillation of the inflow waveform was observed; the oscillation disappeared when the preset stroke volume was reduced from 50 to 40 ml. These waveform changes are shown in Fig. 8. It is therefore suggested that a small change in the preset stroke volume and/or an adjustment of the diastolic vacuum pressure should be made to achieve a non-oscillating ejection/filling state.
Fig. 6. Relationship between systolic fraction (%) and mean pump flow rate (L/min) at the drive condition of 240/50 mmHg under different inflow-restricted conditions (2.5, 3.75, and 5.0 L/min)
Fig. 7. Collapse of the left ventricle under the insufficient-inflow condition at a stroke volume of 50 ml, driven by the IABP console: (a) normal; (b) collapsed
Fig. 8. Inlet flow waveforms (L/min) of the SV Pump driven by the IABP console at preset stroke volumes of 40 ml and 50 ml
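The oscillation visible in Fig. 8 could be flagged automatically, for example by counting zero crossings of the detrended inlet-flow signal per pump cycle; both the heuristic and its threshold below are our own assumptions, not part of the authors' method:

import numpy as np

def inflow_oscillates(flow_lpm, fs_hz, hr_bpm=80, limit_per_cycle=4):
    # Smooth filling/ejection gives roughly two zero crossings per
    # cycle; excess diastolic vacuum produces many more.
    x = np.asarray(flow_lpm, dtype=float)
    x = x - x.mean()
    zc = np.count_nonzero(np.signbit(x[:-1]) != np.signbit(x[1:]))
    cycles = len(x) / fs_hz * hr_bpm / 60
    return zc / cycles > limit_per_cycle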
This procedure will be effective for optimal circulatory assist with an LVAD, including the avoidance of peripheral blood pooling. Since a conventional mock circulatory system cannot simulate the restricted-inflow phenomenon described above, the novel mock circuit should be useful for confirming a practical VAD drive methodology for various diseased circulatory conditions prior to animal experiments or clinical application.
Acknowledgements. This research was organized by "Biomedical Engineering Research on Advanced Medical Treatment" at the Advanced Research Institute for Science and Engineering, Waseda University (05P29), and was financially supported by Health Science Research Grants from the Ministry of Health, Labor and Welfare, Japan (H17-F-003).
References
1. Umezu, M., Nugent, A.H., Ye, C.X., Germon, G., Pittelkow, K., Aitchison, F., Nakamura, T., Chang, V.P.: Flow characteristics of an Australian-made ventricular assist device: effects of diaphragm material. In: Proc. of the 7th Int. Conf. on Biomedical Engineering, pp. 600-604 (1990)
2. Iwasaki, K., Umezu, M., Ye, C.X., Nugent, A.H., Nakamura, T., Arita, M., Qian, Y., Tanaka, T., Imachi, K., Ishihara, K., Chang, V.P.: Spiral vortex ventricular assist device: improvement by engineering analysis and simulation. In: Asian Simulation Conf. / 5th Int. Conf. on System Simulation and Scientific Computing, Shanghai, November 3-6, pp. 1035-1039 (2002)
3. Hamada, M.: Dilatative cardiomyopathy. Iyaku-Journal, Tokyo, pp. 159-160 (2000) (in Japanese)
Author Index
Ai, Dongmei 78
Ban, Xiaojuan 78
Bing, Zhigang 492
Cai, Chao-Feng 470
Callaghan, Vic 67
Chen, Jianxin 129
Chen, Jing 129
Chen, Luonan 21
Chen, Xian 197
Chen, Yanxi 111
Cheng, YongMei 172
Cui, Shigang 492
Dai, Chunxiang 97, 526
Davies, Marc 67
Ding, Fan 208
Ding, Yongsheng 28, 227
Ding, Zuquan 111
Dong, Feng 264
Fan, Yubo 104
Fang, Minglun 97, 136, 146
Fang, Xiaoyong 208
Fei, Minrui 58
Feng, David 370
Feng, Dong 254
Feng, Hailin 453
Feng, Yong 422
Fukui, Koichi 292
Furusato, Masateru 429, 546, 553
Gan, Tianhong 507
Ge, Bo 326
Gormley, Padhraig 480
Guan, Zequn 507
Hao, Ying 462
Harada, Tetsuji 292
He, Dongping 406
Hou, Jianrong 364
Hu, Liangjian 28
Hu, Qingxi 97, 136, 146, 390, 445, 526
Huang, Dan 364
Igarashi, Toshihiro 429
Irwin, George W. 480
Ito, Kazuhiko 429, 546, 553
Iwasaki, Kiyotaka 429, 546, 553
Ji, Yubin 436
Jiang, Lijun 317
Jiang, Ying 526
Ju, Luyue 399
Ju, Peijun 501
Kang, Haifeng 218
Kim, Jong-Min 274
Kokuzawa, Yuma 429, 546, 553
Kong, Jun 218
Lai, Jiyu 380
Lee, Yick Kuen 246
Lee, Ying Ying 246
Li, Bo 120
Li, Deyu 104
Li, JunHui 172
Li, Kang 8, 480
Li, Shaobin 335
Li, Shejiao 120
Li, Shouju 414
Li, Shutao 162
Li, Xiaoou 8
Li, Xiaoyang 46
Li, Xiumin 254
Li, Ying 1, 21
Li, Zhiguo 422
Li, Zhihua 307
Li, Zhonghua 536
Lian, Zhengguang 492
Liang, Pei-Ji 470
Liao, Chen 162
Lin, Guoxiang 406
Lin, Hongji 516
Lin, Liulan 136, 146
Lin, Wenhan 436
Liu, Guocai 501
Liu, Hongde 180
Liu, Xia 399
Liu, Yihui 188
Liu, Yingxi 414
Liu, Zengrong 1, 21
Liu, Zhen 284
Liu, Zhihua 153
Luo, Zhigang 208
Lv, Xuanjiao 536
Ma, Shiwei 307
Ma, Zhiqiang 218
Miao, Zuohua 344
Murayama, Yuichi 292
Ni, Xia 38
Ning, Shurong 78
Nishinaka, Tomohiro 429
Niu, Haijun 104
Niu, Wenxin 111
Pu, Fang 104
Qi, Miao 218
Qian, Yi 292, 546, 553
Qin, Chaoyong 380
Shang, Tao 236
Shao, Chenxi 335, 453
Shen, Liping 67
Shi, Jing 78
Shim, JeongYon 300
Shima, Tomohiro 429, 546, 553
Shu, Yunxing 326
Si, Wenjie 264
Sun, Xiao 153, 180
Sun, Xiuzhen 414
Takao, Hiroyuki 292
Tanaka, Takashi 429, 546, 553
Tian, Li 501
Tian, Ran 436
Tong, Aili 136, 146
Tu, Chengyuan 46
Umezu, Mitsuo 88, 292, 429, 546, 553
Wang, Jiang 254, 264
Wang, Jie 129
Wang, Jingzhou 406
Wang, Jizhe 414
Wang, Qing 104
Wang, Shuhua 218
Wang, Shuoyu 236
Wang, Xianhua 344
Wang, Xiuying 370
Wang, Zhenghua 208
Wei, Lisheng 58
Wen, Guihua 317
Wen, Jun 317
Wu, Bin 38
Wu, Hongtao 180
Wu, Jiansheng 180
Wu, Minghui 197
Wu, Yizhi 227
Wu, ZhongFu 422
Xi, Guangcheng 129
Xing, Yanwei 129
Xu, Hong 344
Xu, Hongan 227
Xu, Min 28
Xu, Xuelian 492
Yan, ZhiKun 553
Yang, Banghua 307
Yang, Hongfei 445, 526
Yang, HuiFeng 172
Yang, Taicheng 58
Yang, Yunfeng 111
Yang, Zhi 536
Yao, Yuan 136, 390, 445, 526
Ye, Zhengchun 516
Ying, Jing 197
You, Yan 406
Yu, Guangrong 111
Yuan, Bo 208
Yuan, Feng 111
Yun, Shiwei 326
Zeng, Yanjun 46
Zhang, Fan 120
Zhang, Hongying 38
Zhang, Huicun 136, 146
Zhang, Jianbao 1, 21
Zhang, Luwen 354
Zhang, Ping 46
Zhang, Pu-Ming 470
Zhang, Quan 390
Zhang, ShaoWu 172
Zhang, Shengnan 236
Zhang, Wei 501
Zhang, Wu 354
Zhang, Xubing 507
Zhang, YunLong 172
Zhang, Yunong 536
Zhao, Chunxiao 462
Zhao, Hui 364
Zhao, Li 492
Zheng, Jianguo 380
Zheng, Xiao 406
Zheng, Yongping 104
Zhong, Gaojian 399
Zhong, Jiang 422
Zhong, Limei 264
Zhong, Ning 462
Zhou, GuoPing 172
Zhou, Haoyan 180
Zhou, Jiaqian 111
Zhu, Fanwei 197
Zhu, Ling 553