Lecture Notes in Electrical Engineering
For further volumes: http://www.springer.com/series/7818
Sio-Iong Ao • Mahyar Amouzegar • Burghard B. Rieger
Editors
Intelligent Automation and Systems Engineering
Editors

Sio-Iong Ao
International Association of Engineers
Unit 1, 1/F, 37-39 Hung To Road
Hong Kong, China
[email protected]

Mahyar Amouzegar
College of Engineering, Cal Poly Pomona
California State University
3801 West Temple Avenue
Pomona, California 91768, USA

Burghard B. Rieger
Universität Trier
Trier, Germany
ISSN 1876-1100    e-ISSN 1876-1119
ISBN 978-1-4614-0372-2    e-ISBN 978-1-4614-0373-9
DOI 10.1007/978-1-4614-0373-9
Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2011933226

© Springer Science+Business Media, LLC 2011

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)
Preface
A large international conference on Advances in Intelligent Automation and Systems Engineering was held at UC Berkeley, California, USA, October 20–22, 2010, under the auspices of the World Congress on Engineering and Computer Science (WCECS 2010). WCECS is organized by the International Association of Engineers (IAENG), a non-profit organization dedicated to engineers and computer scientists, which was founded in 1968 and has been undergoing rapid expansion in recent years. The WCECS conferences have served as excellent venues for the engineering community to meet and to exchange and share ideas in diverse areas of engineering and computer science. Moreover, WCECS continues to strike a good balance between theoretical research and application development. The WCECS conference committees comprise over two hundred members, who are mainly research center directors, deans, department heads (chairs), professors, and research scientists from over thirty countries. The full committee list is available at the WCECS web site (www.iaeng.org/WCECS2010/committee.html). The conference participants were truly international, representing high levels of research and development from many countries. The response and the quality of papers in 2010 were excellent: we received more than seven hundred manuscripts, and after a thorough peer review process, 54.94% of the papers were accepted.

This volume contains thirty-two revised and extended research articles written by prominent researchers who participated in the conference. Topics covered include expert systems, intelligent decision making, knowledge-based systems, knowledge extraction, data analysis tools, computational biology, optimization, experiment design, complex system identification, computational modeling, and industrial applications. The book offers state-of-the-art research and development in intelligent automation and systems engineering, and serves as an excellent reference for researchers and graduate students working on intelligent automation and systems engineering.

Sio-Iong Ao
Mahyar Amouzegar
Burghard B. Rieger
Contents

1  Evidence Fusion for Real Time Click Fraud Detection and Prevention
   Chamila Walgampaya, Mehmed Kantardzic, and Roman Yampolskiy

2  Biped Locomotion with Arm Swing, Based on Truncated Fourier Series and Bees Algorithm Optimizer
   Ebrahim Yazdi, Vahid Azizi, and Abolfazle T. Haghighat

3  Bio-Inspired Pneumatic Muscle Actuated Robotic System
   Andrea Deaconescu and Tudor Deaconescu

4  Evaluation of an Orthopedic Surgical Robotic System Orthoroby on Bone Cadaver
   Yasin Güven and Duygun Erol Barkana

5  Natural Intelligence Based Automatic Knowledge Discovery for Medical Practitioners
   Veenu Mangat

6  Design and Optimal Control of a Linear Electromechanical Actuator for Motion Platforms with Six Degrees of Freedom
   Evzen Thoendel

7  Parametric Identification of a Power-System Emulator
   Rubén Salas-Cabrera, Oscar Martínez-Hernández, Julio C. Rosas-Caro, Jonathan C. Mayo-Maldonado, E. Nacú Salas-Cabrera, Aaron González-Rodríguez, Hermenegildo Cisneros-Villegas, Rafael Castillo-Gutierrez, Gregorio Hernández-Palmer, and Rodolfo Castillo-Ibarra

8  Enhanced Adaptive Controller Applied to SMA Wire Control
   Samah A.M. Ghanem

9  A Standalone System to Train and Evaluate Operators of Power Plants
   José Tavira-Mondragón, Rogelio Martínez-Ramírez, Fernando Jiménez-Fraustro, Roni Orozco-Martínez, and Rafael Cruz-Cruz

10 Lube Oil Systems Models for Real Time Execution Used on a Training Power Plant Simulator
   Edgardo J. Roldán-Villasana, Yadira Mendoza-Alegría, and Iván F. Galindo-García

11 Stochastic Analysis and Particle Filtering of the Volatility
   Hana Baili

12 Numerical Simulation of Finite Slider Bearings Using Control Volume Method
   Mobolaji Humphrey Oladeinde and John Ajokpaoghene Akpobi

13 Automated Production Planning and Scheduling System for Composite Component Manufacture
   Mei Zhongyi, Muhammad Younus, and Liu Yongjin

14 Assessment Scenarios of Virtual Prototypes of Mining Machines
   Teodor Winkler and Jarosław Tokarczyk

15 A Framework for the Selection of Logistic Service Provider Using Fuzzy Delphi and Fuzzy TOPSIS
   Rajesh Gupta, Anish Sachdeva, and Arvind Bhardwaj

16 Bee Algorithm for Solving Yield Optimization Problem for Hard Disk Drive Component under Budget and Supplier's Rating Constraints and Heuristic Performance Comparison
   Wuthichai Wongthatsanekorn and Nuntana Matheekrieangkrai

17 Integrated Modeling of Functions and Requirements in Product Design and Factory Planning
   D.P. Politze, J.P. Bathelt, K. Wegener, and D.H. Bergsjö

18 Towards a Strategy to Fight the Computer Science (CS) Declining Phenomenon
   Marcela Porta, Katherine Maillet, Marta Mas, and Carmen Martinez

19 Development of Power Plant Simulators and Their Application in an Operators Training Center
   José Tavira-Mondragón and Rafael Cruz-Cruz

20 Benefits of Unstructured Data for Industrial Quality Analysis
   Christian Hänig, Martin Schierle, and Daniel Trabold

21 Evaluation the Objectivity Measurement of Frequent Patterns
   Phi-Khu Nguyen and Thanh-Trung Nguyen

22 Change-Point Detection Based on SSA in Precipitation Time Series
   Naoki Itoh and Jürgen Kurths

23 Non-linear Image Recovery from a Single Frame Super Resolution Using Pearson Type VII Density
   Sakinah Ali Pitchay

24 Research on the Building Method of Domain Lexicon Combining Association Rules and Improved TF*IDF
   Shouning Qu and Simon Xu

25 Combining Multiscale and Multi Directional Analysis for Edge Detection Using a Statistical Thresholding
   K. Padma Vasavi, E.V. Krishna Rao, M. Madhavi Latha, and N. Udaya Kumar

26 Soft Vector Quantization with Inverse Power-Function Distributions for Machine Learning Applications
   Mohamed Attia, Abdulaziz Almazyad, Mohamed El-Mahallawy, Mohamed Al-Badrashiny, and Waleed Nazih

27 ANFIS-Based P300 Rhythm Detection Using Wavelet Feature Extraction on Blind Source Separated EEG Signals
   Juan Manuel Ramirez-Cortes, Vicente Alarcon-Aquino, Gerardo Rosas-Cholula, Pilar Gomez-Gil, and Jorge Escamilla-Ambrosio

28 A Hyperbola-Pair Based Road Detection System for Autonomous Vehicles
   Othman O. Khalifa, Imran M. Khan, and Abdulhakam A.M. Assidiq

29 Application of the Real-Time Concurrent Constraint Calculus
   M. Gerardo and M. Sarria

30 Teaching and Learning Routing Protocols Using Visual Educational Tools: The Case of EIGRP
   Jesús Expósito Marquina, Valentina Trujillo Di Biase, and Eric Gamess

31 Relevance Features Selection for Intrusion Detection
   Adetunmbi Adebayo Olusola, Oladele S. Adeola, and Oladuni Abosede Daramola

32 A Visual Application for Teaching and Learning the Advanced Concepts of the Diffusing Update Algorithm for EIGRP
   Valentina Trujillo Di Biase, Jesús Expósito Marquina, and Eric Gamess
Chapter 1
Evidence Fusion for Real Time Click Fraud Detection and Prevention

Chamila Walgampaya, Mehmed Kantardzic, and Roman Yampolskiy
1.1 Introduction
Web search is a fundamental technology for navigating the Internet, providing access to information for millions of users per day. Internet search engine companies such as Google, Yahoo, and Bing have revolutionized not only the use of the Internet by individuals but also the way businesses advertise to consumers (Immorlica et al. 2005; Mahdian 2006; Walgampaya et al. 2010). Typical search engine queries are short and reveal a great deal of information about user preferences. This gives search engine companies a unique opportunity to display highly targeted ads to the user. These search services are expensive to maintain and depend upon advertisement revenue to remain free for the end user (Mahdian 2006). Many search service companies such as Google, Yahoo, and MSN generate advertisement revenue by selling clicks. This business model is known as the Pay-Per-Click (PPC) model. In the PPC model, Internet content providers are paid each time an advertisement link on their website is clicked, leading to the sponsoring company's content. There is an incentive for dishonest service providers to inflate the number of clicks their sites generate. In addition, dishonest advertisers tend to simulate clicks on the advertisements of their competitors to deplete their advertising budgets (Metwally et al. 2007). The generation of such invalid clicks, either by humans or by software, with the intention to make money or to deplete a competitor's budget is known as click fraud (CF). The diversity of CF attack types makes it hard for a single countermeasure to attain the desired results. Therefore, how to combine multiple data sources with multiple measures has become a new research hot spot, with the goal of providing the PPC
system with more effective protection from CF. A real-time click fraud detection and prevention system based on multi-model and multi-level data fusion is proposed in this paper. Each independent component can be considered an invisible data mining module, in which "smart" software incorporates data mining into its functional components, often unbeknownst to the user (Han and Kamber 2006). Evidence for CF from multiple models is "fused" in this system using Dempster-Shafer evidence theory (Shafer 1976), so that improved accuracy in detecting fraudulent traffic is achieved. In turn, this increases the quality of clicks reaching advertisers' websites.

This paper is organized as follows. In Sect. 1.2 an introduction to multi-sensor data fusion and Dempster-Shafer evidence theory is given. Related work, where Dempster-Shafer theory was used as the fusion mechanism, is presented in Sect. 1.3. In Sect. 1.4 the fusion architecture of the CCFDP system is described, while a case study is explained in Sect. 1.5. Experimental results and discussion are given in Sect. 1.6. Conclusions are given in Sect. 1.7.
1.2 Multi-Sensor Data Fusion with Dempster-Shafer Evidence Theory
Data fusion is "a process dealing with the association, correlation, and combination of data and information from single and multiple sources to achieve refined position and identity estimates, and complete and timely assessments of situations and threats, and their significance" (Lambert 2009). The resulting information is more satisfactory to the user when fusion is performed than when the raw data are simply delivered (Wald 2001). Different fusion methods are discussed in the literature, such as statistical estimation (Durrant-Whyte 1987; Hager et al. 1993), the Kalman filter (Yukun et al. 2007), fuzzy integration (Solaiman et al. 1999), neural networks (Dai and Khorram 1999), D-S evidence theory (Wu et al. 2002), and so on. Of these fusion methods, D-S evidence theory is widely known for handling uncertainties better. Moreover, it provides flexible information processing and can deal with asynchronous information (Ouyang et al. 2008). In the following section, the terminology of the theory of evidence and the notation used in this paper are defined.
1.2.1 Frame of Discernment
Let $\Theta = \{\theta_1, \theta_2, \dots, \theta_N\}$ be a frame of discernment, where the elements $\theta_i \in \Theta$ correspond to $N$ identifiable objects. The power set of $\Theta$ is the set containing all $2^N$ possible subsets of $\Theta$, represented by $P(\Theta) = \{\phi, \{\theta_1\}, \{\theta_2\}, \dots, \{\theta_N\}, \{\theta_1, \theta_2\}, \{\theta_1, \theta_3\}, \dots, \Theta\}$, where $\phi$ denotes the null set.
1.2.2 Basic Probability Assignment Function (BPA)
The BPA is a primitive of evidence theory. The BPA, represented by $m$, defines a mapping of the power set to the interval between 0 and 1, where the BPA of the null set is 0 and the summation of the BPAs of all the subsets of the power set is 1. The value of the BPA for a given set $A$, represented as $m(A)$, expresses the proportion of all relevant and available evidence that supports the claim that a particular element of $\Theta$ belongs to the set $A$ but to no particular subset of $A$. The elements of $P(\Theta)$ that have non-zero mass are called focal elements. Formally, this description of $m$ can be represented with the following three equations:

$$m : P(\Theta) \rightarrow [0, 1] \tag{1.1}$$

$$\sum_{A \in P(\Theta)} m(A) = 1 \tag{1.2}$$

$$m(\phi) = 0 \tag{1.3}$$

1.2.3 Belief Function Bel(A)
Given a BPA $m$, a belief function $Bel$ is defined as:

$$Bel(A) = \sum_{B \subseteq A} m(B) \tag{1.4}$$

The belief function $Bel(A)$ measures the total amount of probability that must be distributed among the elements of $A$.
1.2.4 Combination Rule of Evidence m(C)
Suppose $m_1$ and $m_2$ are two mass functions formed from information obtained from two different information sources over the same frame of discernment. According to Dempster's orthogonal rule we define $m(C) = (m_1 \oplus m_2)(C)$, where

$$(m_1 \oplus m_2)(C) = \begin{cases} 0 & \text{if } C = \phi \\[4pt] \dfrac{\sum_{A \cap B = C} m_1(A)\, m_2(B)}{1 - K} & \text{otherwise} \end{cases} \tag{1.5}$$

and $K$ represents the basic probability mass associated with conflict, defined as:

$$K = \sum_{A \cap B = \phi} m_1(A)\, m_2(B) < 1 \tag{1.6}$$

In our system, evidence supports a click being either valid or invalid; therefore it becomes a two-class problem. Accordingly, we have modified the calculation of $m(C)$ for the CCFDP system (NetMosaics 2009). For a two-class problem, we can simplify the equation for the combination of evidence to:

$$S = \frac{\prod_{i=1,\dots,n} r_i}{\prod_{i=1,\dots,n} r_i + \prod_{i=1,\dots,n} (1 - r_i)} \tag{1.7}$$

where $r_i$ is the output from each model and $n$ is the number of models.
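The two-class combination lends itself to a direct implementation. The following Python sketch is our own illustration (not the CCFDP code base), with function and variable names chosen by us; it implements the pairwise orthogonal sum (1.5)-(1.6) for the frame {F, N} and the closed-form expression (1.7).

```python
# Two-class Dempster-Shafer combination over the frame {F, N} (fraud / non-fraud).
# Masses are dictionaries over the two singleton focal elements.

def combine_two_class(m1, m2):
    """Orthogonal sum (1.5)-(1.6) when the only focal elements are {F} and {N}."""
    # Conflict mass K: products of masses assigned to disjoint hypotheses.
    k = m1["F"] * m2["N"] + m1["N"] * m2["F"]
    if k >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {"F": m1["F"] * m2["F"] / (1.0 - k),
            "N": m1["N"] * m2["N"] / (1.0 - k)}

def fuse_scores(scores):
    """Closed form (1.7): S = prod(r_i) / (prod(r_i) + prod(1 - r_i))."""
    p_f = p_n = 1.0
    for r in scores:
        p_f *= r
        p_n *= 1.0 - r
    return p_f / (p_f + p_n)
```

Because every source assigns mass only to the two singletons, repeatedly applying `combine_two_class` gives the same final belief as a single call to `fuse_scores`.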
1.3 Related Work
The Dempster-Shafer theory of evidence reasoning (D-S theory) has been widely discussed and used, because it is a reasonable, convenient, and promising method to combine uncertain information from disparate sources with different levels of abstraction. Carvalho et al. (2003) proposed a general Data Fusion Architecture (DFA) based on the Unified Modeling Language (UML) and using a taxonomy based on the definitions of raw data and variables or tasks. Their DFA can be reconfigured according to the measured environment and the availability of the sensing units or data sources, providing graceful degradation of the view of the environment as resources change. Clerentin et al. (2000) applied D-S theory to study the cooperation between two omni-directional perception systems for mobile robot localization. They report an absolute localization paradigm based on the cooperation of an omni-directional vision system, composed of a conical mirror and a CCD camera, and a low-cost panoramic range finder system; the absolute localization method uses three matching criteria fused by the combination rules of D-S theory. Distributed databases allow us to integrate data from different sources which have not previously been combined. The Dempster-Shafer theory of evidence and evidential reasoning are particularly well suited to the integration of distributed databases. Cai et al. have studied the suitability of evidential functions to represent evidence from different sources, carrying out evidential reasoning by the well-known orthogonal sum method (Cai et al. 2000). In their article, Janez et al. (2000) present a strategy to report significant changes on a map automatically by
fusion of recent images in various spectral bands. They show that D-S theory is a more suitable formalism for configurations of partial overlap between map and images, which may be difficult or even impossible to formalize within a probability framework. Tian et al. have described the use of D-S evidence theory and its data fusion technology in their intrusion detection model. This model merges alerts from different intrusion detection systems, makes intelligent inferences by applying D-S evidence theory, and estimates the current security situation according to the fusion result (Tian et al. 2005). The military typically operates in demanding, dynamic, semi-structured, and large-scale environments. This reality makes it difficult to detect, track, recognize/classify, and respond to all entities within the volume of interest, thus increasing the risk of a late response to the ones that pose an actual threat. Benaskeur et al. proposed an adaptive data fusion and sensor management process that gathers and fuses information by automatically allocating, controlling, and coordinating the sensing and processing resources to meet mission requirements (Benaskeur and Rheaume 2007).
1.4 Fusion of Evidence of Click Fraud in the CCFDP System
The collaborative click fraud detection and prevention (CCFDP) system was developed to collect data about each click, involving data fusion between the client-side log and the server-side log (Ge and Kantardzic 2006). In CCFDP there are three modules that contribute to the process of finding fraudulent clicks: the rule-based module, the click map module, and the outlier detection module. In each of these modules, the output is a probabilistic measure of evidence for the click being fraudulent. We have discussed the functionality of each of these modules in detail before (Kantardzic et al. 2008, 2009). In addition, CCFDP maintains an online fraudulent database of suspicious sources of clicks in terms of IP, referrer, country, etc. When the score of an IP, a country, etc. reaches a predefined threshold value, the CCFDP system moves it to the online fraudulent database and informs the service providers with instructions to block future traffic originating from these sources. Scores for each parameter are updated after a click is found suspicious, based on the combined evidence of the modules mentioned above.

The model-driven fusion process of CCFDP is depicted in Fig. 1.1. Real-time data feeds come from three sources ($k = 3$): the server side, the client side, and the extended context of the click ($S_1, S_2, S_3$); these are represented by sensors in Fig. 1.1. In the data preprocessing stage we standardize (align) the input data (Waltz 1998). The concept of alignment is an integral part of the fusion process; it assumes a "common language" between the inputs and includes the standardization of measurement units.
[Figure] Fig. 1.1 Model-driven fusion process of CCFDP: input sources (sensor data S1, S2, ..., Sk) pass through data preprocessing into data mining models (DM model 1 to m), whose outputs are combined by D-S evidence fusion for decision making
The scores from the rule-based module (DM model 1), the outlier detection module (DM model 2), and the click map module (DM model 3) are then combined using D-S evidence theory at the decision level ($m = 3$). The combination of scores is used to dynamically adjust advertising profiles in such a way that low-quality sources of traffic will no longer be shown advertisements.
1.5 A Case Study
In this section, we demonstrate the application of D-S evidence theory to combine evidence from several sources.

- Evidence 1: repeated clicks from an IP during the past minute, detected by the rule-based module ($m_1$).
- Evidence 2: JavaScript is allowed in the browser, detected by the rule-based module ($m_2$).
- Evidence 3: the country Morocco is detected as suspicious by the outlier module ($m_3$).

Of the two classes, Fraud is represented by $\{F\}$ and non-Fraud by $\{N\}$. Let $\Theta = \{F, N\}$; we define the power set and basic probability assignments as follows:

$$P(\Theta) = \{\phi, \{F\}, \{N\}\}$$
$$m_1(\phi) = 0, \quad m_1(\{F\}) = 0.6, \quad m_1(\{N\}) = 0.4$$
$$m_2(\phi) = 0, \quad m_2(\{F\}) = 0.5, \quad m_2(\{N\}) = 0.5$$
$$m_3(\phi) = 0, \quad m_3(\{F\}) = 0.7, \quad m_3(\{N\}) = 0.3$$
Table 1.1 Fusion of evidence 1 and evidence 2

(M1 ⊕ M2)   | {F} 0.5   | {N} 0.5
{F} 0.6     | {F} 0.3   | φ 0.3
{N} 0.4     | φ 0.2     | {N} 0.2

Table 1.2 Fusion of evidence 1, 2 and evidence 3

(M1 ⊕ M2 ⊕ M3) | {F} 0.7   | {N} 0.3
{F} 0.3        | {F} 0.21  | φ 0.09
φ 0.3          | φ 0.21    | φ 0.09
φ 0.2          | φ 0.14    | φ 0.06
{N} 0.2        | φ 0.14    | {N} 0.06

1.5.1 Calculation of M1 ⊕ M2

For convenience we use the fusion tables introduced by Shafer (1976) to show the calculations; they are given in Tables 1.1 and 1.2. Using (1.6):

$$K = 0.5 \times 0.6 + 0.4 \times 0.5 = 0.5$$

New belief function: $Bel_{m_1 \oplus m_2}(\{F\}) = \sum_{B \subseteq \{F\}} m(B) = 0.3 / (1 - 0.5) = 0.6$

1.5.2 Calculation of M1 ⊕ M2 ⊕ M3

Using (1.6) on the cells of Table 1.2:

$$K = 0.09 + 0.21 + 0.09 + 0.14 + 0.06 + 0.14 = 0.73$$

New belief function: $Bel_{m_1 \oplus m_2 \oplus m_3}(\{F\}) = 0.21 / (1 - 0.73) \approx 0.78$

In this example we considered local suspicion scores of 0.6, 0.5, and 0.7, and D-S evidence theory is used to find the final evidence. The belief that the click is fraudulent is 0.78.
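As a numerical check, the same result follows from the sketch given after (1.7); again this is our illustration rather than the authors' implementation.

```python
m1 = {"F": 0.6, "N": 0.4}
m2 = {"F": 0.5, "N": 0.5}
m3 = {"F": 0.7, "N": 0.3}

m12 = combine_two_class(m1, m2)    # {'F': 0.6, 'N': 0.4}, with K = 0.5
m123 = combine_two_class(m12, m3)  # {'F': 0.777..., 'N': 0.222...}
print(round(m123["F"], 2))         # 0.78

# The closed form (1.7) applied to the three local scores gives the same belief.
print(round(fuse_scores([0.6, 0.5, 0.7]), 2))  # 0.78
```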
1.6 Experimental Results and Discussion
The real-time version of CCFDP is available online at http://www.netmosaics.com. All of our experiments use click data from the Hosting.com and thebestmusicsites.org websites. The data collection process was started on January 7th, 2007 and is still ongoing. As of June 30th, 2008 we had collected around 1,400,000 natural and 25,000 paid click records. The initial version of CCFDP was designed using only a rule-based system. The new CCFDP has the outlier module and the click map module in addition to an improved rule-based system with additional click context information. Experiments were performed on both the old and the new versions of CCFDP.
Table 1.3 Top IP, country, and referrer counts

Top IP count
IP              | Count
71.235.26.170   | 122
68.88.239.191   | 112
136.165.67.74   | 94
199.231.146.254 | 86
89.139.234.179  | 82
203.162.3.146   | 80
170.20.96.116   | 80
71.193.114.12   | 72
74.133.47.66    | 68
74.192.144.103  | 68

Top country count
Country | Count
US      | 19,784
IN      | 1,278
CA      | 856
GB      | 666
NULL    | 574
MX      | 544
AU      | 534
TR      | 518
BR      | 506
PH      | 456

Top referrer count
Referrer    | Count
No referrer | 8,800
www.r1.com  | 4,568
www.r2.com  | 2,192
www.r3.com  | 604
www.r4.com  | 546
www.r5.com  | 538
www.r6.com  | 510
www.r7.com  | 450
www.r8.com  | 420
www.r9.com  | 414
In this research, initial experiments were conducted to observe and compare the changes in the scores of parameters such as IP, country, and referrer in both systems. After all paid click data had been processed, we selected the top 10 IPs, countries, and referrers with the highest fraudulent scores to see whether the fusion process has any effect on updating the individual scores of these parameters. Table 1.3 lists the IPs, countries, and referrers with the highest fraudulent scores. The results are slightly modified to protect the privacy of some publisher websites; for example, the actual domain names and referrer names are replaced with dummy identifiers.

In Fig. 1.2 (top) the variation of scores for IPs is depicted. Except for one IP address (136.165.67.74), all others have higher fraudulent scores after combining the evidence from all the modules. In the rule-based system, evidence is collected by considering only the changes detected in a limited neighborhood. For example, with only the rule-based system it would be difficult to detect a bot associated with a particular IP which sends HTTP requests at time intervals greater than 15 min. But with the outlier detection module, which covers a larger neighborhood of the clicks, the pattern becomes observable. Once a suspicious activity is detected, this evidence contributes to an increase of the corresponding partial scores in the CCFDP system. IP addresses with higher scores have an increased probability of being blacklisted sooner. Once the IP addresses are on the blacklist, the search provider is notified to eliminate future traffic from the corresponding sources. This improves the quality of the traffic redirected to the advertiser's website.

One of the biggest advantages of using a multi-model system in CCFDP is its ability to cover a wider area in the time domain. While the rule-based module deals with events within a couple of minutes of each other, the outlier detection module handles events in a 24 h window. Figure 1.2 (center) shows the final scores of the top ten countries from which we received most of the traffic. With the rule-based module alone we were unable to detect patterns and variations along the time axis; therefore almost all countries have a score less than 0.1, which implies that clicks from these countries are not suspicious at all. But with the outlier module, which keeps track of traffic for an extended period of time, we were able to detect abnormal traffic from most of the countries.
[Figure] Fig. 1.2 Variation of IP score (top), country score (center), and referrer score (bottom)
[Figure] Fig. 1.3 Score distribution: click count versus click score, with regions I-IV, for the rule-based system and the multi-modal fusion system
For example, some of these countries send traffic only during certain hours of the day. A similar behavior is observed with the top referrers of traffic to the hosting.com site. Figure 1.2 (bottom) shows the variation of the scores of the top ten referrers. All these referrers appear normal when they are evaluated only with the rule-based system, but when they are evaluated together with the click map module and the outlier detection module, the referrer scores increase drastically. Some of these referrers are from outside the US. When a country's suspicion score increases, so do the scores of associated referrers. For example, we mentioned above that certain countries send traffic only during certain hours of the day; when we include the click context, it is observed that most of these referrers are associated with those countries. This behavior would be very hard to detect using only the rule-based score. In the traditional (rule-based) system, country and referrer had almost no influence on the score. The inclusion of the additional modules makes the country score and referrer score much more sensitive; for example, the new system includes the country parameter in the final score for 73% of clicks from the US.

Figure 1.3 shows the distribution of final scores for all the clicks with the two versions of CCFDP. The lighter graph (L) corresponds to the first version of CCFDP, where only the rule-based module was used; the darker one (D) is the new version with multiple modules. Area I represents most of the valid clicks. This corresponds to records whose attributes have no presence in the fraudulent database and whose key attributes all satisfy the requirements defined in the algorithm for a legitimate click. The percentage of traffic present in Area I with system L is much higher than that of system D. With the inclusion of multiple models, the suspiciousness of clicks has increased and the graph is shifted toward Area II with system D, which is still in the safer region.
Table 1.4 Distribution of clicks in each region in Fig. 1.3

                         | I      | II    | III   | IV
Rule based system        | 12,198 | 1     | 3     | 520
Multi model based system | 4,197  | 4,650 | 3,817 | 643
[Figure] Fig. 1.4 Improvement of quality of traffic: click counts of valid and invalid traffic for Google direct and the Google partner network in the periods 2/14/07-2/29/07 and 8/24/07-9/6/07
Area III shows the suspected clicks. These are records with attributes present in the fraudulent database or attributes that exceed certain threshold values. It can be clearly seen in the graph how the scores have increased after fusing multiple pieces of evidence from different modules. Area IV includes invalid clicks. Blocked traffic is identified as clicks with highly suspicious scores, usually greater than 0.9. As shown in Table 1.4, with the traditional (rule-based) system we were able to block only 520 fraudulent clicks, but with the multi-model system it was 643, which is about 24% more. We believe that advertisers should not be billed for any of these clicks.

We also looked at the changes in the quality of traffic after implementing the multi-model based CCFDP system. A summarized version is depicted in Fig. 1.4. When the dataset was run through the rule-based module alone, we found that on average 53% of the traffic is suspicious (Kantardzic et al. 2008). When running the outlier detection module alone on the dataset, we discovered that about 34.6% of all clicks had one or more attributes with an outlying attribute-value pair count (Kantardzic et al. 2009). Clicks found to have an outlier contribute evidence affecting the partial scores, and the CCFDP system computes the final score measuring the suspicion for each click. The multi-model system classified about 64% of paid traffic as fraudulent.
[Figure] Fig. 1.5 Traffic analysis for Google: average score of traffic and average quality of traffic for the rule-based module alone, the outlier detection module alone, and the combined system
In addition, we have observed the changes in the online fraudulent database. In the traditional system, with only the rule base, the fraudulent database recorded 71 IPs as fraudulent. The multi-model system recorded 283 IPs as fraudulent on the same data set, which is nearly four times more than the traditional system. This is a great improvement in terms of prevention of fraudulent traffic. As discussed for Fig. 1.2, the traditional system gives country score and referrer score very little effect when calculating the total score. But with the multi-model system, scores for countries such as India, Morocco, and Mexico have shown enough suspicion. Clicks coming from these countries received a higher fraudulent score, but the system did not have enough suspicious clicks to block any of the countries completely. Similar results are observed for referrers.

We defined the quality of traffic as (1 − score). Using only the rule-based module and only the outlier module we obtain quality scores of about 47% and 65% respectively. With the combined model, the estimated quality of the traffic is about 36%. From these results we can see that the multi-model based CCFDP system is capable of improving the detection of fraudulent traffic by at least 10% compared to the same models working alone.

In addition, we analyzed the volume of total clicks from Google and its partner network during these two periods of time. Figure 1.5 shows the total and invalid traffic from Google and the Google partner network for the second month and the eighth month. The first thing to observe is that there is a much higher volume of traffic in the eighth month compared to the second month. The second is that traffic from the Google partner network in the eighth month is almost negligible. In the second month, out of 307 direct Google referrals, 71 were observed to be invalid, while 138 of 583 Google partner network referrals were detected as invalid. In the eighth month, the total Google-only traffic is 2,444 and nearly 50% (1,036) of that traffic is found to be invalid.
1.7 Conclusion
In this paper we proposed a multi-model real-time detection and prevention system for click fraud. The CCFDP system uses multi-level data fusion to enhance the description of each click and to obtain a better estimate of click traffic quality. The CCFDP system analyzes the detailed user activities on both the server side and the client side collaboratively to better evaluate the quality of clicks. The extended click record also includes context data available in the fraudulent and blocking databases. Our system analyzes the extended click record using three independent data mining modules: rule-based, outlier, and click map. A score is assigned to each click based on the individual scores estimated by the independent modules, and the scores are combined using Dempster-Shafer evidence theory. We have tested the system with data from an actual ad campaign in 2007 and 2008. The results show that a high percentage of click fraud is present even with the most popular search engines such as Google. The multi-model based CCFDP estimated the average score as 64%, whereas 53% is the highest average score recorded by any individual module run on the data alone. Through these additional refinements we were also able to increase the quality of the click traffic by 10%.
References

Benaskeur AR, Rheaume F (2007) Adaptive data fusion and sensor management for military applications. Aerosp Sci Technol 11(4):327–338
Cai D, McTear M, McClean S (2000) Knowledge discovery in distributed databases using evidence theory. Int J Intell Syst 15(8):745–761
Carvalho H, Heinzelman W, Murphy A, Coelho C (2003) "A general data fusion architecture". In: International conference on information fusion, Cairns, Queensland, Australia, pp 1465–1472
Clerentin A, Delahoche L, Brassart E (2000) "Cooperation between two omnidirectional perception systems for mobile robot localization". In: Proceedings of the 2000 IEEE/RSJ international conference on intelligent robots and systems, Takamatsu, Japan, pp 1499–1504
Dai X, Khorram S (1999) Data fusion using artificial neural networks: a case study on multitemporal change analysis. Comput Environ Urban Syst 23(1):19–31
Durrant-Whyte H (1987) "Integration, coordination and control of multi-sensor robot systems". Dissertation Abstracts International 47(10)
Ge L, Kantardzic M (2006) Real-time click fraud detecting and blocking system. US Patent App. 11/413,983, 1 May 2006
Hager G, Engelson S, Atiya S (1993) "On comparing statistical and set-based methods in sensor data fusion". In: IEEE international conference on robot automation, Atlanta, USA
Han J, Kamber M (2006) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco
Immorlica N, Jain K, Mahdian M, Talwar K (2005) Click fraud resistant methods for learning click-through rates. Lecture Notes in Computer Science 3828:34–45
Janez F, Goretta O, Michel A (2000) "Automatic map updating by fusion of multispectral images in the Dempster-Shafer framework". In: Proceedings of SPIE, vol 4115, p 245
Kantardzic M, Walgampaya C, Wenerstrom B, Lozitskiy O, Higgins S, King D (2008) "Improving click fraud detection by real time data fusion". In: IEEE international symposium on signal processing and information technology, ISSPIT 2008, pp 69–74
Kantardzic M, Wenerstrom B, Walgampaya C, Lozitskiy O, Higgins S, King D (2009) "Time and space contextual information improves click quality estimation". e-Commerce 2009, p 123
Lambert D (2009) A blueprint for higher-level fusion systems. Inf Fusion 10(1):6–24
Mahdian M (2006) "Theoretical challenges in the design of advertisement auctions". In: The capital area theory symposia, University of Maryland, Spring
Metwally A, Agrawal D, El Abbadi A (2007) "Detectives: detecting coalition hit inflation attacks in advertising networks streams". In: Proceedings of the 16th international conference on World Wide Web, ACM
NetMosaics (2009) NetMosaics Inc. internal documentation
Ouyang N, Liu Z, Kang H (2008) "A method of distributed decision fusion based on SVM and DS evidence theory". In: 5th international conference on visual information engineering, pp 261–264
Shafer G (1976) A mathematical theory of evidence. Princeton University Press, Princeton
Solaiman B, Pierce L, Ulaby F (1999) Multisensor data fusion using fuzzy concepts: application to land-cover classification using ERS-1/JERS-1 SAR composites. IEEE Trans Geosci Remote Sens 37(3):1316–1326
Tian J, Zhao W, Du R, Zhang Z (2005) "DS evidence theory and its data fusion application in intrusion detection". In: Lecture notes in computer science, 3802
Wald L (2001) "The present achievements of the EARSeL-SIG 'data fusion', in a decade of trans-European remote sensing cooperation". In: Proceedings of the 20th EARSeL Symposium, Taylor & Francis, Dresden, 14–16 June 2000, p 263
Walgampaya C, Kantardzic M, Yampolskiy R (2010) "Real time click fraud prevention using multi-level data fusion". In: Lecture notes in engineering and computer science: Proceedings of the World Congress on Engineering and Computer Science 2010, WCECS 2010, 20–22 Oct 2010, San Francisco, pp 514–519
Waltz E (1998) "Information understanding: integrating data fusion and data mining processes". In: IEEE international symposium on circuits and systems, pp 553–556
Wu H, Siegel M, Stiefelhagen R, Yang J (2002) "Sensor fusion using Dempster-Shafer theory". In: IEEE instrumentation and measurement technology conference proceedings, vol 1, pp 7–12
Yukun C, Xicai S, Zhigang L (2007) Research on Kalman-filter based multisensor data fusion. J Syst Eng Electron 18(3):497–502
Chapter 2
Biped Locomotion with Arm Swing, Based on Truncated Fourier Series and Bees Algorithm Optimizer

Ebrahim Yazdi, Vahid Azizi, and Abolfazle T. Haghighat
2.1 Introduction
Bipedal walking is a difficult task due to its intrinsic instability, and developing successful controller architectures for this mode of locomotion has proved substantially more difficult than for other types of walking (Beer et al. 1998). This type of walking has been tackled from different directions, all of which can be divided into two major approaches: static walking (Kato et al. 1974) and dynamic walking (Takanishi et al. 1982). Static walking assumes that the robot is statically stable: at any time, if all motion is stopped, the robot will stay indefinitely in a stable position. It is necessary that the projection of the center of gravity of the robot on the ground be contained within the foot support area. This approach was abandoned because only slow walking speeds could be achieved, and only on flat surfaces. Biped dynamic walking allows the center of gravity to be outside the support region for limited amounts of time. There is no absolute criterion that determines whether dynamic walking is stable or not; indeed, a walker can be designed to recover from different kinds of instabilities (Hodgins and Raibert 1990). However, if the robot has active ankle joints and always keeps at least one foot flat on the ground, then the Zero Moment Point (ZMP) can be used as a stability criterion.

There are two major approaches in dynamic walking research: model-based and model-free. In model-based approaches, the controller depends on the model of the robot, and everything in the controller has to be changed from one robot to another. Two well-known methods in this approach are the "Zero Moment Point" (ZMP) (Zhang et al. 1999; Vukobratovic et al. 2001) and the "Inverted Pendulum" (Kajita et al. 2001). The ZMP specifies the point with respect to which the dynamic reaction
force at the contact of the foot with the ground does not produce any moment, i.e. the point where the total inertia force equals zero. The ZMP is no longer meaningful if the robot makes multiple non-planar contacts. In the Inverted Pendulum approach, walking is often likened to the motion of two coupled pendulums, because the swing leg behaves like an inverted pendulum moving about the stance foot and like a regular pendulum swinging about the hip. In the model-free approach, it is common to make use of sensory information and associate it with motions; no physical model is used, which eases the implementation of the skills. Three important lines of work in this field are Passive Dynamic Walking (PDW) (McGeer 1980), Ballistic Walking (Mochon and McMahon 1980), and the Central Pattern Generator (CPG). Passive dynamics, as an approach to robotic movement control (especially walking), is based on utilizing the momentum of swinging limbs for greater efficiency; this method uses the morphology of a mechanical system as a basis for the necessary controls. Ballistic Walking uses the Lagrange equations. The Central Pattern Generator (CPG), as a model-free approach, uses a set of neural oscillators for controlling the robot and a genetic algorithm as a weight optimizer.

In this paper, a model-free approach is described, in which a Truncated Fourier Series (TFS) formulation is used for controlling the robot. TFS was first used for gait generation in bipedal locomotion in 2006 (Yang et al. 2006), where TFS together with a ZMP stability indicator was used to show that TFS can generate suitable angular trajectories for controlling bipedal locomotion. In the TFS model applications for 2D walking, three key parameters determine the locomotion: the fundamental frequency, which determines the pace of walking; the amplitude of the functions, which determines the stride; and the constant terms, used to adjust to different inclinations of the terrain (Yang et al. 2007).

Humans naturally swing their arms when they walk or run. Although arm swing has often been compared with pendulum motion, it is not a purely passive phenomenon; muscle activity controls arm swing magnitude and timing during human locomotion (Hinrichs 1990). Elftman first proposed that arm swing during walking balances torso torques caused by the swinging of the lower limbs (Elftman 1939). This idea has been studied further by others with the same general conclusions (Hinrichs 1990; Li et al. 2001). Our approach is also capable of producing arm angular trajectories, emphasizing the role of the arms in smooth and robust walking. In this approach, the Bees Algorithm (BA) (Pham et al. 2006) with constraint handling on angles and time is used to find the optimum parameters of the TFS and to train the robot to achieve fast bipedal forward and backward walking for the first time.
2.2 Simulator and Robot Model
The target robot of our study is a 22-DOF (degrees of freedom) NAO robot with a weight of 4.5 kg and a standing height of 57 cm. The robot has four DOF for each arm, six DOF for each leg, and a head with two degrees of freedom.
Table 2.1 Major morphological parameters of the NAO robot (ranges in degrees)

Joint name | Motion             | Range
Ankle      | Front and back (Y) | −75 to 55
Hip        | Front and back (Y) | −100 to 25
Knee       | Front and back (Y) | −130 to 0
Shoulder   | Front and back (Y) | −120 to 120
The simulation is performed with the rcssserver3d simulator, a generic three-dimensional simulator based on Spark and the Open Dynamics Engine (ODE) (Smith 2007, http://www.ode.org). Spark is capable of carrying out scientific distributed multi-agent calculations as well as various physical simulations ranging from articulated bodies to complex robot environments (Boedecker 2005). The time-integrated simulation is processed with a resolution of 50 simulation steps per second.

In this approach, the body trunk of the robot is not actuated. Our experience shows that the leg and arm joints are the most effective joints for walking; the hip, knee, ankle, and shoulder joints, which move in the same forward-backward plane, are the major ones, and these joints have restricted ranges (Table 2.1 describes them).
2.3 Joint Angular Trajectory
Human motions are recognized as flexible and periodic, but they are more challenging with respect to motion stability. Therefore, human-like motion patterns are included in the research objectives. Walking trajectories can be described in several ways; positional trajectories and angular trajectories are two of them. In this paper angular trajectories are used.
2.3.1 Lower Body Joint Angular Trajectory
Similar to (Kagami et al. 2003), the foot is kept parallel to the ground by using the ankle joint in order to avoid collisions. The ankle trajectory can therefore be calculated by adding the hip and knee trajectories and multiplying by (−1), so the ankle DOF parameters are eliminated. Successful walking is defined as acceptable motion tracking based on the optimized walking (Hinrichs 1990). If a gait period is divided into six time slices, (2.1) and (2.2) formulate the joint control in this plane. The following notation is used for the variables in the joint angular trajectory formulations:

1. Subscripts 1 and 2 refer to the joint trajectory from the beginning of the swing and stance phase respectively.
2. Subscripts h and k denote the hip and knee respectively.
3. Subscripts l, r, lo refer to the left leg, right leg, and lock phase respectively.
$$\begin{aligned}
\theta_{kl} &= \theta_{lo} & [t_0, t_2] \\
\theta_{kl} &= \theta_{k1} & [t_2, t_6] \\
\theta_{kr} &= \theta_{k1} & [t_0, t_4] \\
\theta_{kr} &= \theta_{lo} & [t_4, t_6]
\end{aligned} \tag{2.1}$$

$$\begin{aligned}
\theta_{hl} &= \theta_{h2} & [t_0, t_3] \\
\theta_{hl} &= \theta_{h1} & [t_3, t_6] \\
\theta_{hr} &= \theta_{h1} & [t_0, t_3] \\
\theta_{hr} &= \theta_{h2} & [t_3, t_6]
\end{aligned} \tag{2.2}$$
According to (2.1) and (2.2), the trajectories for both legs are identical but shifted in time relative to each other by half of the walking period. The joint angle trajectories can be shifted separately by offsets, and the values of the defined offsets influence the biped's posture during walking. Note that in the lock phase the knee joints hold only these offsets as their angular trajectory, and that we assume the left leg is the stance leg in the first half period.
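As an illustration of the schedule in (2.1)-(2.2), the following Python sketch selects which generator each leg joint follows within one gait period. It is our own reading of the equations, not code from the paper; the slice boundaries t2, t3, t4 and the generator callables are placeholders.

```python
# Phase schedule for one gait period [t0, t6), following (2.1)-(2.2):
# each knee is either locked at its offset c_k or driven by the knee TFS,
# and the two hips swap between the two hip generators at mid-period t3.

def knee_targets(t, t2, t4, theta_k1, c_k):
    left = c_k if t < t2 else theta_k1(t)      # left knee: locked, then TFS
    right = theta_k1(t) if t < t4 else c_k     # right knee: TFS, then locked
    return left, right

def hip_targets(t, t3, theta_h1, theta_h2):
    left = theta_h2(t) if t < t3 else theta_h1(t)
    right = theta_h1(t) if t < t3 else theta_h2(t)
    return left, right
```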
2.3.2 Upper Body Joint Angular Trajectory
As humans change walking speed, their nervous systems adapt muscle activation patterns to modify arm swinging for the appropriate frequency. Humans have neural connections between their upper limbs and lower limbs which coordinate muscle activation patterns during locomotor tasks. Mechanical analysis indicates that arm swing during human locomotion helps to stabilize rotational body motion (Zehr and Duysens 2004). During human walking, the arms normally swing in an opposite manner to the legs, which helps to balance the angular momentum generated in the lower body (Hinrichs 1990; Elftman 1939). Humans swing their arms close to 180° out of phase with their respective legs during walking (Collins et al. 2001); that is, each arm swings in the opposite manner to its respective leg.
2.4 Truncated Fourier Series Motion Generator
Equation (2.3) gives the original Fourier series of a periodic function $f(t)$:

$$f(t) = \frac{1}{2} a_0 + \sum_{i=1}^{\infty} a_i \sin\left(\frac{2\pi i}{T} t\right) + \sum_{i=1}^{\infty} b_i \cos\left(\frac{2\pi i}{T} t\right) \tag{2.3}$$

Here $a_i$ and $b_i$ are constant coefficients and $T$ is the time period. With an infinite number of terms this formula can produce any complicated signal, but when the number of terms is finite the
accuracy of the signal decreases. Because walking angular trajectories are complicated signals, this equation cannot create exact signals with a finite number of terms. Therefore a modified, finite Fourier series, the Truncated Fourier Series (TFS), has been used, as follows:

$$f(t) = \sum_{i=1}^{n} a_i \sin(i \omega t) + c_f \tag{2.4}$$
where $a_i$ and $c_f$ are constants that should be determined and $\omega$ is the fundamental frequency determined by the desired period of the gait. $n$ determines the number of terms and can be chosen as a trade-off between the required accuracy of the approximation and the computational load. Note that $c_f$ is an offset of the angular trajectories, as discussed above. With this approach and (2.1) and (2.2), the TFS for the hip-pitch and knee-pitch angles are formulated as follows:

$$\begin{aligned}
\theta_{h1} &= \sum_{i=1}^{n} B_i \sin(i \omega_h t_{h1}) + c_h, \quad \omega_h = \frac{2\pi}{T_h} \\
\theta_{h2} &= \sum_{i=1}^{n} A_i \sin(i \omega_h t_{h2}) + c_h \\
\theta_{k1} &= \sum_{i=1}^{n} C_i \sin(i \omega_k t_{k1}) + c_k, \quad \omega_k = \frac{2\pi}{T_k} \\
\theta_{klo} &= c_k, \quad c_k \le 0
\end{aligned} \tag{2.5}$$
In these equations $A_i$, $B_i$, and $C_i$ are constant coefficients, $c_k$ and $c_h$ are signal offsets, and $T_k$ and $T_h$ are the periods of the knee and hip trajectories respectively. Relying on the fact that all joints in the walking motion have the same movement frequency, we can assume $T_k = T_h$. $t_{h1}$ and $t_{h2}$ denote the end times of the hip stance trajectory and the hip swing trajectory respectively, and their values can be calculated from half of the walking period. $t_{k1}$ represents the end time of the lock phase, which is computed by the optimizer. As noted, the arms swing in the opposite manner to the legs, so the hip motion generator can be reused to control the arms. However, because the wavelength of the arm joint angle trajectories differs from that of the hip joint trajectories, and because the arm angle differs from the hip angle at the start of walking, the joint angular motion generator for the arms is written as (2.6):

$$\begin{aligned}
\theta_{s1} &= \sum_{i=1}^{n} D_i \sin(i \omega_s t_{s1}) + c_a, \quad \omega_s = \frac{2\pi}{T_s} \\
\theta_{s2} &= \sum_{i=1}^{n} E_i \sin(i \omega_s t_{s2}) + c_a
\end{aligned} \tag{2.6}$$
In these equations $D_i$ and $E_i$ are constant coefficients, $c_a$ is a signal offset, and $T_s$ is the period of the arm trajectory, which is equal to $T_h$. $t_{s1}$ and $t_{s2}$ denote the end times of the arm stance trajectory and the arm swing trajectory respectively, and their values are exactly equal to $t_{h1}$ and $t_{h2}$.
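The generators (2.4)-(2.6) reduce to evaluating a short sine series per joint. The Python sketch below is our illustration; the coefficient values are hypothetical placeholders, since the real values are produced by the Bees Algorithm of Sect. 2.5, and only the 0.54 s gait period is taken from Sect. 2.7.

```python
import math

def tfs(t, coeffs, omega, offset):
    """Truncated Fourier series (2.4): sum_i a_i * sin(i*omega*t) + c."""
    return sum(a * math.sin((i + 1) * omega * t)
               for i, a in enumerate(coeffs)) + offset

T_h = 0.54                       # gait period reported at the best fitness
omega = 2.0 * math.pi / T_h      # shared fundamental frequency (T_k = T_s = T_h)

# Hypothetical coefficient sets and offsets (degrees), stand-ins for the
# optimized A_i, B_i, C_i, D_i, E_i, c_h, c_k, c_a of (2.5)-(2.6).
B, A, C = [20.0, 5.0], [-15.0, -4.0], [-30.0]
D, E = [10.0], [-10.0]
c_h, c_k, c_a = 15.0, -10.0, -20.0

t = 0.1
theta_h1 = tfs(t, B, omega, c_h)   # hip, one half-period generator
theta_h2 = tfs(t, A, omega, c_h)   # hip, other half-period generator
theta_k1 = tfs(t, C, omega, c_k)   # knee generator
theta_s1 = tfs(t, D, omega, c_a)   # arm generators
theta_s2 = tfs(t, E, omega, c_a)
```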
2.5 Bees Algorithm
The Bees Algorithm (BA) is a nature-inspired algorithm mimicking the food foraging behavior of swarms of honey bees; it was developed recently by Pham et al. The BA is simple in concept, has few parameters, and is easy to implement, and it has been successfully applied to various benchmark and real-world problems. The algorithm requires a number of parameters to be set, namely: the number of scout bees (n), the number of elite bees (e), the number of patches selected out of the n visited points (m), the number of bees recruited for patches visited by elite bees (nep), the number of bees recruited for the other (m−e) selected patches (nsp), and the size of the patches (ngh). The pseudocode of the BA is as follows:

1. Initialize the population with random solutions.
2. Evaluate the fitness of the population.
3. While (stopping criterion not met):
4. Select sites for neighborhood search.
5. Recruit bees for the selected sites (more bees for the better sites) and evaluate their fitness.
6. Select the fittest bee from each patch.
7. Assign the remaining bees to search randomly and evaluate their fitness.
8. End while.
In Step 1, the BA matrix is filled with as many randomly generated solution vectors as the population size. Their fitness values are calculated in Step 2. In Step 4, the bees with the highest fitness are chosen as "selected bees" and the sites visited by them are chosen for neighborhood search. Then, in Steps 5 and 6, the algorithm conducts searches in the neighborhood of the selected sites, assigning more bees to search near the best e sites. The bees can be chosen directly according to the fitness associated with the sites they are visiting. Step 7 assigns the remaining bees in the population randomly around the search space, scouting for new potential solutions. These steps are repeated until a stopping criterion is met.
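A compact Python sketch of steps 1-8 is given below. It is our illustration, not the paper's MATLAB implementation; it uses a single scalar patch size ngh for all dimensions, whereas the study uses different patch sizes for the coefficients and the time period.

```python
import random

def bees_algorithm(fitness, bounds, n=100, m=15, e=5, nep=10, nsp=3,
                   ngh=0.1, iterations=100):
    """Maximize `fitness` over the box `bounds` = [(lo, hi), ...] with the BA."""
    rand_solution = lambda: [random.uniform(lo, hi) for lo, hi in bounds]

    def neighbour(site):
        # Step 5: sample a point inside the patch around a selected site.
        return [min(max(x + random.uniform(-ngh, ngh), lo), hi)
                for x, (lo, hi) in zip(site, bounds)]

    population = [rand_solution() for _ in range(n)]                 # step 1
    for _ in range(iterations):                                      # step 3
        ranked = sorted(population, key=fitness, reverse=True)       # step 2
        next_population = []
        for rank, site in enumerate(ranked[:m]):                     # step 4
            recruits = nep if rank < e else nsp                      # elite vs other sites
            candidates = [neighbour(site) for _ in range(recruits)] + [site]
            next_population.append(max(candidates, key=fitness))     # step 6
        next_population += [rand_solution() for _ in range(n - m)]   # step 7
        population = next_population
    return max(population, key=fitness)
```

With the parameter values quoted in Sect. 2.6 (n = 100, m = 15, e = 5, nep = 10, nsp = 3) and a fitness such as the walking measure of (2.7), this loop corresponds to the training procedure described there.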
2.6 Applying Bees Algorithm
With the rule base fixed, the Bees Algorithm was used to tune the parameters of the input and output membership functions and the scaling gains for the input and output variables.
Table 2.2 Lower and upper bounds used for population initialization

Parameter | Lower bound | Upper bound
A         | 0           | 40
B         | −40         | 0
C         | −40         | 0
D         | 0           | 40
E         | −40         | 0
Ch        | 10          | 30
Ck        | −40         | 0
Ca        | −40         | 0
tk        | 0.01        | 1
tk1       | 0.01        | 1
In theory, each bee is a vector comprising 10 real numbers. Eight of those numbers are reserved for the constant parameters and two for the variable parameters, the time period and $t_{k1}$. Bees are generated randomly and uniformly for the first iteration between the lower and upper bounds; the bounds used for initialization in this study are given in Table 2.2. The following parameter values are set for the optimization: population (n = 100), number of selected sites (m = 15), number of elite sites (e = 5), initial patch size ngh = 2 for the coefficients and ngh = 0.04 for the time period, number of bees around elite points (nep = 10), and number of bees around the other selected points (nsp = 3). Note that ngh defines the initial size of the neighborhood in which follower bees are placed.

The fitness function has a critical role in the BA and is used to judge how good a solution represented by a bee is. To achieve a more stable and faster walk, a fitness function based on the robot's straight movement within a limited walking time is used. We ran the simulator for 15 s; the robot is first initialized with its X and Y values equal to zero, and the fitness function is calculated whenever the robot falls or the time duration for walking is over. Equation (2.7) shows the pseudocode for computing the fitness function for forward walking:

    if (CurrentTime <= TimeDuration) {
        fitness = X - Y;
        if (RobotIsFallen)
            fitness = (X - Y) / (TimeDuration - CurrentTime);
    }
    else
        fitness = 10000;                                        (2.7)
When the robot falls down during the walk, the fitness is divided by the remaining time of the simulation. This punishment forces the robot to achieve a stable walk.
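A hedged Python sketch of this fitness measure is shown below; the comparison operators and the handling of the deviation follow our reading of the garbled pseudocode in (2.7), and the function name is ours.

```python
def walking_fitness(x, y, current_time, time_duration, fallen):
    """Fitness of one walking trial: x = forward distance, y = lateral deviation."""
    if current_time <= time_duration:
        fitness = x - y                      # deviation from straight walking is subtracted
        if fallen and current_time < time_duration:
            # Early falls are punished by dividing by the time that was left.
            fitness /= (time_duration - current_time)
        return fitness
    return 10000.0                           # constant taken verbatim from (2.7)
```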
2.7 Result
We ran the simulator on a Pentium IV 3 GHz Core 2.6 Duo machine with 4 GB of physical memory, running Linux SUSE 10.3. The time period for the simulation was 15 s. About 2 h after starting the BA under MATLAB, 800 trials had been performed. Figure 2.1 shows the average and best fitness values during these eight generations. In our previous work (Yazdi et al. 2010), without arm swing during walking the robot could walk only 4.2 m in 15 s; Figs. 2.2 and 2.3 show the results of learning without arm swing. The gait period at the best found fitness equals 0.54 s; accordingly, Fig. 2.4 shows the angular trajectories of the hip, knee, ankle, and arm generated by the oscillators. In human walking, the left-leg and right-leg angular trajectories are half a period apart; Figs. 2.5 and 2.6 show this for the hip and knee joint angular trajectories respectively. The arms swing in the same manner as their opposite hips but with a different wavelength; Fig. 2.7 shows that the arm's wavelength is about twice that of the hip.
Fig. 2.1 BA convergence: with arm swing during walking, the robot could walk 6.8 m in 15 s with an average body speed of 0.45 m/s
Fig. 2.2 BA convergence: without arm swing during walking, the robot could walk only 4.2 m in 15 s with an average body speed of 0.28 m/s
Fig. 2.3 The y coordinate of the robot during walking with and without arm swing; the figure shows that the robot walks straighter with arm swing
Fig. 2.4 Angular trajectories of the left ankle, left hip, left arm and left knee joints during walking (in that order). The arm swings in the opposite manner to its respective leg. Every 50 iterations equal 1 s
Fig. 2.5 Left and right hip angle trajectories
Fig. 2.6 Left and right knee angle trajectories
Fig. 2.7 Trajectories of right hip and left arm joint during walking
2.8
Conclusion
In this paper, we were able to increase the speed and stability of the robot's walking compared to the previously presented model by adding arm swing during walking. The current implementation is capable of walking in a straight line on a planar surface without the use of proprioceptive input. In this study, TFS with BA was implemented on a simulated NAO robot, which can walk fast and stably. Using BA as the optimizer shows that the Bees Algorithm is a computationally fast tool for complex engineering multi-objective optimization problems.
References Beer RD, Chiel HJ, Quinn RD, Ritzmann RE (1998) Bio robotic approaches to the study of motor systems. Curr Opin Neurobiol 8(6):777–782 Kato I, Ohteru S, Kobayashi H, Shirai K, Uchiyama A (1974) Information-power machine with senses and limbs. First CISM-IFToMM symposium on theory and practice of robots and manipulators, Springer-Verlag Takanishi A, Naito G, Ishida M, Kato I (1982) Realization of plane walking by the biped walking robot WL-10R. Robotic and Manipulator Systems 283–393 Hodgins JK, Raibert MH (1990) Biped gymnastics. Int J Robot Res 9(2):115 Zhang S, Zhu C, Sin JKO, Mok PKT (1999) A novel ultrathin elevated channel low-temperature poly-Si TFT. IEEE Electron Device Letts 20:569–571
Vukobratovic M, Borovacand B, Surdilovic D (2001) Zero-moment point proper interpretation and new applications. In: Proceedings of the 2nd IEEE-RAS international conference on humanoid robots, pp 237–244 Kajita S, Kanehiro F, Kaneko K, Yokoi K, Hirukawa H (2001) The 3D linear inverted pendulum mode a simple modeling for a biped walking pattern generation. In: Proceedings of the 2001 IEEE/RSJ international conference on intelligent robots and systems, pp 239–246 McGeer T (1980) Passive dynamic walking. Int J Robot Res 9(2):62–82 Mochon S, McMahon TA (1980) Ballistic walking. J Biomech 13:49–57 Yang L, Chew CM, Poo AN (2006) Adjustable bipedal gait generation using genetic algorithm optimized fourier series formulation. In: Proceedings of the 2006 IEEE/RSJ international conference on intelligent robots and systems, pp 4435–4440 Yang L, Chew CM, Zielinska T, Poo AN (2007) A uniform biped gait generator with off-line optimization and on-line adjustable parameters. Robotica 25(5):549–565 Hinrichs RN (1990) Whole body movement, coordination of arms and legs. In: Winters JM, Woo SLY (eds) Walking and running. Springer-Verlag, New York, pp 694–705 Elftman H (1939) The function of the arms in walking. Hum Biol 11:529–535 Li Y, Wang W, Crompton RH, Gunther M (2001) Free vertical moments and transverse forces in human walking and their role in relation to arm-swing. J Exp Biol 204:47–58 Pham DT, Ghanbarzadeh A, Koc E, Otri S, Rahim S, Zaidi M (2006) The Bees Algorithm A Novel Tool for Complex Optimization Problems. In: Proceedings 2nd international virtual conference on intelligent production machines and systems(IPROMS), pp 454–45 Smith R (2007) Homepage of open dynamics engine project; http://www.ode.org. Boedecker J (2005) Humanoid robot simulation and walking behavior development in the spark simulator framework, Artificial Intelligence Research University of Koblenz technical report Kagami S, Mochimaru M, Ehara Y, Miyata N, Nishiwaki K, Kanade T, Inoue H (2003) Measurement and comparison of humanoid H7 walking with human being. Robot Auton Sys 48:177–187 Zehr EP, Duysens J (2004) Regulation of arm and leg movement during human locomotion. J Neuroscientist 10:347–361 Collins SH, Wisse M, Ruina A (2001) A 3-D passive dynamic walking robot with two legs and knees. Int J Robot Res 20:607–615 Yazdi E, Azizi V, Haghighat AT (2010) Evolution of biped locomotion using bess algorithm based on truncated fourier series. In: Proceedings of the world congress on engineering and computer science 2010(WCECS 2010), pp 378–382
Chapter 3
Bio-Inspired Pneumatic Muscle Actuated Robotic System
Andrea Deaconescu and Tudor Deaconescu
3.1
Introduction
From an engineering perspective, the functional morphology of living organisms represents a permanent inspiration for identifying high-tech innovative solutions. In this context over the last years a new branch of science has emerged and grown, namely bionics, combining skills and proficiency from biology, mathematics, medicine and engineering. Bionics draws upon biological intuition and engineering pragmatism in order to adapt projects from nature to the requirements of modern technology. Nature is thus the starting point for innovation, it offers clues for what is useful and should be deployed in a mechanism. Starting from such clues the engineer’s task is to develop, test and improve the analyzed system. As part of the objectives of current bionic research, special attention is granted to the study of actuator elements. The study of such elements together with their control transmission processes represents an essential part of bionics. In this context humans, mammals, birds and fish represent the source of inspiration for developing various motion generating systems, with immediate utility, for example, in robotics. Numerous categories of robots are destined for operation in the immediate vicinity of humans. In order to be able to operate close to humans or to interact with these, a first requirement to be met by the new generations of robots is safe functioning. This entails preventing undesired collisions between the robot and humans, or in the worst case, minimizing the effects of such collisions. Reliability means the display of a compliant behaviour of the robot or, in other words, the possibility of continuous control of its stiffness. Human-friendly robots characterized by variable stiffness entail a structure including compliant actuators. Variable stiffness actuators (VSAs) or adjustable
compliant actuators are being designed and implemented because of their ability to minimize large forces due to shocks, to safely interact with the user, and their ability to store and release energy in passive elastic elements (Ham et al. 2009). Bio-inspired actuation systems, like pneumatic muscles, meet such requirements, due to their adaptive compliant behaviour, materialized by the possibility of continuous stiffness variation. Compliant behaviour, however, has its disadvantages, as it limits the robot’s performance because it reduces control bandwidth, due to structural resonance (Tondu and Lopez 1997). Compliant execution elements can be divided into two categories: active compliant actuators where a controller of a stiff actuator mimics the behaviour of a spring and passive compliant actuators, respectively, the latter including a compliant element capable of storing and/or releasing energy (Vanderborght et al. 2008). Passive compliant actuators are not designed for applications requiring high positioning accuracy, like for example, pick and place type activities. They are preferred in novel robots where safe human- robot interaction is required or in applications where energy efficiency must be increased by adapting the actuator’s resonance frequency. As both positioning accuracy and collision safety are important requirements, a robotic system should exhibit very low stiffness when subjected to a collision force greater than the one causing human injury, but maintain very high stiffness otherwise (Park et al. 2009). The paper discusses a technical solution meeting both these requirements and presents a pneumatic muscle actuated rotation – translation system applicable in robotics. These actuators are regarded as bio-inspired elements of passive compliant type. An important step in the development is identifying the properties and performance that the artificial system is expected to achieve.
3.2
Construction of the Rotation-Translation System
Numerous international studies have revealed a continuous increase of the number of physically challenged persons worldwide. With the aggravation of their physical disability, these persons increasingly lose their autonomy and in tackling daily tasks become dependent on the support of caregivers and medical assistance. Eventually, the financial burden generated for the respective social system by such a situation requires the identification and implementation of new technologies as part of the process of medical assistance. For these reasons and following the choice of affected persons to increase their autonomy in everyday and professional life by means of modern rehabilitation equipment, over the last years intensive scientific research has been conducted in view of developing, improvement and implementation of robots in the fields of physical rehabilitation, medical assistance and professional activity support. The studies have yielded numerous so-called rehabilitation robots, available as laboratory prototypes or commercial products. The applicability
Fig. 3.1 Robotic arm on a wheelchair
of such systems ranges from "aid in the conducting of everyday tasks" and "caregiving" to "assistance in the professional environment". Increasing the degree of autonomy of persons with locomotion disabilities is achievable by means of wheelchairs. Depending on the degree of the handicap, the chairs can be endowed with robotic arms allowing the user to carry out certain actions. The (one or two) robotic arms are mounted on the back or armrest of the wheelchair. Manipulators mounted on wheelchairs have the highest degree of flexibility in relation to their applicability. They allow the gripping of objects located randomly in space, the opening of doors, drawers and taps, or the conducting of simple manipulation tasks like pouring a drink into a glass and raising that glass to the mouth. The costs of such flexibility depend on the cognitive capacity of the user while carrying out certain tasks; in other words, costs depend on the degree of physical handicap to be compensated. Figure 3.1 presents such robotic systems, designed for people with paralysis to permit greater independence, dignity and quality of life. Many applications require increased robot stiffness, which entails the utilization of electro-mechanical actuation systems. In such applications, where compliant behaviour is only secondary, the utilized electric motors offer a reduced force-to-weight ratio, of up to 16:1. For this reason, generation of large forces calls for the utilization of either high-power electric motors or of reducing gears with high transmission ratios. In both situations the resulting structures are of large dimensions and high weight, which causes high inertia forces with a negative impact on the dynamic behaviour of the robot. Considering these aspects, the utilization of pneumatic muscles in the construction of robots becomes of interest, as compared to electric motors pneumatic muscles offer a clearly superior force-to-weight ratio (up to 100:1), thus allowing the construction of small and light robots with favourable dynamic behaviour. In addition, air compressibility endows the robotic system with compliance, rendering a further elastic element unnecessary (Park et al. 2009).
Fig. 3.2 Working principle of pneumatic muscles
The utilization of pneumatic muscles, or, in other words, of so-called compliant elements with specifically adjustable compliances, offers the possibility to transmit motions just by structural deformations. The pneumatic muscle is a system based on a contracting membrane which, under the action of compressed air, increases its diameter while decreasing its length. Thus the pneumatic muscle carries out a certain stroke, depending on the level of the feed pressure. The operational behaviour of such a system is similar to human muscles, as shown in Fig. 3.2. Pneumatic muscles are operational elements that have started to replace pneumatic cylinders in certain applications. Compared to pneumatic cylinders, the muscles are about eight times lighter, while generating about ten times greater force, for identical interior diameters. By means of pneumatic muscles very slow and small-amplitude motions can be carried out, as muscles are completely stick-slip free, hence their superior dynamic behaviour compared to pneumatic cylinders. Pneumatic muscle actuated robots entail an extremely light construction of increased flexibility, while meeting the safety requirements for equipment deployed in the immediate vicinity of humans or in narrow spaces (Deaconescu and Deaconescu 2009a, b; Mihajlov et al. 2006; Mihajlov and Ivlev 2005; Deaconescu and Deaconescu 2008). The pneumatic muscle actuated rotation-translation system presented in Fig. 3.3 is part of a robotic arm mounted on a wheelchair. The rotation-translation system is actuated by two pairs of pneumatic muscles. The first pair, with the function of generating the rotation, is based on the principle of antagonistic actuation, while the muscles of the second pair, with the function of generating the translation motion, are actuated synchronously (agonistic actuation). The generation of rotation is based on an innovative solution that differs from others by the two actuators (pneumatic muscles) rotating together with the actuated element. The role of the two muscles is to generate a 45° rotation, and
Fig. 3.3 Pneumatic muscle actuated rotation-translation system
to ensure the balance of any intermediary position of the actuated system. The rotation of the entire robotic system is quite similar to the motion generated by human muscles, being based on the same agonist-antagonist principle illustrated in Fig. 3.4. Two flexible steel cables are affixed to the free ends of the two pneumatic muscles. The cables are then passed through the groove of a reel of 2R = 38 mm diameter, and rigidly fixed at the opposite ends. This structure allows, by antagonistic inflation/deflation of the two muscles, the generation of rotation of the entire mechanical structure, and thus also of the two pneumatic actuators, in one direction or the other. The bearing of the entire system is located at its lower end, thus allowing rotation in either direction. The contraction ratio ε of a muscle is defined by (3.1):

    ε = (Li − L)/Li = ΔL/Li    (3.1)
where Li is the initial length of the muscle, at a pressure equal to zero, and L is the length of the muscle inflated at a random pressure p. The maximum contraction ratio εmax is:

    εmax = ΔLmax/Li    (3.2)
where DLmax is the maximum stroke carried out by the free end of a muscle, when loaded at maximum pressure.
Fig. 3.4 Working principle of the rotation module
The force developed by a pneumatic muscle can be computed by (3.3):

    F = Fmax · (1 − ε/εmax)    (3.3)
It can be noticed that for ε = εmax, the force developed by the muscle is F = 0. In order for the rotation to be possible, the first step is the pre-loading of the two muscles with compressed air at a pressure p0 equal to ½ of the maximum working pressure. Upon pre-loading, the pneumatic actuators will have contracted to length L0, at a contraction ratio for each of ε0, computed by (3.4):

    ε0 = (ΔLmax/2)/Li = ΔLmax/(2·Li)    (3.4)
Figure 3.5 presents the constructive solution adopted for compensating the displacement of the two muscles upon their pre-loading (inflation at pressure p0).
Fig. 3.5 Compensating for muscle shortening by DLmax/2
The system is based on allowing the axles of the two cable reels to glide in a guide, while the support of the reels is strained by means of compression springs. When the pre-loading of the pneumatic muscles is achieved, the horizontal axles of the two reels are simultaneously lifted by half the maximum possible stroke of each muscle. Upon releasing the air from the muscles, the compression springs cause the reels to return to their resting position. When a rotation by an angle θ is desired, one of the muscles will be fed additional compressed air, up to a pressure p1 = p0 + Δp, while the second muscle will be deflated to a pressure p2 = p0 − Δp. By feeding different pressures to the two muscles, their lengths will be modified in relation to their initial state as follows: the muscle inflated to pressure p1 will shorten to a length L1 = L0 − ΔLmax/2, while the second muscle will expand to a length L2 = L0 + ΔLmax/2. Upon rotating the joint by angle θmax, the contractions of the two muscles become, respectively:

    ε1 = ε0 + R·θmax/Li = ΔLmax/Li    (3.5)

    ε2 = ε0 − R·θmax/Li = 0    (3.6)
where R is the radius of the reel guiding the cable connecting the free ends of the two pneumatic muscles.
Fig. 3.6 Construction of the translation module
The forces developed by the two muscles are determined by (3.7) and (3.8):

    F1 = Fmax · (1 − ε1/εmax) = 0    (3.7)

    F2 = Fmax · (1 − ε2/εmax) = Fmax    (3.8)
while the generated torque will be:

    M = R · (F2 − F1) = R · Fmax    (3.9)
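Relations (3.1)-(3.9) can be collected into a short numerical sketch; the following Python fragment is illustrative only, and the numbers in the example call (maximum force, maximum contraction) are placeholders rather than measured data for the muscles used here.

    def contraction_ratio(L_i, L):
        """Contraction ratio (3.1): eps = (Li - L) / Li."""
        return (L_i - L) / L_i

    def muscle_force(eps, eps_max, F_max):
        """Force model (3.3): F = Fmax * (1 - eps/eps_max)."""
        return F_max * (1.0 - eps / eps_max)

    def pair_torque(eps1, eps2, eps_max, F_max, R):
        """Torque of the antagonistic pair, following (3.7)-(3.9)."""
        F1 = muscle_force(eps1, eps_max, F_max)
        F2 = muscle_force(eps2, eps_max, F_max)
        return R * (F2 - F1)

    # Example with illustrative numbers: reel radius R = 0.019 m (2R = 38 mm),
    # pre-loading at half the maximum contraction as in (3.4).
    eps_max = 0.25
    eps0 = eps_max / 2.0
    M_max = pair_torque(eps_max, 0.0, eps_max, F_max=1000.0, R=0.019)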
Air compressibility renders the pneumatic muscles compliant. Compliance C can be expressed as the inverse of stiffness K (Pashkov et al. 2004):

    C⁻¹ = K = dF/dL    (3.10)
As the magnitude of the force depends on that of the feeding pressure of the two muscles, it follows that compliance can be adapted by controlling the pressure. The construction of the translation module includes as its two main elements a pair of pneumatic muscles operated synchronously by feeding of compressed air. Upon being fed compressed air the muscles shorten their initial (resting) length by dL, thus causing the displacement of a vertical slide, guided by two pairs of rollers. Figure 3.6 shows the construction of the translation module. The two muscles work simultaneously upon being fed compressed air. By feeding compressed air to the muscles they decrease their initial (resting) length
Fig. 3.7 Actuation diagram of the rotation-translation system
by up to 60 mm, thus setting a vertically gliding slide into motion. The gliding of the slide along the standardized profile is ensured by two pairs of rollers. Figure 3.7 presents the actuation diagram of the rotation-translation system, based on three proportional pressure regulators PR. Two of these (PRR1 and PRR2) have the function of controlling the rotation, while the third one, PRT is responsible for feeding the two muscles, thus achieving translation (Deaconescu and Deaconescu 2010). The electrical diagram presented in Fig. 3.8 shows that the asynchronous displacement of the two muscles responsible for the rotation is controlled by a double potentiometer, that antagonistically commands the opening of the two proportional pressure regulators.
3.3
System Performance
Research conducted on the experimental rotation module was aimed at analyzing the evolution of the pressure required for the activation of the two pneumatic muscles, the evolution of the consumed airflow, as well as at determining the response times of the muscles in contraction/inflating and relaxation/deflating, respectively. Measurements were carried out by loading the muscles with compressed air at the
Fig. 3.8 Electrical diagram of the rotation-translation system
maximum pressure of 4.3 bar, the evolution of pressure versus time being recorded by means of two pressure sensors with indicators. At the same time the evolution of the air flow rate feeding the two muscles was monitored. Figure 3.9 shows the diagrams corresponding to pressure and feed flow rate variation for two cycles. The upper one of the two graphs displays the evolution of the feeding pressure of the two muscles, with continuous and dotted lines, respectively. Initially, the two muscles are fed simultaneously to a pressure amounting half its maximum working value (p0 2 bar). During actual operation it can be observed that while one of the muscles is loaded additionally with compressed air, the other one is relaxed, and vice versa. In relation to the airflow, impulse growths can be noticed corresponding to the switching of the system from one state to another, namely when left or right rotation is initiated. Starting from the graph in Fig. 3.10, illustrating a pre-loading – loading sequence of a pneumatic muscle, the duration of these events can be determined. Thus, the pre-loading of the muscles requires about 1.25 s, while their loading to 4.2 bar requires 0.6 s. Experimental research on the performance of the translation module were oriented towards determining the positioning precision of the mobile slide for various levels of the feeding pressure. The conducted experiments included loading the slide with five different weights, namely of 10 N, 34 N, 74 N, 98 N and 122 N, respectively. The measurements consisted of determining the contraction DL of the two muscles, upon being fed compressed air (inflated) as well as upon releasing air (deflated).
Fig. 3.9 Pressure and airflow evolution versus time
Fig. 3.10 Determining pre-loading and loading times
Fig. 3.11 Hysteresis of the pneumatic muscles
Figure 3.11 presents the evolution of the axial contraction of the two muscles versus the pressure of the fed air, for two values of the attached weights (10 and 98 N). The occurrence of a hysteresis phenomenon can be noticed, characterized by a delay (a lag) in the muscles regaining their initial form upon deflation. The maximum difference of muscle contractions in inflating-deflating, for the same value of the feeding pressure, is of about 7 mm for a load of 10 N, and of 10 mm for a load of 98 N, respectively. These experimentally determined values imply, for the two values of the attached weights, a difference of the relative deformation of the muscles in inflating and deflating, respectively, namely ΔL/Li·100, ranging between 2.3% and 3.3%. The hysteresis phenomenon can be explained by the friction between the exterior wall of the muscle's elastic tube and the mesh enveloping it. The occurrence of hysteresis represents a major disadvantage to the utilization of pneumatic muscles in applications requiring high precision motions. The measurements allowed the determination of two equations for computing the axial contraction of the pneumatic muscles, contraction expressed as ΔL = f(p, F):

– in the case of muscle inflation:

    ΔL(p, F) = 1.251 · p^2.481 · F^0.185    (3.11)

– in the case of muscle deflation:

    ΔL(p, F) = 12.006 · p^0.93 · F^0.08 + 0.5    (3.12)
Fig. 3.12 ΔL = f(p, F)
Equation 3.12 together with the graph describing muscle behaviour at deflation reveal that for an air pressure of zero bar, the pneumatic actuator does not resume its resting length Li. A remnant contraction of 0.5 mm is noticed, that disappears only after a longer period of time. Figure 3.12 presents the evolution of the axial contraction DL versus feeding air pressure p and the load F attached to the slide of the translation module.
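For reference, the fitted laws (3.11) and (3.12) can be evaluated directly; the exponents below are taken as printed above, and their signs should be checked against the original measurements before reuse.

    def contraction_inflation(p, F):
        """Empirical axial contraction in mm during inflation, eq. (3.11);
        p in bar, F in N (exponents as printed in the text)."""
        return 1.251 * p**2.481 * F**0.185

    def contraction_deflation(p, F):
        """Empirical axial contraction in mm during deflation, eq. (3.12);
        the 0.5 mm offset is the remnant contraction at zero pressure."""
        return 12.006 * p**0.93 * F**0.08 + 0.5

    # Hysteresis width at a working point (pressure in bar, load in N).
    gap = contraction_deflation(3.0, 98.0) - contraction_inflation(3.0, 98.0)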
3.4
Conclusions
The paper presents and discusses a novel variant of a bio-inspired rotation-translation system, actuated by pneumatic muscles. Such a solution lends itself particularly to developing robotic manipulation systems for rehabilitation activities of physically disabled persons. The solution of pneumatic actuation was selected due to the compliance that is characteristic to these actuators. Achieving the driving of systems by pneumatic muscles proves that these actuators, yet insufficiently known and deployed, offer numerous advantages related to dynamic behaviour as well as to involved costs.
References Ham R, Sugar T, Vanderborght B, Hollander K, Lefeber D (2009) “Compliant actuator designs”. IEEE Robot Autom Mag 16(3):81–94 Tondu B, Lopez P (1997) “The McKibben muscle and its use in actuating robot-arms showing similarities with human arm behaviour”. The Industrial Robot (Bedford) 24(6):432
Vanderborght B, Sugar T, Lefeber D (2008) “Adaptable compliance or variable stiffness for robotic applications”. IEEE Robot Autom Mag 15(3):8–9 Park JJ, Song JB, Kim HS (2009) “Safe joint mechanism based on passive compliance for collision safety”. In: Lecture notes in control and information sciences, Springer, Berlin/Heidelberg, pp 49–61 Deaconescu A, Deaconescu T (2009) “Performance of a pneumatic muscle actuated rotation module”. In: Proceedings of the world congress on engineering 2009, WCE 2009, vol 2. London, pp 1516–1520 Deaconescu A, Deaconescu T (2009) “Pneumatic muscle actuated robotized arm for rehabilitation systems”. Proceedings of the international conference on industrial engineering 2009, vol 2. Hong Kong, pp 1872–1875 Mihajlov M, Ivlev O, Gr€aser A (2006) “Design and control of safe robotic arm with compliant fluidic joints”. In: International symposium on robotics and 4th German conference on robotics, Munich, 15–17 May Mihajlov M, Ivlev O (2005) Development of locally controlled fluidic robotic joints actuated by rotary elastic chambers Deaconescu T, Deaconescu A (2008) “Study of a non-anthropomorphic pneumatic muscle actuated gripper”. In: 6th international fluid power conference proceeding, Shaker Verlag, Dresden, pp 267–277 Pashkov E, Osinskiy Y, Chetviorkin A (2004) “Electropneumatics in manufacturing processes”. Isdatelstvo SevNTU Sevastopol Deaconescu A, Deaconescu T (2010) “Bio-inspired rotation-translation system for rehabilitation robots”. In: Proceedings of the world congress on engineering and computer science 2010, WCECS 2010, vol 1. San Francisco, pp 357–360
Chapter 4
Evaluation of an Orthopedic Surgical Robotic System Orthoroby on Bone Cadaver
Yasin Güven and Duygun Erol Barkana
4.1
Introduction
Orthopedic surgery is one of the most common operations in hospitals. Most of the bone related orthopedic surgeries are performed to straighten bone deformities, to extend bone length, and to remove bone regions inflicted on by tumors and infections. Current manual surgical techniques often result in inaccurate placing and balancing of hip replacements, knee components, or soft-tissues. In recent years, computer-assisted robotic systems are developed for orthopedic surgeries, which improve the precision and accuracy of the surgery (Davies 2000). Some of the orthopedic surgery robotic systems use serial manipulators and some of them use parallel manipulators. Robodoc (Schulz et al. 2007), Caspar, Acrobot (Jakopec et al. 2003) and Arthrobot (McEwen et al. 1989) are well known orthopedic surgical robots that belong to the serial manipulators with large workspace which are somewhat heavy and suffer from low stiffness and accuracy, and possess low nominal load/weight ratio. Parallel manipulators are preferred for orthopedic surgeries because they provide advantages in medical robotics such as small accumulated positioning errors and high stiffness. Parallel manipulators are closed kinematic structures that hold requisite rigidity to yield a high payload to self-weight ratio. MARS is one of the well known patient-mounted parallel robot (Shoham et al. 2003; Pechlivanis et al. 2009). Similar to the MARS miniature orthopedic robot, MBARS (Wolf et al. 2005) robot employs a parallel platform architecture, which has been used for machining the femur to allow a patella implant to be positioned (Wolf et al. 2005; Jaramaz et al. 2006). Compact robot system for image-guided orthopedic surgery (CRIGOS) is another parallel robot developed for planning of surgical interventions and for supervision of the robotic
device (Brandt et al. 1999). Additionally, Orthdoc (Kwon et al. 2002) use parallel manipulators for orthopedic surgery. Hybrid bone attached robot (HyBAR) has also been developed with a parallel and serial hybrid kinematic configuration for joint arthroplasty (Song et al. 2009). A parallel robot has been developed with automatic bone drilling carriage (Tsai and Hsu 2007). On the other hand, Praxiteles is another patient-mounted surgical robot which comprised of 2 motorized degrees of freedom (DoF) whose axes of rotation are arranged in parallel, and are precisely aligned to the implant cutting planes with a 2 DoF adjustment mechanism (Plaskos et al. 2005). We have developed an orthopedic surgery robot called OrthoRoby, which consists of a parallel robot and a cutting tool (Erol Barkana 2010a, b). Note that not only the design but also the control of an orthopedic surgical robotic system to complete the surgical operation in a desired manner is an important issue. The low-level controllers, which are developed for surgical robotic systems, are responsible to complete surgical operations in a desired manner. Active constraint control, which constrains the motion to a predefined region, has been used for Acrobot (Jakopec et al. 2003). Arthrobot is controlled with a pneumatically powered and electronically controlled positioner (McEwen et al. 1989). MARS is positioned with a PC-based controller card that receives joint position feedback and calculates the inverse kinematics for joint-level control (Shoham et al. 2003). A proportionalintegral-derivative (PID) controller has been embedded with Peripheral Interface Controller (PIC) microcontroller to control the motion of MBARS (Wolf et al. 2005). PID controller has also been used to position Orthdoc (Kwon et al. 2002). Industrial high-bandwidth motion controller with embedded amplifier has been developed to control the motion of HyBAR (Song et al. 2009). A master-slave microcontroller has been used to control a parallel surgical robotic system (Tsai and Hsu 2007), and passive guided control with notification has been selected for Praxiteles (Plaskos et al. 2005). We choose a computed torque controller for OrthoRoby surgical robotic system (Erol Barkana 2010a, b). Computed torque controller (low-level controller) is responsible to supervise the OrthoRoby to produce necessary coordinated motion to complete bone cutting operation in a desired manner. It is also important to complete the surgical operation in a safe manner when using robotic systems during the surgery. Thus, we develop a high-level controller which is responsible to monitor the progress and safety of the cutting operation such that necessary dynamic modifications can be made (if needed) to complete bone cutting operation in a safe manner. The OrthoRoby robotic system has been evaluated on a bone cadaver. Various real-time experiments are performed to demonstrate the efficacy of the low-level and the high-level controllers of OrthoRoby on a bone cadaver.
4.2
Orthoroby
The OrthoRoby robotic system includes parallel robot with cutting tool (OrthoRoby), a Cartesian system, a camera system and a bone cadaver (Fig. 4.1). The parallel robot is controlled via a 3.2 GHz Pentium 4 PC with 2 GB of RAM.
Fig. 4.1 OrthoRoby robotic system
The hardware is controlled through the MATLAB Real-Time Workshop Toolbox from MathWorks, and WinCon from Quanser Consulting. All data I/O is handled by the Quanser Q8 board. The joint angles of the parallel robot are acquired using the encoders of the actuators with a sampling time of 0.001 s from a Quanser Q8 card. The torque output to the parallel robot is given with the same card at the same sampling time. A control card was developed to drive the DC motors (actuators) of the parallel robot. Position feedback of the actuators is received from the internal encoders of the actuators and is transmitted to the Quanser Q8 board via this control card. The Cartesian system of the OrthoRoby robotic system is controlled using a touch-screen panel. Two fixed-focus Logitech C600 HD webcams are used in the OrthoRoby robotic system to measure the depth of cutting and to detect whether the cutting tool of the parallel robot is close enough to the bone.
Fig. 4.2 Control architecture of OrthoRoby
4.3
Control Architecture
The control architecture that is developed for orthopedic surgical robotic system OrthoRoby is shown in Fig. 4.2 (Erol Barkana 2010a, b). The control architecture is used to track a desired bone cutting trajectory in a desired and safe manner. The control architecture consists of a parallel robot and a cutting tool (OrthoRoby), a Cartesian system, a medical user interface, a camera system, a high-level controller and a low-level controller. Control architecture details have been given in (Guven and Barkana Erol 2010).
4.4
High-Level Controller
The high-level controller is the decision making module of the control architecture which makes intermittent decisions in a discrete manner. A hybrid system modelling technique is used to design the high-level controller. A set of hypersurfaces that separate different discrete states are defined for the high-level controller. The hypersurfaces are not unique and are decided considering the capabilities of the OrthoRoby, Cartesian system, camera system and medical user interface (Table 4.1). Each region in the state space of the plant, bounded by the hypersurfaces, is associated with a state of the plant. A plant event occurs when a hypersurface is crossed. A plant event generates a plant symbol to be used by the high-level controller. The next discrete state is activated based on the current state and the associated plant symbol (Table 4.2). In order to notify the low-level controllers the next course of action in the new discrete state, the high-level controller generates a set of symbols, called control symbols. In this application, the purpose of the high-level controller is to activate/ deactivate the Cartesian system, parallel robotic device and the cutting tool device in a coordinated manner so that the bone cutting operation does not enter critical regions of the state space to ensure safety. The control states are given in Table 4.3. The transition function uses the current control state and a plant symbol to determine the next control action. The high-level controller generates a control
Table 4.1 Hypersurfaces

h1 = (sb == 1): OrthoRoby's cutting tool is positioned close to the bone using the Cartesian system, the camera system starts monitoring, and the start button of the overall system is active.

h2 = |x − xt| < |ε|: x and xt are OrthoRoby's cutting tool position and the bone's position, respectively. ε is a value used to determine if the OrthoRoby's cutting tool is close enough to the bone.

h3 = (|xct| ≥ |xdb − εct|) ∧ (cto == 1): cto (cutting tool on) is a binary value, which will be 1 when it is pressed and 0 when it is released. xct and xdb, which are calculated using the camera system, are the cutting tool depth and the depth in the bone, respectively. εct is a value used to determine if the cutting tool is close enough to the desired cutting depth.

h4 = |x − xi| < |ε1|: x and xi are the parallel robot position and the initial position of the operation, respectively. ε1 is a value used to determine if the parallel robot is close enough to the initial position.

h5 = (ll < l < lu) ∧ (τctrl < τrth) ∧ (d ≤ dlimit): ll and lu represent the sets of lower and upper limits of the legs of the parallel robot, respectively, and l is the set of actual leg lengths of the parallel robot. τctrl and τrth are the torque applied to the actuators of the parallel robot and the threshold value, respectively. d and dlimit are the actual values of the parallel robot configuration in vector form and the limit values of the parallel robot configurations in vector form, respectively.

h6 = (eb == 1), h7 = (pb == 1), h8 = (pb == 0) ∧ (eb == 0): the emergency button (eb) is 1 when it is pressed by the surgeon. The pause button (pb) is pressed when the surgeon wants to pause the cutting operation for a while. The surgeon can release both pb and eb to continue the cutting operation.
Table 4.2 Plant symbols
x~1: The cutting tool of OrthoRoby approaches towards the bone using the Cartesian system (when h1 is crossed).
x~2: The cutting tool of OrthoRoby reaches the bone (when h2 is crossed).
x~3: The cutting tool reaches the desired cutting depth (when h3 is crossed).
x~4: OrthoRoby goes back to the starting position (when h4 is crossed), safety-related issues happen, such as the parallel robot leg lengths being out of limits or the parallel robot applied force being above its threshold (when h5 is crossed), or the emergency button is pressed (when h6 is crossed).
x~5: The surgeon presses the pause button (when h7 is crossed).
x~6: The surgeon releases the pause button (when h8 is crossed).
x~61: The surgeon presses the pause button when the parallel robot is approaching towards the bone.
x~62: The surgeon presses the pause button when the bone cutting tool is on.
x~63: The surgeon presses the pause button when the parallel robot is returning back to the initial position.
Table 4.3 Control states
s~1f: The parallel robot device alone is active, to move towards the cutting region on the bone.
s~1b: The parallel robot device alone is active, to move towards the initial position.
s~2: Both the parallel robot device and the cutting tool device are active.
s~3: Both the parallel robot device and the cutting tool device are idle.
s~4m: Memory state after the surgeon says "stop".
s~5m: Continue state when the surgeon wants to continue with the operation while s~4m (where m = 1, 2) is active.
Table 4.4 Control symbols
r~1f: Drive the parallel robot device to move towards the cutting region on the bone.
r~1b: Drive the parallel robot device to move back to the initial position.
r~2: Drive the cutting tool device to cut the bone.
r~3: Make both the parallel robot and cutting tool devices idle.
symbol which is unique for each state (Table 4.4). The low-level controller cannot interpret the control symbols directly. Thus the interface converts the control symbols into continuous outputs, which are called plant inputs.
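The decision logic described in this section is essentially a small finite-state supervisor. The following Python fragment is a schematic sketch of how Tables 4.2-4.4 could be encoded; the state and symbol names follow the tables, but the transition set shown is only an illustrative subset for the nominal cutting sequence, not the actual controller code.

    # Control states (Table 4.3) and the plant symbols (Table 4.2) that move
    # between them; only the nominal cutting sequence is shown here.
    TRANSITIONS = {
        ("s1f", "x2"): "s2",   # tool reached the bone -> start cutting
        ("s2",  "x3"): "s1b",  # desired depth reached -> retract
        ("s1b", "x4"): "s3",   # back at the initial position -> idle
    }
    CONTROL_SYMBOLS = {        # Table 4.4: what each state commands
        "s1f": "r1f", "s1b": "r1b", "s2": "r2", "s3": "r3",
    }

    def high_level_step(state, plant_symbol):
        """Return the next state and the control symbol sent to the interface."""
        next_state = TRANSITIONS.get((state, plant_symbol), state)
        return next_state, CONTROL_SYMBOLS[next_state]

    state, cmd = high_level_step("s1f", "x2")   # -> ("s2", "r2")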
4.5
Low-Level Controller
Computed torque control is used for the low-level controller of the parallel robot of OrthoRoby (Fig. 4.3). An activation/deactivation mechanism is used for both the low-level controllers of the cutting tool and the Cartesian system. The details of the low-level controllers are given in (Erol Barkana 2010a, b). Computed torque control is a model-based method, which uses the robot dynamics in the feedback loop for linearization and decoupling. Consider the control input

    M(l)·l̈r + C(l, l̇)·l̇ + G(l) + τdist = τctrl    (4.1)
which consists of an inner nonlinear compensation loop and an outer loop with an exogenous control signal l̈r. Substituting this control law into the dynamical model of the robot manipulator (4.1), it follows that

    l̈ = l̈r    (4.2)
Fig. 4.3 Computed torque control for OrthoRoby
It is important to note that this control input converts a complicated nonlinear controller design problem into a simple design problem for a linear system consisting of decoupled subsystems. One approach to the outer-loop control is proportional-integral-derivative (PID) feedback, as

    l̈r = l̈d + Kv·(l̇d − l̇a) + Kp·(ld − la) + Ki·∫(ld − la) dt    (4.3)
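The inner linearising loop (4.1) and the outer PID loop (4.3) can be sketched in a few lines; in the fragment below M, C and G stand for user-supplied callables returning the dynamic model terms, and the gain matrices are placeholders rather than the tuned values reported later in Table 4.5.

    import numpy as np

    def outer_loop_pid(l_d, ld_d, ldd_d, l_a, ld_a, e_int, Kp, Kv, Ki, dt):
        """Outer PID loop (4.3): returns the commanded acceleration and the
        updated integral of the leg-length error."""
        e = l_d - l_a
        e_int = e_int + e * dt
        ldd_r = ldd_d + Kv @ (ld_d - ld_a) + Kp @ e + Ki @ e_int
        return ldd_r, e_int

    def computed_torque(M, C, G, l, l_dot, ldd_r):
        """Inner linearising loop (4.1): tau = M(l) ldd_r + C(l, l_dot) l_dot + G(l)."""
        return M(l) @ ldd_r + C(l, l_dot) @ l_dot + G(l)

    # Example placeholder gain matrices for the six legs.
    Kp, Kv, Ki = (np.eye(6) * g for g in (10000.0, 4375.0, 1500.0))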
The resulting linear error dynamics are given in the following equation, where the convergence of the tracking error to zero is guaranteed:

    ëq + Kv·ėq + Kp·eq + Ki·∫eq dt = 0    (4.4)
where eq = (ld − la), and Kv, Kp and Ki are the derivative, proportional and integration gains, respectively. Various control gains have been used to increase the accuracy and precision of the low-level controller when the cutting tool was active. The best control gains are decided by looking at the oscillations that are caused during the motion of the OrthoRoby robotic system. Frequency components of the leg length errors of OrthoRoby have been computed using the Fast Fourier Transform to observe the oscillations. Thus, the following equation is used to compute the frequency components of the leg length errors:

    Xi[k] = Σ_{n=0}^{N−1} xi[n]·e^(−j2πkn/N)    (4.5)

where xi[n] (i = 1, ..., 6) are the leg length errors.
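With the 1 ms sampling used for the joint encoders, (4.5) is simply the discrete Fourier transform of each recorded error signal. A NumPy sketch of this check is shown below; the array layout (one column per leg) is an assumption for illustration.

    import numpy as np

    def error_spectra(leg_errors, dt=0.001):
        """leg_errors: array of shape (N, 6), one column per leg. Returns the
        frequency axis and |Xi(jw)| for each leg, as in (4.5)."""
        N = leg_errors.shape[0]
        spectra = np.abs(np.fft.rfft(leg_errors, axis=0)) / N
        freqs = np.fft.rfftfreq(N, d=dt)
        return freqs, spectra

    # Gains producing the smallest low-frequency oscillation peaks are preferred.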
4.6
Results
Various real-time experiments were performed to find the best control gains for each leg of OrthoRoby. The best control gains were decided by using the Fast Fourier Transform method and are given in Table 4.5. Mean leg length errors for each leg were less than 0.42 mm with the selected control gains. OrthoRoby was then evaluated with a bone cadaver. Initially, OrthoRoby moved towards the bone and then cut the bone directly at 90° (Fig. 4.4). The desired and actual leg lengths and the leg length errors are shown in Fig. 4.5, together with the frequency components of the leg length errors for each leg. The state changes in the high-level controller are given in Fig. 4.6. Initially, s~1f was active and OrthoRoby moved towards the bone. Then, s~2 became active, the cutting tool was activated (Fig. 4.7) and the bone cadaver was cut. Later, s~1b became active to move OrthoRoby back to its initial position. At the end, s~3 was active and OrthoRoby had stopped its movement.
Table 4.5 Control gains of each leg of the parallel robot

Leg #   Kp       Ki      Kv
1       9,375    1,500   4,375
2       10,000   1,500   4,375
3       9,375    1,500   4,375
4       10,000   1,500   4,375
5       9,375    1,354   4,375
6       12,500   2,000   4,375

Fig. 4.4 OrthoRoby cutting the bone cadaver
Fig. 4.5 Leg length changes during bone cutting with 90°: desired and actual leg lengths (m), leg length errors (m) versus time (s), and the frequency components |Xi(jw)| of the leg length errors versus frequency (Hz), for each of the six legs
Fig. 4.6 States during bone cutting with 90° (s~1f: 1, s~1b: 2, s~2: 4, s~3: 8)
Fig. 4.7 Cutting tool execution during bone cutting with 90° (Off: 0, On: 1)
4.7
Conclusion
An orthopedic robotic system called OrthoRoby and a control architecture to be used in bone cutting operations have been developed. The OrthoRoby system consists of a parallel robot, a cutting tool and a Cartesian system. A control architecture has been developed for the OrthoRoby system that systematically combines a high-level controller with a low-level controller of the OrthoRoby system to enable bone cutting operations in a safe and desired manner. Real-time experiments demonstrate that the OrthoRoby robotic system and its control architecture can be used to cut a bone cadaver successfully. Acknowledgment I gratefully acknowledge the help of Dr. Muharrem Inan, an orthopedist in the Orthopedics and Traumatics Department of Istanbul University Cerrahpaşa Medical Faculty. The work is supported by TUBITAK, The Support Programme for Scientific and Technological Research Projects (1001), grant 108E092.
References Brandt G, Simolong A, Carrat L, Merloz P, Staudte HW, Lavallee S, Radermacher K, Rau G (1999) CRIGOS: a compact robot for image-guided orthopaedic surgery. IEEE Trans Inf Technol Biomed 3(4):252–260 Davies B (2000) A review of robotics in surgery. Proc Inst Mech Eng 214:129–140 Erol Barkana D (2010a) Design and implementation of a control architecture for a robot-assisted orthopedic surgery. Int J Med Robotics Comput Assist Surg 6(1):42–56 Erol Barkana D (2010b) Evaluation of low-level controllers for an orthopedic surgery robotic system. IEEE Trans Inf Technol Biomed 14(4):1128–1135 Guven Y, Barkana Erol D (2010) Control architecture for an orthopedic surgical robotic system OrthOroby, proceedings of the world congress on engineering and computer science- international conference on intelligent automation and robotics 2010, 20–22 October. San Francisco, USA, pp 334–339 Jakopec M, Baena FR, Harris SJ, Gomes P, Cobb J, Davies BL (2003) The hands-on orthopaedic robot "Acrobot": early clinical trials of total knee replacement surgery. IEEE Trans Robot Autom 19(5):902–911 Jaramaz B, Hafez MA, DiGioia AM (2006) Computer-assisted orthopaedic surgery. Proc IEEE 94(9):1689–1695 Kwon DS, Lee JJ, Yoon YS, Ko SY, Kim J, Chung JH, Won CH Kim JH (2002) The mechanism and the registration method of a surgical robot for hip arthroplasty. Proc. of the IEEE International Conference of Robotics and Automation, pp 1889–1894. McEwen C, Bussani CR, Auchinleck GF, Breault MJ (1989) Development and initial clinical evaluation of pre robotic and robotic retraction systems for surgery. Conf Proc IEEE Eng Med Biol Soc 3:881–882 Merlet J-P (2006) Parallel robots. Springer, The Netherlands Pechlivanis I, Kiriyanthan G, Engelhardt M, Scholz M, Lucke S, Harders A, Schmieder K (2009) Percutaneous placement of pedicle screws in the lumbar spine using a bone mounted miniature robotic system, first experiences and accuracy of screw placement. Spine J 34(4):392–398 Plaskos C, Cinquin P, Lavalle´e S, Hodgson AJ (2005) Praxiteles: miniature bone-mounted robot for minimal access total knee arthroplasty. Int J Med Robotics Comput Assist Surg 1(4):67–79
Schulz AP, Klaus S, Queitsch C, Haugwitz AV, Meiners J, Kienast B, Tarabolsi M, Kammal M, J€urgens C (2007) Results of total hip replacement using the Robodoc surgical assistant system: clinical outcome and evaluation of complications for 97 procedures. Int J Med Robotics Comput Assist Surg 3(4):301–306 Shoham M, Burman M, Zehavi E, Joskowicz L, Batkilin E, Kunicher Y (2003) Bone-mounted miniature robot for surgical procedures: concept and clinical applications. IEEE Trans Robot Autom 19(5):893–901 Song S, Mor A, Jaramaz B (2009) HyBAR: hybrid bone-attached robot for joint arthroplasty. Int J Med Robotics Comput Assist Surg 5(2):223–231 Tsai T-C, Hsu Y-L (2007) Development of a parallel surgical robot with automatic bone drilling carriage for stereotactic neurosurgery. Biomedical Engineering, Applications, Basis, and Communications 19(4):269–277 Wolf A, Jaramaz B, Lisien B, DiGioia AM (2005) MBARS: mini bone-attached robotic system for joint arthroplasty. Int J Med Robotics Comput Assisted Surg 1(2):101–121
Chapter 5
Natural Intelligence Based Automatic Knowledge Discovery for Medical Practitioners
Veenu Mangat
5.1
Introduction
Swarm Intelligence is an innovative distributed intelligent paradigm for solving optimization problems that originally took its inspiration from the biological examples by swarming, flocking and herding phenomena in vertebrates. SI systems are typically made up of a population of simple agents who interact locally with each other and with their environment. The agents are governed by simple rules and there is no centralized control structure dictating how individual agents should behave. Intelligent global behaviour emerges from the local, and to a certain degree random, interactions between such agents. Natural examples of SI include ant colonies, bird flocking, animal herding, bacterial growth, and fish schooling. Data Mining is an analytical process designed to explore large amounts of data for frequent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. Association rules are a prime formalism for expressing knowledge in a symbolic way. Association rule mining is the process of extracting interesting correlations, frequent patterns and associations among items in data repositories. Association rules generally include simple predictive rules, they work well with user-binned attributes, rule reliability is higher and rules generally refer to larger sets of patients. They also have advantages of simplicity, uniformity, transparency, and ease of inference, which makes them a suitable approach for representing real world medical knowledge (Kotsiantis and Kanellopoulos 2006). Other structures like Decision trees, clustering and Bayesian networks are shown to be not as adequate for medical systems as association rules (Ordonez 2006). Decision trees cannot accommodate degree of sickness, split points are chosen by the tree induction algorithm and they cannot effectively handle conditions on combinations of attribute-value pairs.
Association rules have been used by researchers in medical domain to aid in infection detection and monitoring, staging of breast cancer, leukemia genome expression, understanding what drugs are co-prescribed with antacids, finding frequent patterns in gene data, understanding interaction between proteins, discovering co-occuring diseases, pharmacovigilance (Sordo and Murphy), determining candidates for temporal lobe surgery (Ghannad-Rezaie and Soltanain-Zadeh 2006), and detecting common risk factors in pediatric diseases. The primary issue in mining association rules on a medical data set is the large number of rules that are discovered, most of which are irrelevant. Such a large number of rules make search slow and interpretation by the domain expert difficult. This happens because the frequency requirement for rules is lowered for medical data as these may potentially indicate rare but specific conditions. An association that holds true for even a small number of patients, can be significant and should be considered. Also finding rules with a large number of terms or conditions on attribute values is not uncommon (Kotsiantis and Kanellopoulos 2006). Additionally, majority of data in medical databases describe typical instances leading to generation of commonly known knowledge. This problem is exacerbated by high dimensionality of the data, insufficient number of records and missing information for certain attributes. Some other issues in medical data (Roddick et al. ; Tre´meaux and Liu 2006) include distributed and uncoordinated data collection, strong privacy concerns, diverse data types (image, numeric, categorical, missing information), complex hierarchies behind attributes and a comprehensive knowledge base. The dynamic essence of SI provides flexibility and robustness to process of rule mining. With full control on the extracted rules, SI is a suitable approach to satisfy medical systems requirements. Another algorithm that takes its inspiration from nature, the SFL algorithm has the ability to perform a flexible robust search for a good combination of terms (logical conditions) involving values of the predictor attributes. Therefore, SFL has been modified and developed to suit our application requirement.
5.2
Traditional Rule Mining Methods
An association rule can be defined as: Let I be a set of m distinct attributes, T be a transaction that contains a set of items such that T is a subset of I, D be a database with different transaction records Ts. An association rule is an implication in the form of X -> Y, where X, Y are subsets of I and X ∩ Y = ∅. Any set of items is called an itemset. X is called antecedent while Y is called consequent. The rule means X implies Y with a certain degree of support and confidence. If the consequent is a 1-itemset which can function as a class label, the rule can be used for classification purpose. Support(s) of an association rule is defined as the percentage/fraction of records that contain X U Y to the total number of records in the database. Confidence of an association rule is defined as the percentage/fraction of the number of transactions
that contain X U Y to the total number of records that contain X. Confidence is a measure of strength of the association rules. Rule mining problem is usually decomposed into two subproblems. First task is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets with the constraints of minimal confidence. Generally, an association rule mining algorithm contains the following steps: • The set of candidate k-itemsets is generated by adding one item at a time to large (k-1) itemset generated in the previous iteration. • Supports for the candidate k-itemsets are generated by a pass over the database. • Itemsets that do not have the minimum support are discarded and the remaining itemsets are called large k-itemsets. This process is repeated until no more large itemsets are found. Most approaches to rule mining have been based on candidate generation using an Apriori (Agrawal and Srikant 1994) style algorithm or FP-tree (Han and Pei 2000) style approaches to mine rules without candidate generation. Efforts have been made to improve the performance of these techniques by either (1) reducing the number of passes over the database (Wang and Tjortjis 2004), (Yuan and Huang 2005), or (2) sampling data (Parthasarathy 2002; Chuang et al. 2005; Li and Gopalan 2004), or (3) adding extra constraints on the structure of rules (Tien Dung Do et al. 2003; Tseng et al. 2005) or (4) parallelization of operations (Manning and Keane; Schuster and Wolff 2001; Cheung et al. 1996) or (5) a combination of these. But these different strategies still do not return accurate results in a reasonable time. Metaheuristic algorithms are computational methods that optimize a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. Metaheuristics make few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. SI based techniques perform a global search and cope better with attribute interaction than the greedy rule induction algorithms often used in data mining. The improvements are reflected in rules output to the user and classification systems constructed using these rules. Currently, meta-heuristic algorithms mainly include Genetic Algorithm (GA), Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO). These meta-heuristic algorithms have their respective strengths and weaknesses in rule mining. For instance, a single evolution run of GA gives 1-frequent pattern and may take too long in searching for the optimal solution. It requires fine tuning a large number of parameters and involves the risk of premature convergence wherein evolution stops in local optimum. ACO also involves quite a number of parameters, and may fall in local optimum. PSO is relatively simple and has evoked interest of researchers in different areas. A combination of ACO/PSO algorithms has been proven to give acceptable results (Mangat 2010a). Also, a new meta-heuristic algorithm, Shuffled FrogLeaping, works through observing, imitating and modeling the behavior of frogs
56
V. Mangat
searching for food lying on discrete stones randomly located in a pond. The shuffled frog-leaping algorithm draws its formulation from two other search techniques: the local search of the particle swarm optimization technique and the competitiveness mixing of information of the shuffled complex evolution technique. These latter two algorithms are the subject of our experiment.
5.3
Rule Mining Using Swarm Intelligence
A swarm can be viewed as a group of agents cooperating to achieve some purposeful behavior and achieve some goal. The agents use simple local rules to govern their actions and via the interactions of the entire group, the swarm achieves its objectives. A type of self-organization emerges from the collection of actions of the group. An autonomous agent is a subsystem that interacts with its environment, which probably consists of other agents, but acts relatively independently from all other agents (Dorigo et al. 2000). The autonomous agent does not follow commands from a leader, or some global plan (Eberhart and Shi 2001). Particle Swarm Optimization (PSO) and Ant Colonies Optimization (ACO) are currently the most popular algorithms in the swarm intelligence domain. In addition, the shuffled frog leaping (SFL) algorithm (Eusuff and Lansey 2003) has emerged recently as a new meta-heuristic derived from nature.
5.3.1
Combined ACO/PSO
This algorithm uses a sequential covering approach to discover one classification rule at a time according to the following algorithm: Begin RS ¼ {}/* initially, Rule Set is empty */ FOR EACH class C { TS ¼ {all training samples belonging to all classes} WHILE (number of uncovered training examples of class C > MaxUncovExampPerClass) { Run the PSO/ACO algorithm to discover the best rule predicting class C, called BestRule RS ¼ RS U BestRule TS ¼ TS–{training samples correctly covered by discovered rule} } } END;
5 Natural Intelligence Based Automatic Knowledge Discovery for Medical Practitioners
57
Each particle represents the antecedent of a candidate classification rule. The rule’s class is fixed for all the particles in each run of the algorithm since each run of the algorithm aims at discovering the best rule for a fixed class. This has the advantage of avoiding the problem of having different particles predicting different classes in the same population. Continuous values can be directly represented as a component of the vector associated with a particle and processed using the standard PSO. The vector can be converted into a set of terms and added to rule generated by ACO. For this purpose, we define upper and lower bounds for the continuous attribute in the rule. So there are two dimensions for each continuous attribute. These bounds are seeded by calculating range of all continuous attributes and adding the attribute’s value to range for upper bound and similarly deducting attribute’s value from range for lower bound. If the two bounds in rule condition conflict, they are omitted from rule, but the position and velocity of particle are updated. A particle contains a number of pheromone matrices equal to number of categorical attributes in the data set. Each pheromone matrix contains values for pheromones for each possible value that that attribute can take plus a flag value (the indifference flag) indicating whether or not the attribute is selected to occur in the decoded rule. Updating a particle’s pheromone (the probabilities of choosing attribute values) is done as follows: tcij ¼ tcij þ ð’1 Qc Þ; for all ij belongs to CurrentRule
(5.1)
tcij ¼ tcij þ ð’2 QP Þ; for all ij belongs to BestPastRule
(5.2)
tcij ¼ tcij þ ð’3 Q1 Þ; for all ij belongs to BestLocalRule
(5.3)
tcij ¼ tcij =ðSaiþ1 j¼1 tcij Þ
(5.4)
Where tcij is the amount of pheromone in the current particle c, for attribute i, for value j. Q is the quality of the rule as given by (5.5). F is a random learning factor in the range 0.1. Q ¼ ðTruePos=ðTruePos þ FalseNegÞÞ ðTrueNeg=ðFalsePos þ TrueNegÞÞ (5.5) The population is initialized in positions with nonzero qualities by taking a record from the class to be predicted and using its terms (attribute values) as the rule antecedent. Then a pruning procedure based on term quality is initially applied, and for other iterations a method similar to ACO’s pruning is applied for the final rule produced by each run of the hybrid PSO/ACO algorithm (Holden and Freitas 2007). The rule set so generated on a per class basis is ordered according to rule quality before it is used to classify examples. This helps in conflict resolution as any example is only considered covered by the first rule that matches it from the ordered rule list.
58
5.3.2
V. Mangat
ACO/PSO with Precision Fitness
This algorithm uses a sequential covering approach similar to ACO/PSO to discover one classification rule at a time. Begin RS ¼ {} FOR EACH class C { TS ¼ {All training examples belonging to any class} WHILE (Number of uncovered training examples belonging to class C > MaxUncovExampPerClass) { Run the NRule algorithm to discover best nominal rule predicting class C called Rule Run the standard PSO algorithm to add continuous terms to Rule, and return the best discovered rule BestRule Prune BestRule RS ¼ RS U BestRule TS ¼ TS {training examples covered by discovered rule} } } Order rules in RS by descending Quality Prune RS removing unnecessary terms and/or rules End; A single iteration of this loop only discovers rules based on nominal attributes, returning the best discovered rule. For the continuous part of the rule, a conventional PSO algorithm with constriction is used. The vector to be optimized consists of two dimensions per continuous attribute, one for an upper bound and one for a lower bound. At every particle evaluation, the vector is converted to a set of terms and added to Rule produced by the algorithm for fitness evaluation. If two bounds cross over, both terms are omitted from decoded rule, but Personal Best position is still updated in those dimensions using (5.6) vid ¼ wðvid þ c1 ’1 ðPid xid Þ þ c2 ’2 ðPgd xid ÞÞ xid ¼ xid þ vid
ð5:6Þ
To improve the performance of the PSO algorithm, each particle’s initial position is set to a uniformly distributed position between the value of a randomly chosen seed example’s continuous attribute and that value added to the range for that attribute (for upper bound) and at a uniformly distributed position between an example’s value and an example’s value minus the range for that attribute (for lower bound). The particles are prevented from fully converging using the
5 Natural Intelligence Based Automatic Knowledge Discovery for Medical Practitioners
59
Min-Max system. After the BestRule has been generated it is then added to the rule set after being pruned using ACO’s pruning method. But since this is computationally expensive, ACO pruning is applied only if the number of terms is less than a fixed number. Nominal attributes are handled by the NRule algorithm as follows: Begin Initialise population REPEAT for MaxInterations FOR every particle x Set Rule Rx ¼ “IF {null} THEN C” FOR every dimension d in x Use roulette selection to choose whether the state should be set to off or on. If it is on then the corresponding attribute-value pair set in the initialization will be added to Rx; otherwise (i.e., if off is selected) nothing will be added. LOOP Calculate Quality Qx of Rx P ¼ x’s past best state Qp ¼ P’s quality IF Qx > Qp Qp ¼ Qx P¼x END IF LOOP FOR every particle x P ¼ x’s past best state N ¼ the best state ever held by a neighbour of x according to N’s quality Qn FOR every dimension d in x IF Pd ¼ Nd THEN pheromone entry corresponding to the value of Nd in the current xd is increased by Qp ELSE IF Pd ¼ off AND seeding term for xd 6¼ Nd THEN pheromone entry for the off state in xd is increased by Qp ELSE pheromone entry corresponding to the value of Nd in the current xd is increased by Qp END IF Normalize pheromone entries LOOP LOOP LOOP RETURN best rule discovered End;
60
V. Mangat
Assume a Von Neumann architecture wherein each particle has four neighbours. Initially, pheromone state in each dimension is set to 0.9 for on and 0.1 for off. Quality, Q is defined using Precision as given by (5.7): Laplacecorrected Precision ¼ ð1 þ TPÞ=ð1 þ TP þ FPÞ If TP< MinTP; Q ¼ LaplaceCorrected Precision 0:1; ELSE Q ¼ LaplaceCorrected Precision
ð5:7Þ
where MinTP is the least number of correctly covered examples that a rule has to cover (Holden and Freitas 2008).
5.3.3
Shuffled Frog Leaping(SFL)
Shuffled frog-leaping algorithm (SFL) is a new memetic meta-heuristic algorithm with efficient mathematical function and global search capability. It involves a set of frogs that cooperate with each other to achieve a unified behavior for the system as a whole, producing a robust system capable of finding high quality solutions for problems with a large search space. It uses concept of memes. Memes spread through the behavior that they generate in their hosts. Like the gene complexes found in biology, memeplexes are groups of memes that are often found present in the same individual. Applying the theory of Universal Darwinism, memeplexes group together because memes will copy themselves more successfully when they are grouped together. The pseudocode of SFL algorithm (Eusuff and Lansey 2003; Elbeltagi et al. 2005) is as follows: Begin Generate random population of P solutions (frogs) For each individual i that belongs to P: calculate fitness (i) Sort the population P in descending order of their fitness Divide P into m memeplexes For each memeplex Determine the best and worst frogs Improve the worst frog position using (8) NewpositionXi ¼ CurrentpositionXi + NewVelocityVi (8) Repeat for a predefined specific number of iterations End for Combine the evolved memeplexes Sort the population P in descending order of their fitness Check if termination ¼ true End;
5 Natural Intelligence Based Automatic Knowledge Discovery for Medical Practitioners
61
An initial population of P frogs is created randomly. The structure of an individual frog for rule mining problem is composed of a set of attribute values. The velocity of individual i corresponds to the attribute update quantity covering all attribute values, the velocity of each individual is also created at random. The elements of position and velocity have the same dimension. In the next step, the frogs are sorted in a descending order according to their fitness. Fitness is defined as given by (5.5). The entire population is divided into m memeplexes, each containing n frogs. In this process, the first frog goes to the first memeplex, the second frog goes to the second memeplex, frog m goes to the mth memeplex, and frog m + 1 goes back to the first memeplex, etc. Within each memeplex, PSO is applied to improve only the frog with the worst fitness (not all frogs) in each cycle. If no improvement becomes possible in this case, then a new solution is randomly generated to replace that frog. The calculations then continue for a specific number of iterations. Rule pruning is done iteratively to remove one-term at a time from the rule while this process improves the quality of the rule, and the quality of the resulting rule is computed by (5.5).
5.3.4
SFL with Precision Fitness
As discussed in the previous section, the SFL algorithm has the ability to perform a flexible robust search for a good combination of terms (logical conditions) involving values of the various attributes. To prevent local optima, a submemeplex is constructed in each memeplex, which consists of frogs chosen on the basis of their respective fitness. The better the fitness, the easier it is chosen. So we modify the fitness function to suit our application requirements, which is that the false positives should be penalized severely. Quality (in turn fitness) is now defined using (5.7). The rest of the algorithm is the SFL discussed in previous subsection.
5.4 5.4.1
Experimental Setup Database
The data sets used for rule mining are from the STULONG data set. STULONG is an epidemiologic study carried out in order to elaborate the risk factors of atherosclerosis in a population of middle aged men (Chapman et al. 2000). The primary data for our experiment is sourced from ‘entry’ table of database. Our study focuses on identifying the relationship between alcohol intake (9 attributes), smoking (3 attributes), physical activities (4 attributes) and biochemical attributes (3 attributes). These can be used to classify a patient as hyperlipidemic or not. Hyperlipidemia is defined as the presence of high levels of cholesterol and/or triglycerides in the blood. It is not a disease but a
62
V. Mangat
metabolic derangement that can be secondary to many diseases and can contribute to many forms of disease, most notably cardiovascular disease. The main aim was to find out how previously known knowledge relates to STULONG data. Data corresponding to 1419 persons has been considered. The attributes were manually extracted. Continuous attributes need to be discretized for ACO. For this purpose, field knowledge of medical experts was used to discretize according to known standards of medical community. Though a large number of small intervals lead to a better model in terms of accuracy, yet discretization has to be controlled as there is a risk of overfitting of data in addition to large search space and an incomprehensible model if the number of intervals is too large. Nominal and categorical data was cleaned to handle missing values. ACO/PSO algorithms can handle both nominal and continuous attributes.
5.4.2
Setting of Parameters
For ACO, the following parameter values were taken: Number of Ants ¼ 2000, Minimum number of records per rule ¼ 15, maximum number of uncovered records ¼ 20(usually 10% of total records of class) and number of rules to test ant convergence ¼ 30. For PSO/ACO and PSO/ACO with PF, number of particles ¼ 100 and number of iterations ¼ 200. For PSO/ACO with PF, ACO pruning was used if rule has less than 20 terms. The value for minimum number of true positives ¼ 15, constriction factor w ¼ 0.729, social and personal learning coefficients, c1 ¼ c2 ¼ 2.05. Maximum number of uncovered examples per class was set to 20. Also, the constant factor of 0.1 in (5.7) was replaced with 0.4 in order to penalize false positives more severely, as this is desirable in medical domain. These values are not optimized. As SFLA is a relatively new algorithm, there is no theoretical basis for parameters setting. The number of memeplexes has been taken as 10, frog population is set to 10n (n being number of attributes), number of submemeplexes is set to 2n/3, number of independent runs was 30 and local exploration was carried out n times (Mangat 2010b).
5.5
Results
The first criterion used to analyze the performance of the various implemented techniques is predictive accuracy, defined in terms of cross validation accuracy rate, which in turn equals quotient between number of test cases correctly classified and the total number of test cases. A k-fold cross validation was used with value of k ¼ 10. This is a standard technique used to evaluate accuracy of data mining techniques. The other two criteria for performance evaluation are the number of rules in a rule set and the number of attribute value combinations or conditions per rule. Table 5.1 summarizes the results obtained by the combined ACO/PSO, ACO/ PSO with Precision Fitness, SFL and SFL with Precision Fitness algorithms.
5 Natural Intelligence Based Automatic Knowledge Discovery for Medical Practitioners
63
Table 5.1 Comparison between ACO/PSO, ACO/PSO with PF, SFL and SFL with PF Predictive accuracy Number of rules in rule set Number of terms in rule ACO/PSO 85.1% 17.1 2.27 9.88 0.21 ACO/PSO with PF 91.5% 13.3 1.0 6.81 0.14 SFL 90.1% 8.2 5.3 SFL with PF 94.2% 6.1 4.4
5.6
Conclusion and Future Work
The success of a rule mining method depends on the quality of rules it uncovers. Rule quality can be viewed in terms of its accuracy and comprehensibility. A rule will be usable by a medical practitioner if it is accurate and easily understood. All four techniques studied provide accuracy comparable to other non SI based mining approaches. SFL with PF shows very good results. A system for rule mining over medical data needs to include several kinds of rules without making the system unwieldy. High confidence rules with low support are important to discover rare, but specific conditions. Very generalized rules with high support are required to monitor generic behavior. A system generating large number of rules or rules with too many conditions in the antecedent, tends to confuse the end user and is not interesting for medical knowledge discovery. Shuffled Frog Leaping with new quality measure of fitness performs the best in terms of comprehensibility and accuracy. This method also penalizes false positives severely, which is a desirable property for data mining in the medical domain. One drawback of the approach is the complexity of the algorithm. A possible further research direction is to introduce new data structures to reduce execution time. Certain domain specific constraints can be applied in the preprocessing phase to reduce the input data size. To balance between efficiency and exploration capability, extensive experiments need to be conducted with different settings of parameters to arrive at the optimal values for these algorithms. Acknowledgment I would like to extend my heartfelt gratitude to Prof. Renu Vig for her guidance and support in carrying out this study. I am also grateful to Panjab University authorities for providing support in the form of necessary infrastructure and tools. On a personal front, I am indebted to my family for always keeping my morale high.
References Kotsiantis S, Kanellopoulos D (2006) Association rules mining: a recent overview. GESTS International Transactions on Computer Science and Engineering 32(82):71–82 Ordonez C (2006) Comparing association rules and decision trees for disease prediction. ACM HIKM’06, November 11, 2006, Arlington, Virginia, USA Sordo M, Murphy DN A (2009) PSO/ACO approach to knowledge discovery in a pharmacovigilance context. Harvard Medical School, Charlestown, MA, USA
64
V. Mangat
Ghannad-Rezaie M, Soltanain-Zadeh H (2006) Senior Member, IEEE, M.-R. Siadat, K.V. Elisevich Medical data mining using particle swarm optimization for temporal lobe epilepsy. 2006 IEEE Congress on Evolutionary Computation Roddick JF, Fule P, Graco WJ (2003) Exploratory medical knowledge discovery experiences and issues. Health Insurance Commission, Australia. ACM SIGKDD Explorations Newsletter, 5(1) Tre´meaux J-M, Liu Y (2006) Mining for association rules in medical data. http://naku.dohcrew. com/dea-ecd/Tremeaux-Liu-2006.pdf Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases, Santiago de Chile, Chile, pp 487–499 Han J, Pei J (2000) Mining frequent patterns by pattern-growth: methodology and implications. ACM SIGKDD Explorations Newsletter 2(2):14–20 Wang C, Tjortjis C (2004) PRICES: an efficient algorithm for mining association rules. Lecture Notes in Computer Science 3177:352–358 Yuan Y, Huang T (2005) A matrix algorithm for mining association rules. Lecture Notes in Computer Science 3644:370–379 Parthasarathy S (2002) Efficient progressive sampling for association rules. ICDM 2002, pp 354–361 Chuang K, Chen M, Yang W (2005) Progressive sampling for association rules based on sampling error estimation. Lecture Notes in Computer Science 3518:505–515 Li Y, Gopalan R (2004) Effective sampling for mining association rules. Lecture Notes in Computer Science 3339:391–401 Do Tien Dung, Hui Siu Cheung, Fong Alvis (2003) Mining frequent itemsets with category-based constraints. Lecture Notes in Computer Science 2843:76–86 Tseng M, Lin W, Jeng R (2005) Maintenance of generalized association rules under transaction update and taxonomy evolution. Lecture Notes in Computer Science 3589:336–345 Manning A, Keane J (2007) Data allocation algorithm for parallel association rule discovery. Lecture Notes in Computer Science 2035:413–420 Schuster A, Wolff R (2001) Communication-efficient distributed mining of association rules. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, Santa Barbara, California, pp 473–484 Cheung D, Han J, Ng V, Fu A, Fu Y (1996) A fast distributed algorithm for mining association rules. In Proceedings of 1996 International Conference on Parallel and Distributed Information Systems, Miami Beach, Florida, pp 31–44 Dorigo M, Bonaneau E, Theraulaz G (2000) Ant algorithms and stigmergy. Future Generation Computer Systems 16:851–871 Eberhart RC, Shi Y (2001) Particle swarm optimization: developments, applications and resources. In: Proceedings of the IEEE Congress on Evolutionary Computation (CEC), Seoul, Korea Eusuff MM, Lansey KE (2003) Optimization of water distribution network design using the shuffled frog leaping algorithm. Journal of Water Resources Planning and Management, American Society of Civil Engineers, Reston, June 2003, pp 210–225 Holden N, Freitas AA (2007) A hybrid PSO/ACO algorithm for classification. In Proceedings of the 9th Genetic and Evolutionary Computation Conference Workshop on Particle Swarms: The Second Decade (GECCO ’07), pp 2745–2750, ACM Press, London, UK, July 2007 Holden N, Freitas A (2008) A hybrid PSO/ACO algorithm for discovering classification rules in data mining. Journal of Artificial Evolution and Applications (JAEA) Elbeltagi E, Hegazy T, Grierson D (2005) Comparison among five evolutionary-based optimization algorithms. 
Advanced Engineering Informatics, Elsevier Science Publishers, Amsterdam, January 2005, pp 43–53 Chapman P, Clinton J, Kerber R, Khabaza T, Reinartz T, Shearer C, Wirth R (2000) CRISP-DM 1.0. Technical report, The CRISP-DM Consortium, 2000 Mangat V (2010a) Swarm intelligence based technique for rule mining in the medical domain. International Journal of Computer Applications IJCA 4(1) Mangat V (2010b) Increasing performance of rule mining in the medical domain using natural intelligence concepts. Proceedings of the World Congress on Engineering and Computer Science, San Francisco, USA, pp 529–534
Chapter 6
Design and Optimal Control of a Linear Electromechanical Actuator for Motion Platforms with Six Degrees of Freedom Evzen Thoendel
6.1 6.1.1
Introduction History
Motion platforms with six degrees of freedom, also known as hexapods, are possibly the most popular robotic manipulators used in simulation technology. This parallel mechanism was first described by Gough (1957), who constructed an octahedral hexapod to test the behaviour of tyres subjected to forces created during airplane landings. The first document providing a detailed description of this structure used as an airplane cockpit simulator was published in 1965 by Steward (1966) (hence the name of the structure). The Stewart platform is a closed kinematic system with six degrees of freedom and six adjustable length arms (see Fig. 6.1). Compared to other similar structures, its main advantage is high rigidity and a high input-power-to-device-weight ratio. Until the 1990s of the last century, the main obstacle hindering more intensive application development was the insufficient computing power of the available hardware. Determining the position of a hexapod is significantly more difficult than that of conventional serial structures. With regard to the device being controlled in real time, the main challenges are the transformation of coordinates and speed of resolving mathematical procedures (Liu et al. 1993). Despite the computing power issue having been more or less removed in recent years, direct kinematic transformation (i.e. transforming the length of arms to the position of the frame) still remains a challenge.
E. Thoendel (*) Department of Electric Drives and Traction, Czech Technical University, Prague e-mail:
[email protected] S.‐I. Ao et al. (eds.), Intelligent Automation and Systems Engineering, Lecture Notes in Electrical Engineering 103, DOI 10.1007/978-1-4614-0373-9_6, # Springer Science+Business Media, LLC 2011
65
66
E. Thoendel
Fig. 6.1 Current state – motion platform with hydraulic cylinders
6.1.2
Use in Simulation Technology
The Stewart platform is frequently used in simulation technology to simulate motion effects in vehicle or airplane simulators. By using this equipment, it is possible to simulate the forces acting upon the pilot (driver) during the flight (journey), thus bringing the simulator even closer to reality. The concept and role of motion effect simulation in training is discussed, for instance, in Lee (2005); apart from describing the structure of simulators, the book expressly underlines the role of this aspect during emergency event training. With motion effects being generally perceived before other kinds of perceptions (Thoendel 1981), they provide the first possibility of detecting undesired and dangerous behaviour of the airplane or vehicle.
6.1.3
Current Status
Due to the significant weight of the simulator cockpit fitted with the required audiovisual equipment and controls (the weight of the cockpit described in Thoendel (2008), for instance, amounts to 1.5 t) and the need to achieve high
6 Design and Optimal Control of a Linear Electromechanical Actuator. . .
67
dynamics levels, the platform’s linear actuators are usually implemented as hydraulic cylinders in simulation technology applications. In spite of this solution meeting the above requirements for sufficient power and dynamics levels, it suffers from the fundamental drawbacks of all hydraulic systems. These drawbacks include, in particular, higher spatial, temporal and financial installation requirements resulting from the need to employ a sufficiently powerful hydraulic aggregate and piping. In addition, hydraulic solutions suffer from relatively high noise levels, which cannot be avoided by separating the aggregate from the platform, as this would inevitably result in losses in the hydraulic piping, translating into decreases in dynamics. Moreover, the aggregate cannot be enclosed in a sound-proof box as this solution does not ensure sufficient dissipation of excess heat; installing the aggregate in a closed room requires an air-conditioning unit to be used, further increasing the costs. The drawbacks of hydraulic systems include environmental aspects too. In particular, hydraulic oil has to be replaced after a certain number of operation hours and can, in cases of system malfunctions and breakdowns, cause local pollution. Therefore, any malfunctions of hydraulic systems have to be resolved with utmost care and attention. Hence, in instances where it is envisaged that the device will be transported on a regular basis (such as the light sports aircraft simulator described in Thoendel (2009)) it is often preferable to use an electric motion system, which, being more affordable and requiring significantly shorter installation times, makes the device more attractive to customers. Compared to hydraulic solutions, electric systems benefit from many advantages – in particular much less noise, higher energy efficacy and more sophisticated control methods. However, besides their indisputable advantages, electric systems also have several disadvantages. The electromechanical transmission is subject to higher friction, increasing the wear and tear of the actuator. Therefore, the lifetime of electric systems is typically somewhat shorter than that of hydraulic solutions. In addition, electric systems require a procedure to bring the device to a safe halt after unexpected power cuts. In the case of hydraulic systems, the ‘safety landing’ procedure is catered for using oil from an appropriately sized hydraulic accumulator. With regard to UPS units significantly increasing the price of the system, power failure emergencies are typically handled using mechanical locks which, in case of an unexpected power cut, fix the platform in its current position. Nonetheless, emergency descents of platforms with an electric drive still remain problematic. Due to their many pros, there have been growing calls to replace, in certain applications, hydraulic cylinders with electric linear actuators while keeping their static and dynamic properties. The first certified aircraft simulator in the world using an electric motion system was finished in 2006 and, according to Repas and Murthy (2009), this trend will prevail in future as well. The following section provides a detailed description of an electric linear actuator intended for the application mentioned above. The main emphasis has been put on the preparation of a mathematical model which will be subsequently used to design the optimal control (Thoendel 2010).
68
6.2
E. Thoendel
Electromechanical Linear Actuator
With regard to the forces required, the electromechanical actuator is based on converting rotary motion to linear motion via a ball screw. Benefitting from high efficiency, rigidity and accuracy ball screws are often used in machine tool construction. The design of the whole electromechanical actuator is shown in detail in Fig. 6.2.
6.2.1
Mathematical Model of the Electromechanical Actuator
In order to further analyse the electromechanical system and, in particular, with respect to the need to design the optimal control, this section deals with the mathematical model to be used. However, it is not necessary at this point to
Fig. 6.2 Electromechanical actuator
6 Design and Optimal Control of a Linear Electromechanical Actuator. . .
69
fine-tune all model parameters, the main aim being to ensure that the model reflects all significant dynamic properties of the electromechanical actuator and that it works with quantities which can be easily derived or directly measured in the system. The following assumptions were made before designing the mathematical model: • For economical reasons, position or velocity is measured only at one location (motor shaft or ball screw). Therefore, the electromechanical system will be modelled as a system with one degree of freedom. • With respect to the above, the model will ignore the torsional rigidity effects of the ball screw and the mechanical compliance of the connection between the shaft and the ball screw. As shown later by the control results, these factors have only a minor impact on the quality of control with respect to the criteria used to assess regulation quality. The differential equation of motion can be defined immediately after replacing the system with a single virtual body with the generalized mass mred subject to all forces and moments Fred. Called the ‘Generalized Forces Method’, the basic equation can be written as: mred x€ þ
1 dmred 2 x_ ¼ Fred : 2 dx
(6.1)
In addition, the following properties apply (see, for instance, Nozicka 1991): • The kinetic energy of the generalized mass is equal to the sum of all kinetic energies of the elements of the system. • The virtual work of the generalized force is equal to the sum of the virtual works of all forces and moments in the system. The kinetic energy of the mechanical system can be expressed as: 1 1 1 Ek ¼ m x_ 2 þ I ’_ 2 ¼ mred x_ 2 ; 2 2 2
(6.2)
where m is the mass of the load, x the displacement of the ball screw, I the moment of inertia of the rotary parts of the system and j the rotation angle. The relationship between the translational position x and the rotation angle is defined by the conversion constant kmex, while: ’ ¼ kmex x:
(6.3)
By substituting into the previous expression, we obtain the following equation: mred ¼ I k 2mex þ m:
(6.4)
70
E. Thoendel
According to the virtual work principle, the following relationship holds true: Fred dx ¼
N X
dW j ¼ Mh d’ mg dx Ft dx;
(6.5)
j¼1
_ is the friction force. where Ft ¼ Ft0 sgnðxÞ The above expression can be used to determine the generalized force Fred: _ Fred ¼ Mh kmex mg Ft0 sgnðxÞ;
(6.6)
where Mh is the motor torque, m the load mass and g the gravitational acceleration. By substituting into the basic ‘Generalized Forces Method’ equation, we obtain the equation of motion for the mechanical part of the actuator:
_ I k2mex þ m x_ ¼ Mh kmex mg Ft0 sgnðxÞ:
(6.7)
The driving torque Mh is generated by an AC servomotor controlled by a servo driver. With regard to these ddevices being shipped by the manufacturer with the optimum current/moment regulator settings, the device shall be modelled as a firstdegree dynamic system with the time constant t. With respect to small time constants, the remaining dynamic properties of the servo drive can be left out of consideration, playing only a negligible role in the overall behaviour of the system and being irrelevant for the control design. Servo drive dynamics can be expressed with the transfer function Gel ðsÞ ¼
kel Mh ðsÞ ¼ ; tsþ1 uðsÞ
(6.8)
where u is the variable corresponding to the required moment and, by the same token, the input signal of the system and kel the electric constant of the motor.
6.2.2
Design of the Optimal Control
For the sake of convenience, the mathematical model will be written using matrices of state and transformed into the discrete form. Non-linearity caused by friction forces can be compensated for by adding to or subtracting from the input signalu the value uFt corresponding to the friction force Ft0, and therefore will be left out of consideration. Below are the equations of state of the system: x^_ ¼ A^ x þ B^ u; y ¼ C^ x þ D^ u;
(6.9)
6 Design and Optimal Control of a Linear Electromechanical Actuator. . .
71
where 2
2 kmex 3 0 6 t mred 7 6 6 7 6 ; B¼4 0 A ¼6 1 0 0 7 4 5 1 kel 0 0 3 t 2 2 3 1 0 0 0 0 7 6 6 0 1 07 6 7 7; D ¼ 6 0 0 7; C ¼6 7 6 4 5 4 15 0 0 0 0 t 3 2 " # x_ u 7 6 : x^ ¼4 x 5; u^ ¼ 1 Mh 0
0
mg 3 mred 7 7 0 5; 0
ð6:10Þ
The corresponding discrete form of the system for the sampling period T can be written as: x^ððn þ 1ÞT Þ ¼ Ad x^ðnTÞ þ Bd u^ðnTÞ; yðnT Þ ¼ C^ xðnTÞ þ D^ uðnTÞ;
ð6:11Þ
Ad ¼ eAT ; Z T AT eAt dt B: Bd ¼ e
ð6:12Þ
where
0
In designing the optimal control, the following quality criteria will be considered: • The regulated variable will be the displacement of the ball screw x. • In simulation technology (where the device in question is used to reproduce forces), system response time (i.e. transmission bandwidth) is the most important and essential quality parameter. • With regard to small steady-state deviations from the desired position not being perceptible to persons sitting in the simulator cockpit, regulation accuracy is not critical in this application. • In spite of this requirement not being as critical as, for instance, in machine tools control, where similar mechanisms involving ball screws are often used, unit step response should result in small overshoots only. • Control has to be sufficiently robust to react flexibly to changes in the load m, which can be a value from the preset interval m 2 h0; mmax i:
72
E. Thoendel
In terms of control theory, it is advisable to use as much information about the controlled system as possible. Ideally, we should be able to either directly measure or somehow derive all state variables. In this case, the control law equals to (see, for instance, Rossiter 2003): u ¼ K x x^ þ K r ref þ u0 ;
(6.13)
where Kx is the row vector, Kr the scalar, ref the reference/desired position and u0 ¼ mg=kel kmex the constant compensating the effects of the load. It can be proved that using state space control (state feedback loops) it is theoretically possible to control the behaviour of the system as required (Havlena and Stecha 2002). In practice, however, one is limited by the input signal, which amounts to finite values from the range u 2 humin ; umax i: The following sections focus on determining the optimal state space control to regulate the position of the electromechanical linear actuator. The optimal control is one which minimises the optimality criterion; let us now define a quadratic criterion based on the control quality requirements set forth hereinabove: min J ¼ min u
u
N X
J n ¼ min
n¼1
u
N X
J yn þ J un þ J overshoot ; n
(6.14)
n¼1
where, at the optimisation horizon N, Jy penalises the deviation from the desired position: J yn ¼ q1 ½ yðnÞ ref ðnÞ2 ;
(6.15)
with Ju penalising the input signal if it exceeds the allowed limits: 8 < q2 ðu umax Þ2 J un ¼ q2 ðu umin Þ2 : 0
u>umax ; u
(6.16)
and Jovershoot penalising the control in case of overshoots during positive unit step responses (i.e. non-monotonous responses): J overshoot ¼ n
q3 x_ 2 0
x_ < 0 : x_ 0
(6.17)
The quantities q1, q2 and q3 are the masses and, at the same time, normalisation coefficients of individual elements of the criterion.
6 Design and Optimal Control of a Linear Electromechanical Actuator. . . Fig. 6.3 Determining the optimal control
73
Best Function Value: 4.6952
7.5
Function value
7 6.5 6 5.5 5 4.5
0
100
200
300
400
Iteration
6.2.3
Minimising the Criterion via a Genetic Algorithm
Following on from the above, the control strategy in (6.13) is completely described by the vector K ¼ [KxKr], with control quality being determined by the criterion J. Hence, in determining the optimal control, the goal is to find the vector K with the minimum value of (the criterion) J. This issue can be approached using a genetic algorithm employing the principles of evolution biology (crossbreeding and mutation) to solve complex problems. Each individual in the population is described by their ‘chromosome’ (in our case the vector K ), with the probability of this genetic information being passed to the next generation being directly proportional to its quality (lower values of the criterion J). In addition, there is a small chance that this genetic information will mutate (random changes of genes within the chromosome). The initial population of several randomly chosen individuals will be left to evolve under the simple evolution rules defined above. After several generations, we select the best individual whose ‘chromosome’ contains the ideal solution for our problem (essentially, we are ‘breeding’ the solution for the optimum control problem). With regard to the nature of this paper, it is not possible to provide here comprehensive information on the properties of genetic algorithms and convergence conditions. However, a detailed description can be found, for instance, in Marik et al. (2003). The following chart (Fig. 6.3) shows the algorithm convergence when searching for the optimal control value after 400 generations. The criterion J and the whole algorithm used to determine its minimum value can be easily implemented for example in the MATLAB-Simulink environment, as shown in Fig. 6.4. Figure 6.5 shows the simulation results (response to step changes of the position) of the optimal state regulator controlling the system under the parameters defined in Table 6.1. These parameters correspond to the system shown in Fig. 6.2. Simulation results show that step responses do not result in overshoots and only small deviations from the desired steady-state position can be observed.
74
E. Thoendel
u2
q1 K Ts −K− z−1
K Ts z−1
2
u
J_y (error)
1 u_overshoot
q2 u0
ref
u_lim dx
−K−
u
y(n)=Cx(n)+Du(n) x(n+1)=Ax(n)+Bu(n)
u_lim
Kr
0.05
1
x
J
x,ref
J To Workspace
M Scope
Discrete State−Space K*u Kx
q3
ref 2
<
u
−K−
J_overshoot
K Ts z−1
0
Fig. 6.4 Implementation of the model of the electromechanical actuator
u [V]
10 0
dx/dt [m/sec]
−10
0
0.2
0.4
0.6
0.8
1
0
0.2
0.4
0.6
0.8
1
0.2 0 −0.2
x [m]
0.1
Position Ref
0.05 0
0
0.2
0.4
0
0.2
0.4
0.6
0.8
1
0.6
0.8
1
M [Nm]
2 0 −2
time [s]
Fig. 6.5 Simulation results
6.2.4
The Robustness Condition
The designed control has to ensure that the system remains stable after a load change and for all load values from the relevant interval.
6 Design and Optimal Control of a Linear Electromechanical Actuator. . . Table 6.1 Parameters of the electromechanical actuator
Name kmex kel I m g t T uFt Ft0
Value 800p 0.2 5.5429 104 < 0..512 > 9.81 0.01 0.01 0.4 16p
75
Unit rad/m Nm/V kg m2 kg ms2 s s V N
It can be proved by simulation that the system will remain stable if the force exerted by the load does not exceed the maximum force which can be generated by the drive. Therefore: m<
umax kel kmex ¼ mmax : g
(6.18)
If the above condition is met the system shall remain stable and load changes not compensated for by the input signal u0 (see 6.13) will have an impact on the size of the steady-state deviation only.
6.2.5
Implementing the Control on a Real System
In order for state space control to be possible, all state variables have to be known in each control step. In practice, however, the electromechanical system is equipped with one sensor only, namely the position sensor fitted on the motor shaft. The remaining state variables (velocity and moment) have to be derived accordingly. Velocity can be determined by differentiating the current position, and the current moment at the shaft of the motor can be ascertained based on the input signal u via relationship (6.8).
6.2.6
Comparing Dynamic Properties of Electromechanical and Hydraulic Systems
In this case, dynamic properties cover, in particular, the transmission bandwidth, in other words the input signal frequency range which can be transmitted by the unchanged/undampened system. In literature, this maximum frequency is defined as the frequency when the amplitude of the output signal drops to 3dB.
76
E. Thoendel
Magnitude [dB]
0 −3 −4 −6 −8 −10 0.1
0.4 Frequency [Hz]
0.8
1
1.3 1.6
2
0 Phase [deg]
Electric actuator −20
Hydraulic actuator
−40 −60 −80 0.1
0.4 Frequency [Hz]
0.8
1
1.3 1.6
2
Fig. 6.6 Measured frequency characteristics of the real system (signal 10%)
Figure 6.6 shows and compares the results of frequency characteristics measurements conducted for hydraulic and electromechanical actuators. The position of the hydraulic actuator is controlled by a typical PID regulator, whereas the electromechanical actuator uses the results of the optimal state space regulator described here. As given in the figure, the transmission frequency bandwidth of the electromechanical actuator is equal to twice that of the hydraulic system.
6.3
Conclusion
This paper describes the process of designing an electromechanical actuator which can be used as a suitable replacement for hydraulic cylinders in simulation technology applications using six-degree-of-freedom motion platforms. Providing a description of the optimal state space control, the paper compares the operational and dynamic properties of hydraulic and electromechanical systems, proving that the latter can achieve better dynamics results. However, one has to take into consideration a decrease in the lifetime of the device as a consequence of increased wear and tear resulting from mechanical friction. In addition, ensuring a safe shut-down procedure in case of unexpected power cuts can be quite costly. In spite of these drawbacks, current trends and customer demand show that electromechanical motion platforms will gradually replace current hydraulic systems, the main reasons, apart from better dynamic properties, being significantly lower noise, increased ease of installation and better energy efficiency.
6 Design and Optimal Control of a Linear Electromechanical Actuator. . .
77
References Gough E (1956–1957) Contribution to discussion of papers on research in automobile stability, control and tyre performance. Proc Auto Div Inst Mech Eng 171:392–394 Havlena V, Stecha J (2002) Teorie dynamickych systemu. Czech Technical University, Prague Lee AT (2005) Flight simulation. Ashgate Publishing Limited, Aldershot Liu K, Fitzgerald JM, Lewis FL (1993) Kinematic analysis of a stewart platform manipulator. IEEE Trans Ind Electron 40:282–293 Marik V, Stepankova O, Lazansky J (2003) Umela inteligence 3. Academia, Praha Nozicka J (1991) Mechanika a termodynamika. Czech Technical University, Prague Repas R, Murthy S (2009) Electric actuators replace hydraulics in flight simulators. http:// machinedesign.com Rossiter JA (2003) Model based predictive control. CRC, Boca Raton Steward D (1965–1966) A platform with six degrees of freedom. Proc Inst Mech Eng 180:392–394. Thoendel E (1981) Simulace pohybovych vjemu na pilotnim trenazeru. PhD thesis, Czech Technical University, Prague Thoendel E (2008) Simulating motion effects using a hydraulic platform with six degrees of freedom. Proceedings of IASTED International Conference on Modelling and Simulation. Acta Press, Anaheim Thoendel E (2009) Simulator lehkeho a ultralehkeho sportovniho letadla. Automatizace 367–370 Thoendel E (2010) Electric motion platform for use in simulation technology – design and optimal control of a linear electromechanical actuator. In Lecture notes in engineering and computer science: proceedings of the world congress on engineering and computer science 2010. WCECS, San Francisco, pp 960–965
Chapter 7
Parametric Identification of a Power-System Emulator Rube´n Salas-Cabrera, Oscar Martı´nez-Herna´ndez, Julio C. Rosas-Caro, Jonathan C. Mayo-Maldonado, E. Nacu´ Salas-Cabrera, Aaron Gonza´lez-Rodrı´guez, Hermenegildo Cisneros-Villegas, Rafael Castillo-Gutierrez, Gregorio Herna´ndez-Palmer, and Rodolfo Castillo-Ibarra
7.1
Introduction
This paper presents the modeling and parametric identification of a motor-generator set. Our aim is to obtain a model including the corresponding parameters for experimental research in power systems. The long term objective would be to have a turbine-generator emulator as an experimental tool for studying the transient behavior of a power system when small disturbances appear. The electric machines involved in this turbine-generator emulator are a shunt connected DC motor and a 3-phase salient-pole synchronous generator. One of the contributions of this work is that nonlinear and linearized dynamic models are derived. These models describe the steady state and transient behavior of the augmented (motor-generator) dynamic system. Once that these models are known, the structure and the order of the linearized dynamic model are used as a base for the predictor-error algorithm to identify the parameters. It is clear that our approach is based on a time-domain model. Another contribution of this work is the design of several custom-made power, analog and digital electronics circuits that are employed for the acquisition of the transient experimental data. In other words, the methodology is tested in a experimental set up. Once that the experimental data is obtained a complex off-line processing is carried out. For example, a nonlinear mapping is employed to transform the original machine variables to variables in the rotor reference frame. This mapping is actually calculated by using the experimental dynamic values corresponding to the rotor position. A brief review of the literature follows. The estimation of parameters of synchronous generators has been always an important topic since its relevance in steady state
R. Salas-Cabrera (*) ITCM. Av. 1ro de Mayo S/N, Cd., Madero, MEXICO e-mail:
[email protected] S.‐I. Ao et al. (eds.), Intelligent Automation and Systems Engineering, Lecture Notes in Electrical Engineering 103, DOI 10.1007/978-1-4614-0373-9_7, # Springer Science+Business Media, LLC 2011
79
80
R. Salas-Cabrera et al.
and dynamic studies. There are plenty of contributions regarding the modeling and parametric identification of these electric machines. For example, there are contributions that deal with different techniques to measure the experimental variables. There are also different identification schemes for a variety of models. Some online estimation techniques are presented in Vermeulen et al. (2002), Kyriakides and Heydt (2004) and Biro et al. (2001). Contribution in Ghomi and Sarem (2007) provides an excellent review of parameters estimation and dynamic model identification of the synchronous generator. In Dehghani et al. (2007), an identification method is presented; it is applied to a seventh order model that includes the effect of saturation. Methods based on state space models for identification of parameters have been also proposed. In Dehghani and Nikravesh (2009), a continuous state space model is transformed to its discrete state space form to obtain a set of equations that allow the authors to calculate the parameters of the generator. An identification procedure for estimating the parameters of a synchronous machine in the time and frequency domains is presented in Hasni et al. (2007). In Dehghani and Nikravesh (2008), fundamental equations of the synchronous machine are used to obtain a state space model to analyze the interaction of different areas of generation in an interconnected system. Preliminary results of this work are presented in Salas- Cabrera et al. (2010) and Salas-Cabrera et al. (2011). The above contributions are normally associated with a particular electric machine, i.e. the synchronous generator. In contrast, our modeling deals with the augmented (motor-generator) dynamic system. In addition, our work presents data that are obtained by using an experimental set-up that includes the effects related to the power electronics converter. This is important since the long term objective of this work is to have an emulator that will include a state feedback and an actuator as the one mentioned above.
7.2 7.2.1
Modeling DC Motor
Consider the following voltage equations that describe the transient and steady state behavior of a shunt connected DC motor (Krause et al. 2004), va ¼ ra ia þ Laa pia þ Laf wr if va ¼ rf if þ Lff pif
ð7:1Þ
the electromagnetic torque of the DC motor is given by T e1 ¼ Laf if ia
(7.2)
7 Parametric Identification of a Power-System Emulator
81
rotor speed and torque are related by the following expression T e1 ¼ J 1 pwr þ T L
(7.3)
Standard notation is employed in this part of our work, Krause et al. (2004).
7.2.2
Synchronous Generator
In order to define an adequate dynamic model for the synchronous generator a change of coordinates is necessary. The well known reference frame theory provides the basis necessary to obtain a dynamic model that does not include any timevarying coefficient (Krause et al. 2004). Since the circuits of the rotor are normally unsymmetrical we will be using the rotor reference frame (Krause et al. 2004). Variables associated with the stator of the generator can be transformed from the original coordinates (machine variables) to variables in the rotor arbitrary reference frame, i.e. 2 3 3 2 2p 2p Fas cos y þ cos y cosy r r r 6 7 Fqs 3 3 76 26 7 74 Fbs 5 ¼ 6 (7.4) 4 5 3 2p 2p Fds sin yr þ sinyr sin yr Fcs 3 3 where F may represent voltage, current or flux linkages. We employ the standard notation presented in Krause et al. (2004) excepting that we have omitted the superindex r for simplifying purposes. That superindex is normally associated with the variables in the rotor reference frame. The mechanical rotor position is denoted by yr. Let us consider the voltage equations in the rotor reference frame for the stator of a pole-salient synchronous generator (Krause et al. 2004), vqs ¼ r s iqs þ wr lds þ plqs vds ¼ r s ids wr lqs þ plds
ð7:5Þ
If the electric load at the generator terminals is a 3-phase symmetrical resistive circuit, then expression (7.5) now becomes rL iqs ¼ r s iqs þ wr lds þ plqs r L ids ¼ r s ids wr lqs þ plds vfd ¼ r fd ifd þ plfd
ð7:6Þ
82
R. Salas-Cabrera et al.
where the expression that represents the behavior of the field winding has been included; rL denotes the load resistance. The flux linkages may be written as 3 3 2 2 lqs iqs ðLls þ Lmq Þ 7 7 6 6 (7.7) 4 lds 5 ¼ 4 ids ðLls þ Lmd Þ þ ifd Lmd 5 lfd ids Lmd þ ifd ðLlfd þ Lmd Þ It is easy to show that the currents can be expressed as a function of the flux linkages, this is 3 2 3 2 iqs lqs =ðLls þ Lmq Þ 4 ids 5 ¼ 4 lds ðLlfd þ Lmd Þ=L þ lfd Lmd =L 5 (7.8) lds Lmd =L þ lfd ðLls þ Lmd Þ=L ifd where L ¼ Lls Llfd þ Lls Lmd þ Lmd Llfd
(7.9)
the electromagnetic torque of the synchronous generator in terms of the currents in the rotor reference frame is h i 3 P Lmd ids þ ifd iqs þ Lmq iqs ids (7.10) T e2 ¼ 2 2 using (7.8) it is possible to obtain a new expression for the electromagnetic torque of the synchronous generator 3 P Llfd þ Lmd 1 Lmd þ lqs lds lqs lfd T e2 ¼ (7.11) L L 2 2 Lls þ Lmq the torque-rotor speed relationship is defined by the following expression 2 T e2 ¼ J 2 pwr þ T L P
7.2.3
(7.12)
Nonlinear Model of the Mechanical Coupled System
In order to establish the mathematical model of the motor-generator set, an equation that defines the interaction between both subsystems is required. Since both electrical machines rotate at the same speed only one state equation for the rotor speed is necessary. It is clear that the load torque of the DC motor is the input (load) torque of the synchronous generator. Using (7.3) and (7.12) we obtain T e1 J 1 pwr ¼ T e2 þ
2 J 2 pwr P
(7.13)
7 Parametric Identification of a Power-System Emulator
83
Substituting (7.8) into (7.6) and using (7.1), (7.6) and (7.13) the following nonlinear dynamic model can be obtained 8 plqs ¼ ðrs þ r L Þlqs =ðLls þ Lmq Þ wr lds > > > > > > plds ¼ wr lqs þ ðr s þ r L Þ½ ðLlfd þ Lmd Þlds =L þ Lmd lfd =L > > > > > < plfd ¼ r fd Lmd lds =L r fd ðLls þ Lmd Þlfd =L þ vfd > pia ¼ r a ia =Laa Laf wr if =Laa þ va =Laa > > > > > > > pif ¼ r f if =Lff þ va =Lff > > > : pwr ¼ PðT e1 T e2 Þ=ð2J 2 þ PJ 1 Þ 8 > < vqs ¼ r L lqs =ðLls þ Lmq Þ vds ¼ r L ðLlfd þ Lmq Þlds =L þ r L Lmd lfd =L > : wr
(7.14)
(7.15)
where Te1, L and Te2 are defined in (7.2), (7.9) and (7.11) respectively. It is clear that expression (7.14) is the state equation and (7.15) is the output equation. The following remarks are in order: Remarks • Some of the experimental variables to be measured are the 3-phase instantaneous voltages (original variables) at the terminals of the generator, that is [vas vbs vcs]T. Then the electrical outputs (vqs and vds in (7.15)) defined in the rotor reference frame can be calculated by using a particular case of expression (7.4). • The experimental synchronous generator that is employed in this work does not have any damper winding. This is the reason for not including any state equation associated with those windings. • Since the 3-phase voltages at the terminals of the generator are balanced the 0s voltage (v0s) is zero 8t. This is the reason for not including any 0s equation and/or variable. • The external resistive circuit that is connected at the terminals of the generator is considered to have parameters that are known. Additionally, it is clear that a resistive circuit does not increase the order of the system to be tested.
7.2.4
Linearized System
The dynamic model presented in (7.14) and (7.15) can be linearized by using the Taylor’s expansion about an equilibrium point. It is important to note that during steady state conditions the machine variables (original state variables) of the DC motor are normally constants. In contrast, during steady state conditions the stator
84
R. Salas-Cabrera et al.
machine variables (original stator state variables) of the synchronous generator consist of a balanced 3-phase sinusoidal set. Transforming the machine variables of the generator to the rotor reference frame allows us to obtain an augmented model (motor-generator set) that has an equilibrium point instead of having a state vector containing periodic solutions during steady state conditions. The existence of the equilibrium point is a basic requirement for our linearization process. The linearized dynamic system follows. pDx ¼ ADx þ BDu Dy ¼ CDx þ EDu
(7.16)
where Dx ¼ [Dlqs Dlds Dlfd Dia Dif Dwr]T is state vector, Dy ¼ [Dvqs Dvds Dwr]T is the measurable output, Du ¼ [Dva Dvfd]T is the input and A ¼ ½ A1 2 6 6 6 6 A1 ¼ 6 6 6 6 4
A2
ðr s þ r L Þ=ðLls þ Lmq Þ wr0 0 0 0 k1 k2 k3 lds0 þ Lmd lfd0 =L
2
A3
A4
3
A6
A5 2
3
wr0
7 6 ðrs þ r L ÞðLlfd þ Lmd Þ=L 7 7 7 6 7 7 6 7 7 6 r fd Lmd =L 7; A2 ¼ 6 7 7 7 6 0 7 7 6 7 7 6 5 5 4 0 k1 k2 k3 lqs0
3
2
3
2
3
2 3 0 7 6 6 ðr s þ r L ÞLmd =L 7 0 7 6 6 7 7 6 0 7 6 6 7 7 6 7 6 6 7 6 rfd ðLls þ Lmd Þ=L 7 0 0 7 6 6 7 7 6 ; A5 ¼ 6 A3 ¼ 6 ; A4 ¼ 6 7 7 7 L w =L 0 6 ra =Laa 7 6 af r0 aa 7 7 6 7 6 4 r f =Lff 5 7 6 5 4 5 4 0 0 k1 Laf ia0 k1 Laf if 0 k1 k2 Lmd lqs0 =L 0
2
lds0
0
0
7 6 6 0 lqs0 7 6 6 7 6 6 7 6 6 0 0 7; B ¼ 6 A6 ¼ 6 6 L i =L 7 6 1=L af f 0 aa 7 aa 6 6 7 6 6 5 4 4 1=Lff 0 0
0
0
3
07 7 7 17 7; C ¼ ½ C1 07 7 7 05 0
C2
7 Parametric Identification of a Power-System Emulator
2
r L =ðLls þ Lmq Þ 6 C1 ¼ 4 0
85
3 2 0 0 0 7 6 rL ðLlfd þ Lmq Þ=L 5; C2 ¼ 4 r L Lmd =L 0
0
0
0
0
0 0
3 0 7 05
0
1
E¼0 where k 1 ¼ P=ðJ 1 P þ 2J 2 Þ; k2 ¼ 3P=4 k3 ¼ ðLlfd þ Lmd Þ=L þ 1=ðLls þ Lmq Þ and x0 ¼ [lqs0 lds0 lfd0 ia0 if0 wr0]T is the equilibrium point where the linearization was performed.
7.3
Experimental Setup
Important components of the experimental system are a personal computer, a National Instruments AT-MIO-16E-10 data acquisition card, NI Labview software, two custom made PWM-based power electronics converters, an incremental encoder, a custom made microcontroller based design for defining several random sequences and signal conditioning circuits for measuring several variables. A block diagram of the experimental setup is shown in Fig. 7.1. To identify parameters in (7.16) an experimental test is carried out. Basically, the experimental test consists of applying (at the same time) two different random sequences to the incremental inputs of the system Du. During this test, we restrict those inputs such that the dynamics of the augmented system is located in a small neighborhood of the equilibrium point. Microcontroller in Fig. 7.2A contains the assembler source program for defining the random sequence. The PWM signal is generated by using the design presented in Fig. 7.2B. The schematic diagram of the power stage and the DC motor is depicted in Fig. 7.2C. A design similar to the one that is presented in Fig. 7.2 is used to apply the random sequence to the field winding of the synchronous generator. The only difference is that both designs contain different random sequences. In addition, a circuit is designed for conditioning the signals to be measured, see Fig. 7.3. Once the variables have been isolated and have adequate amplitudes, they are connected to the analog inputs of the data adquisition board, see Fig. 7.3. Additional diagrams of the experimental setup for the parametric identification can be found in Martinez-Hernandez (2009).
Fig. 7.1 Block diagram of the experimental setup
Fig. 7.2 Circuit for applying a random sequence to the DC motor
Fig. 7.3 Signal conditioning circuit
7.4
Results
The measured trace of the PWM field winding voltage of the synchronous generator ($v_{fd}$) is depicted in Fig. 7.4. After some off-line filtering and subtraction of the input corresponding to the equilibrium point ($v_{fd0}$), the PWM voltage in Fig. 7.4 becomes the incremental voltage ($\Delta v_{fd}$) depicted in Fig. 7.5. It is evident that the measured PWM field winding voltage corresponds to an incremental voltage having a time-varying (random) period. Figure 7.5 also shows the ideal (not measured) trace of the corresponding random sequence. Similar traces were obtained for the other incremental input, i.e., the measured voltage of the DC motor ($\Delta v_a$). Incremental inputs ($\Delta v_a$ and $\Delta v_{fd}$) having time-varying (random) periods clearly affect the measured voltages in original coordinates at the generator terminals, see the voltage $v_{as}$ shown in Fig. 7.6. As explained earlier, the experimental voltages in the rotor reference frame ($\Delta v_{qs}$ and $\Delta v_{ds}$) can be obtained by performing an off-line calculation, i.e., transforming the incremental version of the measured 3-phase voltages at the terminals of the generator $[\,v_{as}\;v_{bs}\;v_{cs}\,]^T$. As an example, the voltage denoted by $\Delta v_{ds}$ (expressed in the rotor reference frame) is shown in Fig. 7.7. It is important to note that the unsymmetrical nature of the voltage in Fig. 7.7 is related to the initial condition of the angle employed in the mapping (7.4). Once we obtained the transient experimental data corresponding to the output $\Delta y = [\,\Delta v_{qs}\;\Delta v_{ds}\;\Delta\omega_r\,]^T$ and the input $\Delta u = [\,\Delta v_a\;\Delta v_{fd}\,]^T$, we utilized Matlab for the off-line processing of the data. In particular, the prediction-error technique is used
Fig. 7.4 Measured PWM field winding voltage of the synchronous machine
Fig. 7.5 Incremental field winding voltage of the synchronous machine
to calculate the parameters in (7.16). It is important to note that the filtered versions of the inputs (such as the one presented in Fig. 7.5) were employed only to tune the incremental values; the prediction-error algorithm is actually performed by using the PWM version of the incremental inputs.
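As a hedged, simplified sketch of this idea (an output-error flavor of the prediction-error approach, not the chapter's actual Matlab processing), the unknown entries of (A, B, C) can be packed into a parameter vector and fitted by minimizing the simulated-versus-measured output error; `build_matrices` is a hypothetical user-supplied function that rebuilds the matrices with the sparsity pattern of (7.16).

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.signal import cont2discrete

def simulate_outputs(theta, u, dt, x0, build_matrices):
    """Discretize the candidate (A, B, C) with zero-order hold and simulate."""
    A, B, C = build_matrices(theta)
    D = np.zeros((C.shape[0], B.shape[1]))
    Ad, Bd, Cd, _, _ = cont2discrete((A, B, C, D), dt)
    x = np.asarray(x0, dtype=float).copy()
    y = np.empty((len(u), C.shape[0]))
    for k, uk in enumerate(u):
        y[k] = Cd @ x
        x = Ad @ x + Bd @ uk
    return y

def residuals(theta, u, y_meas, dt, x0, build_matrices):
    """Stacked output errors minimized by the identification routine."""
    return (simulate_outputs(theta, u, dt, x0, build_matrices) - y_meas).ravel()

# theta_hat = least_squares(residuals, theta_guess,
#                           args=(u, y_meas, dt, x0, build_matrices)).x
```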
Fig. 7.6 Measured phase a voltage at the terminals of the synchronous machine
Fig. 7.7 Incremental ds voltage of the synchronous machine
A good agreement is observed when the measured and simulated outputs are compared, see Figs. 7.8 and 7.9. The simulated response is obtained by solving the linear system that results from substituting the calculated parameters.
Fig. 7.8 Comparison between the measured and simulated output (Dvds)
Fig. 7.9 Comparison between the measured and simulated output (Dwr)
A key assumption here is that the model order and the structure of the vectors/matrices used as a base for the prediction-error algorithm are the same as those defined by the model in (7.16). The prediction-error algorithm yields numerical values for the columns $A_1,\dots,A_6$ and for the matrices $B$ and $C$ of (7.16), and the initial condition of the incremental state vector is

$$\Delta x(0) = [\,0.0025939\;\;2.0613\;\;0.64278\;\;4.8217\;\;0.13137\;\;2.094\,]^T.$$
References Biro K, Szabo L, Iancu V, Hedesiu HC, Barz V (2001) On the synchronous machine parameter identification. Technical University of Cluj-Napoca Dehghani M, Nikravesh SKY (2008) State-space model parameter identification in large-scale power systems. IEEE Trans Power Syst 23(3):1449–1457 Dehghani M, Nikravesh SKY (2009) Estimation of synchronous generator parameters in a large scale power system. JICICI 5(8):2141–2150 Dehghani M, Karrari M, Malik OP (2007) Synchronous generator model identification using linear H identification method. IFAC Workshop ICPS 07 Cluj-Napoca, Romania Ghomi M, Sarem YN (2007). Review of synchronous generator parameters estimation and model identification. Islamic Azad University of Toyserkan, UPEC 2007, pp 228–235 Hasni M, Djema S, Touhami O, Ibtiouen R, Fadel M, Caux S (2007) Synchronous machine parameter identification in frequency and time domain. Serb J Elec Eng 4(1):51–69
Krause PC, Wasynczuk O, Sudhoff SD (2004) Analysis of electrical machinery and drive systems. Wiley, IEEE Press Power Engineering Kyriakides E, Heydt G (2004) Estimation of synchronous generator parameters using an observer for damper currents and a graphical user interface. Arizona State University Department of Electrical Engineering, Tempe Martinez-Hernandez O (2009) On the modeling and parametric identification of a motor-generator set, MSc. thesis (in spanish). Instituto Tecnologico de Cd. Madero. Cd. Madero, Mexico Salas-Cabrera R, Martinez-Hernandez O, Castillo-Ibarra R, Rosas-Caro JC, Gonzalez-Rodriguez A, Salas-Cabrera EN, Cisneros-Villegas H, Castillo-Gutierrez R, Hernandez-Palmer G (2010) On the modeling and parametric identification of a motor-generator set. In: Lectures notes in engineering and computer science: proceedings of the world congress on engineering and computer science 2010. WCECS, San Francisco, pp. 942–948. Salas-Cabrera R, Martinez-Hernandez O, Castillo-Ibarra R, Rosas-Caro JC, Gonzalez-Rodriguez A, Salas-Cabrera EN, Cisneros-Villegas H, Castillo-Gutierrez R, Hernandez-Palmer G (2011) Modeling and parametric identification of a turbine-generator emulator. Eng Lett IAENG 19(1):68–74 Vermeulen HJ, Strauss JM, Shikoana V (2002) Online estimation of synchronous generator parameters using PRBS perturbations. IEEE Trans Power Syst 17(3):694–700
Chapter 8
Enhanced Adaptive Controller Applied to SMA Wire Control Samah A.M. Ghanem
8.1
Introduction
Most of the verified or validated SMA modeling approaches focus on reproducing the different observed effects, yet these SMA models lack applicability in different engineering applications and in the control of the SMA, since they do not describe the general behaviour of any SMA type or shape; they idealize the material constants and thereby prevent their application to different systems, to different application scenarios, and to model-based control design. If an SMA wire within a control loop is replaced by another SMA wire with new properties, or by an SMA plate, those mathematical models may be too weak to perform the control task, as they lack generalization properties from an engineering point of view and give little insight into this actuator material as an I/O system with structured behavior. Model-based control approaches, in combination with suitable design approaches, may allow the design and realization of robust control algorithms and therefore the general usage of the SMA, not only as an actuator but also in applications where the shape memory effect should not be exploited, i.e., where an inverse model can cancel the expected behaviour of the SMA; this remains an open research topic as long as no generalized behavioural SMA model exists (Ghanem 2010). Non-model-based control systems are limited to the usage of the SMA as an actuator and need retuning whenever the material's physical characteristics change. As a result, the study of non-model-based adaptive control approaches for the position control of the SMA is of vital importance (Ghanem 2010; Ghanem et al. 2010).
Adaptive approaches in control theory perform a redesign of the controller in accordance with the changing parameters of the plant. Adaptive control is a good choice for the shape memory alloy (SMA) wire actuator plant, because controlling the SMA is a hard process due to the non-linearity exhibited by the material, that is, the hysteretic behavior that causes the shape memory effect. Moreover, the methods used for controlling the position of the SMA lack simplicity and adaptability to the SMA characteristics, since environmental changes influence those characteristics (Ghanem 2010; Reynolds 2003; Chirani et al. 2003; Bouvet et al. 2004; Dutta et al. 2005; Teh and Featherstone 2003; Song et al. 2003; Bizdoaca et al. 2006; Benzaoui et al. 1999; Silva 2007). The proposed adaptive controller is derived through the mathematical equations of the MRAC and STC adaptive control approaches (Ghanem et al. 2010; Slontine and Li 1991). The concept of classical PID control is used so that the proportional, integral, and derivative parameters Kp, Ki, and Kd serve as the estimator parameters, or adaptation parameters, of the proposed adaptive controller. The enhanced version of the adaptive controller, which can be used to control linear or non-linear plants, will be derived step by step. The robustness of this controller has been evaluated on a hard control problem, the SMA wire; in addition, the adaptive controller can be used to tune the PID parameters, giving more robust solutions, i.e., real-time adaptive tuning, such that the parameters fit the plant more easily than with other well-known tuning procedures. The chapter first introduces the two basic adaptive control approaches used in the derivation of this adaptive controller; it then discusses the position control of the shape memory alloy wire using the designed PID adaptive controller, a tuned PI controller, and the experimentally enhanced version of the adaptive controller. Finally, the experimental results of the enhanced adaptive controller and the tuned PI controller under different sets of actuation frequencies are evaluated.
8.2
Adaptive Control Approaches
Since adaptive control performs a redesign of the controller in accordance with the changing parameters of the plant, it is a good choice for SMA wire actuator plant control. The next sections introduce the two basic approaches of adaptive control that lead to the way the PID adaptive controller is derived and tested on the SMA wire.
8.2.1
MRAC Control
The basic principle of a model reference adaptive controller is to build a reference model that specifies the desired output of the controller; the adaptation law then adjusts the unknown parameters of the plant so that the tracking error converges to zero (Slontine and Li 1991).
8.2.2
STC Control
The basic principle of a self-tuning adaptive controller is to have a parameter estimator that recursively estimates the unknown parameters of the plant and feeds them to the controller. This recursive estimation is based on the parameters that best fit the past input-output data of the plant (Slontine and Li 1991).
8.3
Adaptive Control with Combined MRAC And STC
A combined approach that joins MRAC and STC will be introduced to build a PID adaptive controller. The reason for calling this a combined approach is that the estimated parameters will be the controller parameters, which adapt to the unknown plant parameters recursively as in STC, while the tuning is driven by the convergence of the tracking error to zero as in MRAC, since the reference model here is the plant itself; this is also the basis for referring to the SMA actuator as the SMA actuator plant. The equations corresponding to the MRAC controller for a first-order plant are the following:

$$v(t) = [\,r \;\; y\,]^T$$

where $v(t)$ denotes the signal vector, $r$ is the reference input signal, and $y$ is the output signal.

$$\hat a(t) = \begin{bmatrix} \hat a_r \\ \hat a_y \end{bmatrix}$$

where $\hat a(t)$ is the vector of the controller adaptive parameters, defined by their derivatives in terms of plant parameters to derive the adaptation law as follows:

$$\dot{\hat a}_r = \mathrm{sgn}(b_p)\,\gamma\,e\,r, \qquad \dot{\hat a}_y = \mathrm{sgn}(b_p)\,\gamma\,e\,y \qquad (8.1)$$

where $\dot{\hat a}_r$ is the derivative of the estimated parameter corresponding to the reference signal $r$, $\dot{\hat a}_y$ is the derivative of the estimated parameter corresponding to the plant output signal $y$, $e$ is the error signal, $\mathrm{sgn}(b_p)$ determines the direction of search for the proper controller parameter, and $\gamma$ is the adaptation coefficient.
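A minimal numerical sketch of (8.1) for an assumed first-order plant is given below; the plant, reference model, gains and the sign convention (the standard MRAC law for e = y − ym) are assumptions for illustration and may differ from the chapter's notation.

```python
import numpy as np

# Assumed first-order plant y' = a_p*y + b_p*u and reference model ym' = a_m*ym + b_m*r
a_p, b_p = -1.0, 2.0
a_m, b_m = -4.0, 4.0
gamma, dt = 1.0, 1e-3

y = ym = 0.0
ar_hat = ay_hat = 0.0                      # adaptive controller parameters
for k in range(20000):                     # 20 s of simulated time
    t = k * dt
    r = 1.0 if (t % 2.0) < 1.0 else -1.0   # square-wave reference
    u = ar_hat * r + ay_hat * y            # MRAC control law
    e = y - ym                             # tracking error
    # adaptation law in the spirit of (8.1)
    ar_hat += -np.sign(b_p) * gamma * e * r * dt
    ay_hat += -np.sign(b_p) * gamma * e * y * dt
    # explicit Euler integration of plant and reference model
    y += (a_p * y + b_p * u) * dt
    ym += (a_m * ym + b_m * r) * dt

# The adapted gains drift toward the matching values b_m/b_p and (a_m - a_p)/b_p
print(ar_hat, ay_hat)
```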
8.3.1
Building an Adaptive Controller
Choose the position as one of the estimated parameters and the position error as the second estimated parameter. Let the signal vector be

$$v(t) = [\,e \;\; y\,]^T$$

where $e$ is the position error and $y$ is the actual position. Let the controller estimated parameters be

$$\hat a(t) = \begin{bmatrix} \hat a_s \\ \hat a_y \end{bmatrix}$$

where $\hat a_s$ is the estimated parameter corresponding to the error signal and $\hat a_y$ is the estimated parameter corresponding to the actual position signal. The adaptation law is then

$$\dot{\hat a}_s = \mathrm{sgn}(b_p)\,\gamma\,e^2, \qquad \dot{\hat a}_y = \mathrm{sgn}(b_p)\,\gamma\,e\,y \qquad (8.2)$$

$b_p$ is substituted as 1, since there is no transfer function for the SMA wire dynamics with which to build the controller.
8.3.2
Building the PID Adaptive Controller
Fig. 8.1 The PID adaptive controller combined MRAC and STC

Using the error signal as one of the signals used for adaptation helps the controller to react. Since there is no equation expressing the plant dynamics, the assumption made to build a PID adaptive controller is to use the controller parameters as the estimated parameters of the plant; this helps the PID controller tune itself, which is a combined approach of MRAC and STC. Figure 8.1 shows a block diagram describing the idea. In Fig. 8.1, $e(t)$ denotes the error signal, which equals the desired position subtracted from the actual position of the SMA, so as to ensure convergence of the error to zero, as in MRAC. The reference model of the MRAC is the plant itself, and the adaptation coefficients are the estimator coefficients of the STC, letting the PID adapt its controller coefficients to the plant. The block attached to the PID indicates that the output of the controller is converted, using additional blocks, into the current driven through the SMA wire. Based on the formulas of regular PID controllers, the idea in Fig. 8.1 is that $\dot{\hat k}_p$, $\dot{\hat k}_i$, and $\dot{\hat k}_d$ are the estimated parameters of the plant, while they recursively tune their corresponding controller parameters through the following adaptation law, so that the PID controller coefficients are just the integrals of the adaptation law:

$$\dot{\hat k}_p = \gamma e^2 \;\;\Rightarrow\;\; \hat k_p = \int \gamma e^2\,dt$$

$$\dot{\hat k}_i = \gamma e \int e\,dt \;\;\Rightarrow\;\; \hat k_i = \int \gamma e \left(\int e\,dt\right) dt$$

$$\dot{\hat k}_d = \gamma e \frac{de}{dt} \;\;\Rightarrow\;\; \hat k_d = \int \gamma e \frac{de}{dt}\,dt \qquad (8.3)$$
$\hat k_p$ is the proportional gain, $\hat k_i$ is the integral gain, and $\hat k_d$ is the derivative gain, while $\dot{\hat k}_p$ is the proportional estimated parameter, $\dot{\hat k}_i$ is the integral estimated parameter, and $\dot{\hat k}_d$ is the derivative estimated parameter. So the integral form of the estimated vector is

$$\hat a(t) = \begin{bmatrix} \hat k_p \\ \hat k_i \\ \hat k_d \end{bmatrix}$$

and the signal vector, with one entry corresponding to each of them as in a normal PID controller, is

$$v(t) = \left[\, e(t) \;\;\; \int e(t)\,dt \;\;\; \frac{de(t)}{dt} \,\right]^T, \qquad e(t) = y_A(t) - y_r(t).$$
where $y_A(t)$ is the actual position of the SMA wire and $y_r(t)$ is the desired position. The output of the PID controller will be the following sum of the proportional, integral, and derivative outputs:

$$Y(t) = \sum b(i)$$

To simplify the representation of the equations in the time domain, the s-domain will be used, where an integral is a division by $s$ and a derivative is a multiplication by $s$; instead of using $t$ as the time sample, $i$ will be used as the iteration number, as follows:

$$b_P(i) = \gamma e^{3}(i)\,\frac{1}{s} = k_p\, e$$

$$b_I(i) = \gamma e^{3}(i)\,\frac{1}{s^{3}} = k_i\,\frac{e}{s}$$

$$b_D(i) = \gamma e^{3}(i)\,s = k_D\, e\, s$$

Then the output of the controller can be expressed by the following stable system:

$$Y(i) = \sum_{P,I,D} b(i) = b_P(i) + b_I(i) + b_D(i), \qquad Y(i) = \gamma e^{3}\,\frac{s^{4} + s^{2} + 1}{s^{3}}.$$

8.4

Position Control of the SMA
Shape memory alloys (SMA) are smart materials that are difficult to control due to their hysteretic behavior and the unpredictable changes in their characteristics under temperature, electrical excitation, or mechanical forces. This type of behavior encourages the usage of adaptive control approaches in the control of the SMA wire. Here, the SMA is used to test the validity and robustness of the derived adaptive controller in controlling the position of the SMA wire. The SMA experimental setup includes all the hardware and software components required for the experiment: the NiTi SMA wire, a constant load connected to the wire, and a slider or removable block to enable the movement of the wire and the connected mass under negative and positive elongations. The software components mainly include MATLAB/SIMULINK and the dSPACE data acquisition system for testing real-time changes on the SMA wire. The SMA wire used in the experiments is a nickel-titanium wire with the parameters given in Table 8.1.
Table 8.1 Parameters of the NiTi shape memory alloy

Length (m)   Radius (m)   Density (kg/m3)   Area (m2)         Volume (m3)
0.55         0.0002       6,500             6.9115 x 10^-4    6.9115 x 10^-8

8.4.1

Adaptive Controller Test on SMA Position Control
To measure the output current for the SMA actuator wire, the voltage values should be divided by the resistance $R = 8\,\Omega$; the current is approximated as the square root of this value, saturated in the range 0 to 1.7 A:

$$i(t) = \left.\sqrt{Y(t)/8}\,\right|_{0}^{1.7}$$

Figure 8.2 shows the block diagram for the PID adaptive controller in equation (8.3). A similar PID adaptive controller has been derived by Feng Lin et al. (2000) using the Fréchet derivative and SISO system formulas, while here simplified formulas and a combination of adaptive approaches lead to this model.
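The sketch below is a hedged discrete-time illustration of the adaptation law (8.3) together with the saturated current conversion described above; the sampling step, the adaptation coefficient and the surrounding control loop are assumptions, not the experimental implementation.

```python
import numpy as np

def adaptive_pid_step(e, e_prev, e_int, gains, gamma, dt):
    """One update of (8.3): the PID gains are running integrals of gamma*e^2,
    gamma*e*int(e) and gamma*e*de/dt; returns the control output and new state."""
    kp, ki, kd = gains
    de = (e - e_prev) / dt
    e_int = e_int + e * dt
    kp += gamma * e * e * dt
    ki += gamma * e * e_int * dt
    kd += gamma * e * de * dt
    u = kp * e + ki * e_int + kd * de          # PID output Y(t)
    return u, (kp, ki, kd), e_int

def sma_current(Y, R=8.0, i_max=1.7):
    """i(t) = sqrt(Y/R), saturated to the 0..1.7 A range."""
    return float(np.clip(np.sqrt(max(Y, 0.0) / R), 0.0, i_max))

# Typical use inside a sampled loop (e = actual - desired position, per the chapter):
# Y, gains, e_int = adaptive_pid_step(e, e_prev, e_int, gains, gamma=1.0, dt=1e-3)
# i = sma_current(Y)
```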
8.4.1.1
Design of the PI Controller
The PID adaptive controller shows single-shot, high-performance results in the experimental environment. Moreover, the real-time readings of the Kp, Ki, and Kd gains allow the design of a classically tuned PI controller that is robust and consistent in controlling the position of the SMA wire (Ghanem 2010; Ghanem et al. 2010).
8.4.1.2
Design of the Enhanced Adaptive Controller
Although the simulation results for the PID adaptive controller are perfect, in the real world, where no idealization exists, a set of repeated experiments on the PID adaptive controller revealed inconsistency in the controller. This led to the need to improve the controller design and to test its behavior experimentally. The mathematical adaptation law in (8.3) does not change; instead, the adaptation mechanism is modified, such that the estimated parameters corresponding to kp, ki, and kd remain the same, while the controller parameters previously called kp, ki, and kd are changed so that they are no longer similar to proportional, integral, and derivative gains, respectively. The error signal is used as the signal vector,

$$v(t) = [\,e(t)\;\;e(t)\;\;e(t)\,]^T,$$

for all controller parameters, instead of using the proportional, integral, and derivative components of the error signal to obtain the corresponding controller parameters, which are therefore no longer PID parameters. In comparison with the previous PID adaptive controller parameters, kp is the same as it was, dependent on the error signal, and ki and kd are treated in the same way. This change results in an enhanced version of a new adaptive controller; Fig. 8.3 shows the block diagram of the enhanced adaptive controller. The controller was tested in the control of the SMA wire and gives consistent, robust control.
Fig. 8.2 PID adaptive controller for position control of the SMA
Fig. 8.3 Enhanced adaptive controller for position control of the SMA
8.4.2
Experimental Results
A detailed set of experimental results has been introduced in (Ghanem et al. 2010) using this controller setup; here we summarize the following points:
1. The equations of the MRAC controller with adaptation law (8.1) do not respond experimentally and give no response to control the position of the SMA.
2. Refining equation (8.1) into (8.2), making the parameter dependent on the error signal instead of the reference signal, makes the controller start to respond and control the position of the SMA, although the performance is low.
3. The PID adaptive controller with the adaptation law in (8.3) was tested and shows excellent results in controlling the position of the SMA. Note that a time interval of almost 30 s is needed for adaptation. Multiple sets of experiments show that the adaptive controller with the error signal vector is an enhanced adaptive controller with more consistency and less adaptation time than the PID adaptive controller.
4. The enhanced adaptive controller was tested under different sets of frequencies and gives high performance and robustness in the control of the SMA under different sets of excitation frequencies (Ghanem 2010; Ghanem et al. 2010).
5. For the SMA actuator plant in Table 8.1, it was found that choosing an adaptation coefficient γ = 1 gives the best performance under different sets of experiments; any increase or decrease of γ affects the performance of the adaptive controller.
6. For the SMA in Table 8.1, running the adaptive controller under different sets of adaptation coefficients gives online Kp, Ki, and Kd values in the ranges Kp = [13.2, 13.9], Ki = [0.3, 1], and Kd = [0.33, 0.39]. These values can be used for tuning a PID controller with coefficient ratios [Kp : Ki : Kd] = [14 : 1 : 0.3], which are the tuned PI controller parameters. Hence, this adaptive controller can be used to tune the Kp, Ki and Kd coefficients for linear or non-linear plants with unknown dynamics.
7. Using a chaotic wave input signal, Fig. 8.4 shows the results of using an arbitrary chaotic signal as the desired position for both the PI controller and the enhanced adaptive controller. The chaotic signal is generated using Chua's equation set (Yousef Al-Sweiti 2006). The performance of both controllers is very good, with few delays in the actual position, but the overall performance of the PI controller is better. The adaptive controller shows that its adaptability is not mainly based on the consistency of the input signal; it depends on the plant dynamics and the error signal as well.
8. The tuned PI controller was also tested under different sets of excitation frequencies. The PI controller tuned using the adaptive controller leads to better performance (Ghanem 2010; Ghanem et al. 2010).
9. Comparing the control error of the tuned PI controller and the enhanced adaptive controller under different sets of excitation frequencies shows better performance for the PI controller; Fig. 8.5 shows the comparison.
Note that the experimental results were taken in real time from the user interface of the dSPACE data acquisition system connected to the SMA test rig.
Fig. 8.4 PI and enhanced PID adaptive controller results using chaotic signal as the desired position. Time in (s) versus desired position (dotted) and actual position (solid) in (mm)
Fig. 8.5 Position control error (mm) for PI controller (dashed), and enhanced adaptive controller (dotted) under different excitation frequencies (Hz)
8.5
Conclusion and Future Work
In this chapter, we introduced the design of an adaptive controller using a combination of two basic adaptive control approaches, MRAC and STC. The PID adaptive version is used to derive a tuned PI controller, and the PI controller and the enhanced adaptive controller are used to control the position of the SMA wire. The controller can be used to control different linear or non-linear plants with unknown dynamics, i.e., plants similar to the SMA. Adaptive control is introduced as a smart alternative to non-model-based approaches, which lack adaptability to this type of material, whose behavior changes as its characteristics change. In such difficult non-linear control problems, adaptive control also stands as an alternative to a model-based controller because it does not depend on temperature, i.e., on the characteristic changes in the case of the SMA. However, it is still required to build model-based controllers that allow a general usage of the SMA, not only as an actuator but also in different engineering applications where the focus is not only on position control. This therefore remains an open research topic as long as no generalized behavioural SMA model exists.
References Arbab Chirani S, Aleong D, Dumont C, McDowell D, Patoor E (2003) Superelastic behavior modeling in shape memory alloys. Journal de Physique 4(112):205–208 Benzaoui H, Chaillet N, Lexcellent C, Bourjault A (1999) A Non linear motion and force control of shape memory alloys actuators. Proceedings of SPIE - The International Society for Optical Engineering 3667:337–348
Bizdoaca N, Hamdan H, Selisteanu D (2006) Fuzzy Logic Controller for a Shape Memory Alloy Tentacle Robotic Structure. University of Craiova, Romania, IMRAC, France Bouvet C, Calloch S, Lexcellent C (2004) A phenomenological model for pseudoelasticity of shape memory alloys under multiaxial proportional and nonproportional loading. European Journal of Mechanics, A/Solids 23(1):37–61 Dutta SM, Ghorbel FH, Dabney JB (2005) Modeling and control of a shape memory alloy actuator. In: Proceedings of the 20th IEEE International Symposium on Intelligent Control, ISIC ’05 and the 13th Mediterranean Conference on Control and Automation, MED, art. no. 1467151, pp 1007–1012, 2005 Teh YH, Featherstone R (2003) A new control system for fast motion control of SMA actutator wires. Department of Systems Engineering, Research School of Information Sciences and Engineering, The National University, 2003 Li F, Brandt RD, Saikalis G (2000) Self tuning of PID controllers by adaptive interaction. Proceedings of the American Control Conference, Chicago, IL Reynolds DR (2003) A nonlinear thermodynamic model of phase transitions in shape memory alloy wires. P.h.D. Thesis. Rice University, Houston, TX, May, 2003 Ghanem SAM (2010) Position control of shape memory alloy (SMA) wire using model and behavioral-based and non-model-based controllers. #2010, VDM Verlang Dr M€uller Ghanem SAM, Shibly H, Soeffker D (2010) Enhanced adaptive controller using combined MRAC and STC adaptive control approaches for the control of shape memory alloy wire. Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering and Computer Science 2010, WCECS 2010, 20–22 October, 2010, San Francisco, USA, vol 2187, pp 996–1001, Issue 1 Silva EPD (2007) Beam shape feedback control by means of a shape memory actuator. Materials and Design 28(5):1592–1596 Slontine JJ, Li WP (1991) Applied nonlinear control, pp 312–328, #1991 by Prentice Hall International Inc. Song G, Chaudhry V, Batur C (2003) Precision tracking control of shape memory alloy actuators using neural networks and a sliding-mode based robust controller. Institute of Physics Publishing, Smart Materials and Structures, vol 12, 2003 Yousef Al-Sweiti (2006) Modeling and control of an elastic ship-mounted crane using variablegain model-based controller. Ph.D. Thesis. University of Duisburg-Essen, Germany, 2006
Chapter 9
A Standalone System to Train and Evaluate Operators of Power Plants José Tavira-Mondragón, Rogelio Martínez-Ramírez, Fernando Jiménez-Fraustro, Roni Orozco-Martínez, and Rafael Cruz-Cruz
9.1
Introduction
A wide list of the main benefits of using a variety of training simulators is given in (International Atomic Energy Agency 1998); some of the most important are: the ability to train on malfunctions, transients and accidents; the reduction of risk to plant equipment and personnel; the ability to train personnel on actual plant events; a broader range of personnel receiving effective training; and individualized instruction or self-training performed effectively on simulation devices designed with these capabilities in mind. For more than 25 years, the Simulation Department of the Electric Research Institute (IIE) has designed, developed and started up all the power plant simulators required by the Federal Commission of Electricity (CFE) to train its operators of thermal, geothermal and combined cycle power plants. These simulators have different scopes according to the training demands of the CFE; therefore, in its training centers the CFE has classroom simulators, full-scope replica and non-replica simulators and, with the aim of improving the onsite training programs, portable part-task simulators. Currently, the CFE training facility with the largest number of simulators is the Ixtapantongo National Training Center (CENAC-I), which is devoted to training the operators of thermal power plants. The simulators available in this center, which have been developed by the IIE (Tavira-Mondragón et al. 2010a; Tavira-Mondragón et al. 2009; Zabre et al. 2009; Tavira-Mondragón et al. 2006; Tavira-Mondragón et al. 2005), are the following:
1. Two full-scope replica control board simulators, one for a 300 MW thermal power plant and one for a 400 MW combined cycle power plant.
2. Five full-scope simulators with a graphic operation interface, for a 300 MW thermal power plant, a 350 MW thermal power plant, a 350 MW coal-fired power plant, a 150 MW gas turbine power plant, and a 450 MW combined cycle power plant.
3. Two classroom simulators, one for a 300 MW thermal power plant and one for a 350 MW thermal power plant.
4. Two partial-task portable simulators, one for turning and acceleration of the main turbine and one for a 300 MW thermal power plant.
The partial-task simulators are portable, so they are transported to the power plants; in this way, the operators can practice onsite. These simulators can be utilized with the assistance of an instructor or in a free-hands context, but in any case they do not have an automatic tutoring system or an evaluation system. The full-scope and classroom simulators are hosted in the CENAC-I facilities and require an instructor to guide the training session. With these simulators, the CENAC-I prepares the operation personnel of the thermal power plants of the CFE; the training programs include retraining strategies, safe operation under malfunctions, and familiarization with power plant systems. Another important issue is that the partial-task simulators are based on the UNIX operating system, while the other simulators are based on personal computers with the Windows XP operating system and a very similar simulation environment. On the other hand, expert systems provide powerful and flexible tools to solve a wide range of problems that often cannot be dealt with by traditional methods; therefore, the use of expert systems has spread to many technological areas. Although full-scope simulators are recognized worldwide as the only realistic method to provide real-time, hands-on training of operators (International Atomic Energy Agency 2004), the inclusion of intelligent tutoring systems to train power plant operators has been a subject of study, either to enhance the capabilities of a power plant simulator (Seifi and Seifi 2002) or as an attempt to optimize operator training (Arjona López et al. 2003; Tavira-Mondragón et al. 2010b). This chapter presents the main features of a portable simulation system to train and evaluate operators onsite without the guidance of a human instructor. The system utilizes CLIPS as its expert system and keeps the attributes of full scope, detailed mathematical modeling and real-time execution for each of the two simulated processes (a 350 MW coal-fired power plant and a 450 MW combined cycle power plant). The software architecture and the most important aspects of the tools developed to design the training exercises are also shown; finally, the results and conclusions achieved are presented.
9.2
The Power Plant Simulators
Because of the training demands onsite, the 350 MW Coal-Fired power plant Simulator (CFS) and the 450 MW Combined Cycle power plant Simulator (CCS) were selected from the simulators available at the CENAC-I. The CFS reproduces the behavior of a 350 MW dual-fuel power plant, so it can operate with oil or coal as fuel; this unit has as its main components a forced-circulation steam generator, a tandem compound turbogenerator, the feedwater system, and the auxiliary systems (e.g. lubrication oil, fuel, water services, electric grid, etc.). The CCS, on the other hand, reproduces the behavior of a 450 MW combined cycle power plant; this unit has as its main components two gas turbines, two heat recovery steam generators, one tandem compound turbogenerator, the feedwater system, and the auxiliary systems (e.g. lubrication oil, fuel, water services, electric grid, etc.). Because these simulators are of the full-scope type, their mathematical models must be able to reproduce, in a dynamic way, the behavior of the power plant in any feasible operation, that is, any steady state from cold iron up to full-load generation, and transient states that arise as part of the operation itself or because of malfunctions. Therefore, the simulators have detailed mathematical models based on physical laws and, to customize the models to the actual power plant used as reference, each piece of equipment (tanks, valves, pumps, fans, heat exchangers, etc.) is characterized with design information and operation data. In this kind of modeling, the governing equations are formed by a group of algebraic and first-order differential equations. Once all the governing equations had been developed and programmed, each model was tested and validated independently and its step-size integration method was defined; the models were then inserted into a modular sequential solver according to their causality. Additionally, each simulator has its own distributed control system; these systems include manual, semi-automatic and automatic operation modes, according to the specified requirements of the simulated areas. In brief, the control algorithms are organized in basic components with a very specific function (PID, Set/Reset, Dead Band, Limiters, etc.) and are represented through a hierarchical component network. This network is stored as a diagram, and a group of diagrams constitutes a module. These modules have their own solver according to their order of precedence.
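As a toy illustration of such a modular sequential scheme (module names, equations and step sizes are invented; this is not the simulator's solver), each module integrates its own states with its own step inside every real-time frame, executed in causal order:

```python
import numpy as np

class Module:
    """A simulation module with its own states, step size and derivative function."""
    def __init__(self, name, x0, dt, deriv):
        self.name = name
        self.x = np.asarray(x0, dtype=float)
        self.dt = dt
        self.deriv = deriv

    def advance(self, t_frame, shared):
        """Sub-step the module with its own integration step inside one frame."""
        n = max(1, int(round(t_frame / self.dt)))
        for _ in range(n):
            self.x = self.x + self.dt * self.deriv(self.x, shared)  # explicit Euler

# Invented toy modules in causal (precedence) order: fuel -> boiler -> turbine
shared = {"fuel_flow": 0.0, "steam_pressure": 0.0}

def fuel_deriv(x, s):
    s["fuel_flow"] = x[0]
    return np.array([0.1 * (1.0 - x[0])])

def boiler_deriv(x, s):
    s["steam_pressure"] = x[0]
    return np.array([0.5 * s["fuel_flow"] - 0.05 * x[0]])

def turbine_deriv(x, s):
    return np.array([0.2 * s["steam_pressure"] - 0.1 * x[0]])

modules = [Module("fuel", [0.0], 0.01, fuel_deriv),
           Module("boiler", [0.0], 0.05, boiler_deriv),
           Module("turbine", [0.0], 0.05, turbine_deriv)]

for frame in range(100):        # e.g. 100 frames of 0.1 s executed in real time
    for m in modules:           # sequential execution by causality
        m.advance(0.1, shared)
```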
9.3
Hardware and Software Architecture
Because the simulation system will be continuously transported to power plants, a robust and portable computer platform was selected for safe and easy transportation. The selected computer is a personal computer with three integrated LCD monitors. This computer, based on the Windows XP operating system, has a dual quad-core Xeon processor at 2.66 GHz, 4 GB of memory and one 160 GB hard disk. The software architecture of the simulation system is shown in Fig. 9.1; the CFS and the CCS used as base simulators for the new simulation system were adapted and integrated into this architecture. The software modules are described in the following sections.
Fig. 9.1 Software architecture
9.3.1
Simulator Modules
A detailed discussion of the simulator software is given in (Tavira-Mondragón et al. 2010a); therefore, only a brief description of each module is presented here. The control models and the process mathematical models were discussed in Sect. 9.2.
9.3.1.1
Real Time Executive
The real-time executive module coordinates all simulation functions and its main parts are: (a) the mathematical model launcher; (b) the manager module for interactive process diagrams; (c) the manager module for the global area of mathematical models; (d) the manager module for the instructor console; (e) data base drivers.
9.3.1.2
Main Sequencer
It is in charge of sequencing, in real time, all the functions that require cyclic execution: the mathematical models, the control models and additional functions such as historical trends.
9.3.1.3
Operator Module
The operator module is in charge of the Human Machine Interface (HMI) of the trainee and manages the information flow with the executive system. The HMI consists of interactive process diagrams, which are Flash movies; the Flash movies have static and dynamic parts. The static part is constituted by a drawing of a particular flow diagram, whereas the dynamic part is configured with graphic components stored in a library and related to each piece of the plant's equipment, e.g., pumps, valves, motors, etc. These components have their own properties, which are established during the simulation.
9.3.1.4
Instructor Console Module
This interface is hidden from the trainee, but the instructors of the CENAC-I can access it during the design, test and debugging of the training exercises. This module consists of five parts: (a) a main module to carry out all the tasks related to the graphical interface of the instructor; (b) a module to retrieve the static information of the simulation session, e.g., malfunctions, internal parameters, etc.; (c) a module to store information in a database using SQL; (d) a module to dynamically update the instructor console with the simulation information; (e) a module to communicate the instructor console with the real-time executive.
9.3.2
Evaluation Console Modules
These modules constitute the application software to train and evaluate the trainees.
9.3.2.1
Expert System
Expert systems use human expertise (the result of deliberate practice on standard tasks over many years) to answer questions, pose questions, solve problems, and assist humans in solving problems. They do so by using inferences similar to those a human expert would make, to produce a justified, sound response in a brief period of time. When questioned, they should be able to produce the rules and processes that show how they arrived at the solution (Olmstadt 2000). The use of an expert system provides many advantages over traditional programming: a system written in a conventional language would result in low performance and reliability, with very limited flexibility, when compared to the same task performed in a declarative expert system. Additionally, expert systems typically allow for the incorporation of explanation facilities. With an explanation facility, the user (trainee) may "ask" the system why it is asking a particular question or how it has reached a particular conclusion. This capability is not possible using conventional programming techniques. Furthermore, since the structure of the system is declarative in nature, knowledge may be continually added to the system, thereby extending its capabilities (Chan 1996). The expert system incorporated into the simulation environment is the C Language Integrated Production System (CLIPS), which is a widely used tool for building expert systems in government, industry and academia. The expert system is responsible for tracking the status of the simulation to determine the group of rules that should be fired. Thanks to its inference engine, and according to the configuration of the simulation exercise, the expert system is able to modify the simulated process: it can insert malfunctions, modify values of process and control variables, and change the status of the simulation without the intervention of a human instructor. The expert system is embedded within the real-time executive of the simulator; in order to provide a suitable interface to the expert system, external functions were added to its inference engine to couple it to the main sequencer of the simulators.
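To make the rule-firing idea concrete, the sketch below uses the open-source clipspy binding (an assumption; the simulator couples CLIPS at the C level through its own external functions) with an invented fact template and rule:

```python
from clips import Environment   # clipspy binding to CLIPS (assumed available)

env = Environment()
# Invented fact template describing part of the simulation status
env.build("(deftemplate plant-state (slot drum-level) (slot feed-pump))")
# Invented rule: low drum level with the feedwater pump off triggers an action fact
env.build("""
(defrule low-drum-level
  (plant-state (drum-level ?l&:(< ?l 0.3)) (feed-pump off))
  =>
  (assert (action insert-malfunction low-drum-level)))
""")

env.assert_string("(plant-state (drum-level 0.25) (feed-pump off))")
env.run()                       # fires the rule and asserts the action fact
for fact in env.facts():
    print(fact)                 # the executive would read such facts and act on them
```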
9.3.2.2
Exercises Editor
This editor is a graphic tool to configure the training exercises of the simulation system. The configuration has two parts:
1. Simulation scenario. This part defines aspects such as the exercise objective, the initial condition of the simulation (e.g. cold iron, full load, etc.), trend graphs, interactive process diagrams, the process variables used to evaluate the simulation, and the questions for the theoretical evaluation.
2. Exercise sequence. This sequence is established through a graphical language, a data-flow language whose basic elements are blocks and lines. The blocks have specific functions, for instance sending a message to the user or establishing a waiting time for an action. Each block consists of a header and an action; the header holds the logical code that launches the action. The execution sequence of all the blocks, and the communication among them, is carried out by means of lines: an active input line is the signal to start the block, and an active output line indicates that the block has finished its function. Additionally, each block has another input line to abort its activation if a preset condition is detected. The result of this edition is a directed acyclic graph, which is stored in XML format. Finally, with a specially developed translator, the XML file is transformed into a new file containing an ordered group of rules, which are the base information for the expert system.
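A toy sketch of this translation step is shown below; the XML tags, block attributes and the plain IF-THEN rule strings are hypothetical stand-ins for the editor's actual format and for the CLIPS rules it generates.

```python
import xml.etree.ElementTree as ET

EXERCISE_XML = """
<exercise name="boiler-warming">
  <block id="1" type="Message" text="Start the auxiliary oil pump"/>
  <block id="2" type="ConditionalMessage" variable="oil_press" op="gt" value="6.0"
         text="Raise fuel-oil pressure above 6 kg/cm2"/>
  <block id="3" type="Malfunction" code="MF_OIL_LEAK" delay="120"/>
</exercise>
"""

def block_to_rule(b):
    """Translate one editor block into a (hypothetical) IF-THEN rule string."""
    t = b.get("type")
    if t == "Message":
        return f"IF step-{b.get('id')}-active THEN show-message \"{b.get('text')}\""
    if t == "ConditionalMessage":
        cond = f"{b.get('variable')} {b.get('op')} {b.get('value')}"
        return f"IF step-{b.get('id')}-active AND NOT ({cond}) THEN show-message \"{b.get('text')}\""
    if t == "Malfunction":
        return f"IF step-{b.get('id')}-active THEN insert-malfunction {b.get('code')} delay {b.get('delay')}"
    return f"IF step-{b.get('id')}-active THEN noop"

rules = [block_to_rule(b) for b in ET.fromstring(EXERCISE_XML).findall("block")]
print("\n".join(rules))
```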
9.3.2.3
Theoretical Evaluation
The conceptual content of the training exercises is supported by multimedia lessons and a theoretical evaluation. The multimedia lessons consist of theoretical explanations related to the training exercise and can be presented to the trainee through different multimedia formats. In the case of the theoretical evaluation, the questions related to the exercise objectives are captured in a database, and at the moment of the evaluation a group of them is selected and displayed randomly. This module is also in charge of giving the trainee feedback with the correct answers to his mistakes.
9.3.2.4
Trainee Interface
This interface allows the trainee to access the training exercises of the simulation system, and it is displayed on the three monitors of the computer. On one of them the trainee has the functions to control the simulation and a window showing the messages from the expert system. For the operation of the simulated process, the trainee has two monitors with interactive process diagrams, which have the same look and functioning as the diagrams of the base simulators; in this way, the trainees do not need to learn a new operation interface.
9.4

Knowledge Acquisition and Representation

9.4.1

Knowledge Acquisition
The C Language Integrated Production System (CLIPS) is an inference engine initially developed to facilitate the representation of knowledge to model human expertise (CLIPS). CLIPS provides a cohesive tool for handling a wide variety of knowledge, with support for three different programming paradigms: rule-based, object-oriented and procedural. Rule-based programming allows knowledge to be represented as heuristics, or "rules of thumb," which specify a set of actions to be performed for a given situation. A rule-based expert system is defined as one that contains information obtained from a human expert and represents that information in the form of rules. The rules can then be used to perform inference on data in order to reach appropriate conclusions. These inferences are essentially a computer program that provides a methodology for reasoning about information in the rule base or knowledge base and for formulating conclusions (Liao 2004). Object-oriented programming allows complex systems to be modeled as modular components, which can be easily reused to model other systems or to create new components. In the procedural approach, CLIPS can be called from a procedural language, perform its function, and then return control to the calling program; likewise, procedural code can be defined as external functions and called from CLIPS, and when the external code completes execution, control returns to CLIPS. Knowledge acquisition is carried out through predefined simulation exercises performed with the "Instructor Console" mode of the simulation system; these tests are derived from the training procedures for the simulators and are performed by qualified instructors of the CENAC-I. The training procedures are the same ones utilized during the instructor-guided simulation sessions in the simulators of the CENAC-I; therefore, they include all the operation actions required from the trainee, and this helps to build a very complete knowledge base. These procedures include normal operation (startup, normalization and shutdown operations for each of the power plant systems) and abnormal operation (predefined malfunctions such as pump trips, pipe breaks, etc.). In this way, the record of the operation actions, the observed results and the instructor's experience provide all the information necessary for the expert system.
9.4.2
Knowledge Representation
According to the design specifications of the evaluation system and the knowledge acquisition method, the rule-based approach was selected to represent the knowledge. This approach is also called IF-THEN rules. Some of the benefits of IF-THEN rules are their modularity and the fact that each rule defines a relatively small and independent piece of knowledge; new rules can be added and old ones deleted, usually independently of other rules. To avoid conflict-resolution problems, the rules must have a predefined order. The exercises editor is the graphic tool with which the instructor of the CENAC-I builds the training exercises for the simulation system; Fig. 9.2 is an example of a simulation exercise. This editor contains a group of blocks, where each block represents a rule (or a group of rules) and is customized by its characteristic parameters. The blocks available in the current version of the editor are the following:
1. Malfunction. It inserts a malfunction (e.g. pump trip, fouling in heat exchangers, etc.).
2. Local Action. It modifies a local action of the simulated process (e.g. valve positions).
3. External Parameter. It modifies a parameter external to the simulated process (e.g. ambient temperature).
4. Cancel Malfunction. It removes an active malfunction.
5. Snapshot. It takes a snapshot of the simulator state, so the trainee can return to this point at any moment.
6. Message. It sends a message to the trainee.
7. Conditional Message. It sends a message to the trainee requesting an operative action to satisfy a certain condition (e.g. the trainee needs to open the fuel valve to increase the steam pressure in the boiler).
8. Timer. It establishes a time period to wait for a trainee action.
9. True/False Question. It asks the trainee a true/false question.
10. Multiple Choice Question. It asks the trainee a multiple-choice question.
Fig. 9.2 Simulation exercise
9.5
Implementation and Results
The purpose of the developed system (simulator plus expert system) is to provide a standalone application in which the operative personnel of the power plants can practice and evaluate themselves without the supervision of a human instructor. This practice is a complement to the regular training courses carried out at the CENAC-I; because of this, the training exercises are designed with a very specific objective and a duration of between 20 and 60 min. Figure 9.3 shows, in a simple way, the proposed approach to develop and implement the training exercises. The design stage of an exercise is carried out by a qualified instructor of the CENAC-I. In this phase the instructor, based on operation procedures, develops a training exercise with a specific objective, for instance boiler warming from ambient temperature up to 120°C.

Fig. 9.3 Implementation of an exercise

The process of warming the boiler from ambient temperature up to nominal conditions (540°C) requires at least 6 h, but instead of having very long exercises, which can discourage or bore the trainees, the proposed exercise has a duration of 30 min, and the goal is to train and evaluate the trainee in specific operative actions related to the startup of the oil system and the initial stage of boiler warming. Additional exercises can be prepared to practice the complete process of boiler pressurization up to nominal conditions. As part of the simulation scenario, the instructor defines the initial condition of the simulation system, creates the questions for the theoretical evaluation, assigns the corresponding multimedia material for the theoretical lessons, and selects the process variables for the simulation evaluation; for each of these variables the instructor defines a weight over the final grade, and he also defines a time penalization in case the expected time to carry out the exercise is exceeded. Finally, with the exercises editor, the instructor "transforms" the operation procedure into a group of rules to guide and track the trainee's operation, supported by the expert system. During a training session with the simulation system, the trainee has a graphical interface to carry out the practice by himself. Figure 9.4 shows a partial view of this interface; its main parts are:
1. Trainee functions. These are the functions available to the trainee during a simulation session: (a) selecting the simulator and the training exercise; (b) running/stopping the simulation; (c) selecting snapshots; (d) reloading the interactive process diagrams; (e) displaying theoretical information related to the exercise scope; (f) application of the theoretical evaluation; (g) selecting a new user.
2. Trainee auxiliary window. This window displays the theoretical information of the exercise, or it can be used to display a trend graph or an interactive process diagram during the simulation.
3. Messages window. This window displays to the trainee all the information processed by the expert system. For instance, if a trainee action is required, the expert system displays a message; if the trainee carries out the right action, the message changes color; if the trainee does not carry out the action, the expert system checks the timer specified for the action and sends a warning message to continue the simulation or to abort and start the practice again. All these options are fully configurable in the exercises editor.

Fig. 9.4 Partial view

When some of the criteria specified for finishing the exercise are achieved, the simulation system calculates the grade and shows the result to the trainee; after that, the simulation system is ready to start the execution of a new exercise.
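A minimal sketch of how such a weighted grade with a time penalization might be computed is given below; the scoring rule, variable names, weights and penalty are assumptions for illustration, not the CENAC-I scheme.

```python
def practice_grade(errors, weights, elapsed_s, expected_s, penalty_per_min=2.0):
    """Weighted score over the evaluated process variables, minus a penalty
    for exceeding the expected exercise time. All numbers are illustrative."""
    # per-variable score: 100 if the normalized error is zero, 0 if it is >= 1
    scores = {k: 100.0 * max(0.0, 1.0 - abs(e)) for k, e in errors.items()}
    total_w = sum(weights.values())
    grade = sum(weights[k] * scores[k] for k in scores) / total_w
    overtime_min = max(0.0, (elapsed_s - expected_s) / 60.0)
    return max(0.0, grade - penalty_per_min * overtime_min)

# Example: normalized errors of two evaluated variables and their weights
print(practice_grade({"drum_level": 0.1, "steam_temp": 0.25},
                     {"drum_level": 0.6, "steam_temp": 0.4},
                     elapsed_s=2100, expected_s=1800))
```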
The results of each simulation exercise are stored in a database and are available for consultation by the instructors of the CENAC-I. The main stored results are the trainee identification, the grades of the theoretical test and of the simulation practice, and the total elapsed time. Currently, two exercises have been developed and tested for each of the simulated power plants. For the coal-fired power plant simulator, the exercises are boiler warming from ambient temperature up to 120°C, and malfunction detection of the regenerative air heaters. For the combined cycle power plant simulator, the exercises are startup of the feedwater system, and turning and synchronization of the steam turbine.
9.6
Conclusions
With the aim of having a portable simulator in which power plant operators can practice onsite without the assistance of an instructor, a simulation system was developed that includes an expert system, a 350 MW coal-fired power plant simulator and a 450 MW combined cycle power plant simulator. These simulators were integrated into a common simulation environment; they kept their detailed mathematical modeling and their full-scope and real-time features, and they provide a suitable human-machine interface to train operators of modern power plants. In this simulation environment, an expert system and a group of functions were implemented to train and evaluate the trainees in theoretical and practical aspects of the simulated process. Due to the nature of the knowledge acquisition process, the rule-based approach was selected for knowledge representation. To create the simulation exercises for the trainee, a graphic editor was developed to design and implement such exercises in the simulator; thanks to this editor, the task of coding rules was replaced by a process of selecting, dragging and customizing the required blocks, which greatly simplifies the creation of exercises. Each simulation exercise can include theoretical material to study, evaluation of theoretical concepts, guidance and supervision via the expert system, and the corresponding evaluation; all these options are fully configurable in the exercises editor. The final objective of these simulation exercises is to provide operators onsite with a tool to practice specific operation actions, as a complement to the regular training programs that they carry out at the CENAC-I. The whole system has been validated with the development and testing of four simulation exercises for normal operation and malfunctions, and it is expected that the simulation system will be sent to the power plants once a complete group of exercises is developed, according to the current training requirements of the CFE.
Acknowledgment The authors would like to thank all the personnel of the IIE and CENAC-I who participated in the project development. The simulation environment is proprietary software of the IIE, and it was customized to the particular requirements of this project.
References Arjona Lo´pez M, Hernandez FC, Gleason GE (2003) An intelligent tutoring system for turbine startup training of electrical power plant operators. Exp Syst with App 1:95–101 Chan PF (1996) An expert system for diagnosis of problems in reinforced concrete structures. Master of Applied Science Thesis. Royal Melbourne Institute of Technology, Australia CLIPS, A Tool for Building Expert Systems [Online]. Available: http://clipsrules.sourceforge.net International Atomic Energy Agency (1998) Selection, specification, design and use of various nuclear power plant training simulators, IAEA-TECDOC-995, pp 2–3 International Atomic Energy Agency (2004) Use of control room simulators for training of nuclear power plant personnel, IAEA-TECDOC-1411, p 7 Liao S (2004) Expert system methodologies and applications-a decade review from 1995 to 2004. Exp Syst with App 28:93–103 Olmstadt W, (2000) Cataloging expert systems: optimism and frustrated reality. J South Acad Sp Lib ISSN: 1525-321X Seifi H, Seifi A (2002) An intelligent tutoring system for a power plant simulator. Elect Power Syst Res 3:161–171 Tavira-Mondrago´n J, Parra-Go´mez I, Martı´nez-Ramı´rez R, Tellez-Pacheco J (2005) 350 MW fossil power plant multiple simulator for operators training. Proceedings of the Eighth IASTED International Conference 2005, 29–31 August, Oranjestad, Aruba Tavira-Mondrago´n J, Parra-Go´mez I, Melgar-Garcı´a J, Cruz-Cruz R, Tellez-Pacheco J (2006) A 300 MW fossil power plant part-task simulator. Proceedings of the 2006 summer simulation conference, July 30–August 3, Calgary, Canada Tavira-Mondrago´n J, Melgar-Garcı´a J, Garcı´a-Garcı´a J, Cruz-Cruz R (2009) Upgrade of a fullscope simulator for fossil-fuel power plants. Proceedings of the spring simulation conference 2009, SpringSim´09, 22–27 March, San Diego, USA Tavira-Mondrago´n J, Jime´nez-Fraustro L, Romero-Jime´nez G (2010a) A simulator for training operators of fossil-fuel power plants with an HMI based on a multi-window system. Int J of Comp Aided Eng and Tech 1:30–40 Tavira-Mondrago´n J, Martı´nez-Ramı´rez R, Jime´nez-Fraustro F, Orozco-Martı´nez R, Cruz-Cruz R (2010b) Power plants simulators with an expert system to train and evaluate operators. Proceedings of the World Congress on Engineering and Computer Science 2010, WCECS 2010, 20–22 October, San Francisco, USA Zabre E, Rolda´n-Villasana E, Romero-Jime´nez G, Cruz R (2009) Combined cycle power plant simulator for operator’s training. Proceedings of the World Congress on Engineering and Computer Science 2009, WCECS 2009, 20–22 October, San Francisco, USA
Chapter 10
Lube Oil Systems Models for Real Time Execution Used on a Training Power Plant Simulator Edgardo J. Rolda´n-Villasana, Yadira Mendoza-Alegrı´a, and Iva´n F. Galindo-Garcı´a
10.1
Introduction
The Mexican Electric Research Institute (IIE1, http://www.iie.org.mx) was established in 1975. Since then, it has been the R&D support of the Mexican Electric Utility Company (CFE) offering solutions and innovations in the technical area. The Simulation Department (GS) of the IIE is a technical group specialized in the development of training simulators. They design and implement tools and methodologies to sustain the simulators development and operation. The GS has developed several training simulators devoted to train power plant operators. To satisfy their training requirements, CFE has acquired simulators based on control panels, classroom simulators, portable simulators and, recently, simulators based on multi-window man machine interfaces (MMI) as control screens. The use of real time full scope simulators has proven to be effective and reliable for training operators. The operators can learn to operate the power plant more efficiently in transients or maneuvers such as lowering of the heat rate or reducing the power required by auxiliary equipment (Hoffman 1995). Even simulators without full scope are being successfully used (Fray and Divakaruni 1995). In 2007 CFE sought to have a combined cycle simulator (SCC) using the open architecture of the IIE products. It was decided the new simulator to be developed in two stages: the gas-turbine part in 2007 and the steam-heat recovery part in 2009 (Zabre et al. 2009). The description of the lube oil systems models of the combined cycle simulator (CCS) is presented in this chapter, based in a previous work (Mendoza-Alegrı´a and Rolda´n-Villasana Edgardo 2010).
1
Some acronyms are after the name or phrase spelling in Spanish
E.J. Rolda´n-Villasana (*) Departamento de Simulacio´n, Instituto de Investigaciones Ele´ctricas, 62440 Cuernavaca, Morelos, Mexico e-mail:
[email protected] S.‐I. Ao et al. (eds.), Intelligent Automation and Systems Engineering, Lecture Notes in Electrical Engineering 103, DOI 10.1007/978-1-4614-0373-9_10, # Springer Science+Business Media, LLC 2011
121
E.J. Rolda´n-Villasana et al.
122
10.2
Description of the Simulator
10.2.1 Simulation Sessions Scope The simulator is controlled by an instructor from his own console. The operators train using control screens, which are replica of the real ones. The simulator allows defining simulation scenarios: an initial condition is established (cold start up, 100% of load, etc.) and an assignment is imposed to the operator. The instructor tracks the simulation session and, without the knowledge of the operator, introduces malfunctions to evaluate the operator´s response. A malfunction is a simulated failure of plant equipment, for example: heat exchangers tube ruptures, pumps trips, obstruction of filters, etc. The instructor has the option to define a delay, permanence time, severity, and evolution time of a malfunction. If the operator asks for an action that in the real plant is not made in the control room but is executed by the auxiliary operators in the field, the instructor may execute it on his console. The instructor can modify the external conditions: atmospheric pressure and temperature (dry and wet bulb), voltage and frequency of the external grid, fuel composition, etc. The “ANSI/ISA S77.20-1993 Fossil-Fuel Power Plant Simulators Functional Requirements” norm was adopted as a design specification, including the real time execution.
10.2.2 Hardware Four Personal Computers, interconnected through a fast Ethernet Local Area Network, constitute the simulator. Each PC has a Pentium D processor with 3.6 GHz, 2 GB of RAM, 160 GB HD, and Windows XP as operating system. Figure 10.1 shows a schematic of this architecture.
10.2.3 Software WindowsXP is the operating system. The simulator was programmed in MS Visual Studio 2005: Fortran Intel was used for the mathematical models, and Flash and VisSim for the gas turbine screens. The steam turbine screens were duplicated from the real plant control screens, and C# was the basis for the modules of the simulation environment called MAS (proprietary software of the IIE), where the instructor controls the simulation sessions (Fig. 10.2).
10
Lube Oil Systems Models for Real Time Execution. . .
123
Control Room Operation Station1 (OS1)
Operation Station2 (OS2)
Lan switch Maintenance node (MN)
Instructor Console (IC)
Fig. 10.1 Hardware architecture
Fig. 10.2 Main display of the instructor console (Partial view)
10.3
Modeling Methodology
10.3.1 Processes The processes models are a set of algebraic and differential equations obtained from the application of basic principles (energy, momentum and mass balances and well-known and proved relations). The IIE methodology may be summarized as: (a) Information of the process is obtained and classified, (b) The information is analyzed to create a conceptual model, (c) Reasonable simplifications to the system are made to get a simplified diagram, (d) Main assumptions are stated and justified, (e) The flows and pressures network (FPN) configuration is obtained, (f) Energy balances are programmed using generic models, (g) Equipment’s models out the limits of the hydraulic network are parameterized considering the generic models available in the simulator libraries or developing the appropriate models, (h) Local tests are performed and, if necessary, corrections to the models are
E.J. Rolda´n-Villasana et al.
124
made, (i) Integration between all the different systems and their controls are performed, ( j) Global tests are made with the proper adjustments, (k) Final acceptance tests are performed by the final user according to his own procedures.
10.3.2 Control The control models acquire and process the actions performed by the operator in the control room by manipulating the control screens. The control models are a reproduction of the actual logic of the plant.
10.4
Lube Oil Systems Description
The components of the gas turbine lubrication oil system (GTLOS) and the steam turbine lube oil system (STLOS) are: AC motor pumps, emergency DC motor pumps, lubrication oil tanks, electric oil heaters, lube oil coolers, oil mist eliminators (extractors), oil filters, and valves. Additionally, the GTLOS has a pressure control for the lubricated parts. The STLOS has oil ejectors, an oil conditioner, and a main pump mounted on the steam turbine shaft. The systems provide a continuous supply of filtered oil at the pressure and temperature required for lubrication. In normal operation conditions of the GTLOS, the oil is supplied from the lube oil tank to the bearings through the main pump; the backup pump operates when the main pump cannot keep operating requirements. An emergency DC pump is only used as an ultimate option to ensure that the gas turbine slows down safely when a loss of pressure in the oil system, indicated by the pressure in the header bearings, occurs. During start up and shut off periods, the steam turbine lube oil requirements are supplied by the auxiliary pump and, when the turbine is at rated speed, the main oil pump supplies oil to the bearings. For both systems the oil is taken from the main oil tank, passes through a water cooler, is directed to the bearings and finally flows back to the tank. The GTLOS is linked to the Gas Turbine Seal Oil System (GTSOS) through a valve that supplies seal oil to the generator when the seals pump fails.
10.5
Modeling of the Systems
A model is a mathematical representation of the behavior of the real system variables of the power plant. Models of oil systems, from the point of view of operation training, are not frequently reported in the literature because they belong
10
Lube Oil Systems Models for Real Time Execution. . .
125
to the companies that provide the simulators or training services, and therefore, it is proprietary information (Vieira et al. 2008). Nevertheless, some design and analysis simulation approaches for oil system may be found (L€uckmann et al. 2009; Flores et al. 2007; D’Souza and Childs 2002; Wasilczuk and Rotta 2008) (out of the training modeling approach). In the SCC, a model responds to the operator’s action in the same way that the real systems do (in tendency and time), at least as required by the ANSI norm. The IIE has adapted different public solution methods for both, linear and nonlinear equations (Newton–Raphson, Gaussian elimination and bisection partition search) and they are used depending on the structure of each particular model. The differential equations are numerically solved with one of various methods (e.g. Euler, trapezoidal rule with one or two corrections). Fundamental conservation principles were used for the modeling considering a lumped parameters approach and widely available and accepted empirical relationships. The independent variables are associated with the operator’s maneuvers (actions on valves, pumps, etc.) and with the control signals from the distributed control system (DCS). A generic model (GM) constitutes a standard tool with some built-in elements (in this case, routines), that represent an equipment or system and may facilitate its adaptation to a particular case. The GM, as developing tools, allows reducing the developing time of a simulator. The GS have developed a series of GM from single accessories as junction nodes or ejectors; complex equipments as air condensers, deaerators, or boilers; to numerical methods that have been listed elsewhere (Rolda´n-Villasana et al. 2009).
10.5.1 Flows and Pressures Modeling The flows and pressures generic model ( flupre) is derived from the momentum equation applied in each stream and the continuity equation for each node of a hydraulic network. The term of transient acceleration is neglected and the forces acting on the fluid are considered instantly balanced. A 1D model may be stated integrating the momentum equation along a stream: DP ¼ X Dt þ r g z
(10.1)
Here, P is the pressure, X the length, r the density, g the gravity acceleration, z the height, and t the viscous stress tensor that may be evaluated using empirical expressions for any kind of element. For example, for pumps it may be assumed that: 0
0
0
DP ¼ K1 w2 þ K2 w o þ K3 o2 r g z
(10.2)
E.J. Rolda´n-Villasana et al.
126
There, K’i are constants, w the mass flow, and o the angular speed of the pump. For non rotating elements the equation may be stated as: w2 ¼ K 0 r Apg ðDP þ rgzÞ
(10.3)
Ap is the aperture of the valves but it may also represent a variable resistance to the flow. Exponent g represents the characteristic behavior of a valve. Other components (such as turbines) may also be represented by Flupre. Thus, by applying the momentum equation on each piece of equipment and a mass balance on each node, a system of equations is stated where the flows and pressures are the unknown variables to be solved. Flupre may detect automatically the varying network topology and establish the proper set of equations.
10.5.2 Energy Balances An energy balance is made where heat exchangers exist and in nodes where a temperature or enthalpy is required to be displayed or to be used in further calculations. In the lube oil models, two types of heat exchangers were included: fluid-fluid exchangers and fluid-metals exchangers. The fluid (water and air) exchangers were modeled considering a heat flow: q ¼ U A DT
(10.4)
Here q is the heat flow, A the transfer area, T the temperature, and U the global heat transfer coefficient which depends on the fluid properties, the flow rates, and the construction parameters of the equipment. The characteristic temperature difference between hot and cold streams is the logarithmic mean temperature difference. So, for each fluid, an energy balance may be stated (considering the heat capacity Cp as a constant) to calculate the exit temperature:
q T o ¼ Ti w Cp
(10.5)
Subscripts o and i are for output and input, respectively. To avoid thermodynamic inconsistencies in the exchanger, e.g. if temperature crossing is detected, and the heat calculated by (10.4) is too high, the limiting heat, which depends on the cold and the hot streams, and the maximum heat permitted by the second law are identified. Heat is re-calculated arranging (10.5). Outlet temperatures are re-calculated. Clearly, an iterative procedure may be established in order to obtain a converged solution. The model for the metals cooled with oil was based on the assumption that turbine and generator metals are heated by friction and electric current. The temperature
10
Lube Oil Systems Models for Real Time Execution. . .
127
Fig. 10.3 Schematic metal temperature model
Tg qg qatm Tm
qf
Tf
from this generated heat Tg is represented by virtual temperatures (TI for the current and To for the speed) that are calculated in order to simulate these heating effects (Fig. 10.3). With this temperature Tg, the heat qg represents the transferred energy due the frictional forces and/or the generated electric current (Pasquantonio and Macchi 1976): qg ¼ qo þ qI ¼ ko ðTo Tm Þ o0:4 þ kI ðTI Tm ÞI2
(10.6)
So, the temperature of the compressor may be calculated by integration of next equation: dTm qg qf qatm ¼ dt m Cp
(10.7)
Being t the time, m the mass, the subscripts f and atm are for the fluid and atmosphere respectively. The heat flow is calculated with (10.4).
10.5.3 Capacitive Nodes Two kinds of capacitive nodes exist in the lube oil systems. The first one (thermal node) is part of the FPN whose pressure is calculated as explained before. In this case, because no phase change is simulated for the oil, temperature is calculated by integrating next equation (the state variable could be the enthalpy especially if a phase change is expected): dT ¼ dt
P
wi Ti wo T qatm m Cp m
(10.8)
E.J. Rolda´n-Villasana et al.
128
The mass of the lube oil in the tanks is obtained from a mass balance on the liquid phase: X dm X wo ¼ wi dt
(10.9)
The volume of the liquid Vl is: Vl ¼
m r
(10.10)
The oil level is calculated with an appropriate correlation depending on the tank’s geometry. The second kind of capacitive node is one that is a boundary of the FPN. In this category are boilers, condensers, deaereators, and other equipment related with phenomena involving water/steam operations. The state variables depend on each particular case. This equipment was not used in the lube oil systems; only closed tanks with oil and air were modeled. The tank is considered to have two separated phases with heat transfer between them. The air pressure is calculated based on an ideal gas behavior: Pa ¼
ma R Ta Ma Va
(10.11)
Ma is the air molecular mass and R the ideal gas constant. Equations (10.8) and (10.9) apply also for the air phase. Humidity in the air is neglected. Extractors that maintain the vacuity in the tanks are formulated with (10.2).
10.6
Modeled Systems
In this section some particular considerations of each model are presented. Some simplifications are included in the generic models (e.g., density and heat capacity are constant). The control models associated with the systems were modeled in an independent way (the activation of the automatic actions and the alarms behavior were included in these models). Additional GM are employed to solve for the angular speed and current of the pump motors.
10.6.1 Gas Turbine Lube and Seal Oil Systems Model The GTLOS and GTSSO were initially simulated into a single module. For the tanks, the extractors were included. The bearing metals and the heat exchangers were considered. The dragging of the hydrogen into the oil was neglected.
10
Lube Oil Systems Models for Real Time Execution. . .
129
Mathematical instabilities appeared because the magnitude order of the flow rates is very different for the both oil networks. To avoid the numerical problems, two independent sets of equations were solved, one for each system. In Fig. 10.4 a schematic diagram of the networks is presented. The dotted lines represent the common flow rates calculated out of the flupre models. In Fig. 10.5 is presented the control screen of the GTLOS.
10.6.2 Steam Turbine Lube Oil Model The STLOS model includes pipelines, pumps, heat exchangers, rotor bearings and valves (Fig. 10.6). There were excluded the oil ejectors at the suction of the main pump resulting in taking suction directly from the main tank and including their pressure effects in the main oil pump operation curve. The operator monitoring and control screen is shown in Fig. 10.7.
10.7
Upling and Testing
An important task is to define the causal relation of the models, i.e. to specify for each model all the variables (inputs and outputs) that connect the mathematical models, the controls, the operator console, and the instructor console. The goal is to assure that all models are congruent. The variables are classified and added into a data base. This data base contains the variables declaration for all the simulator programs, parameters values, unit conversions, instruments ranges, remote functions, malfunctions, etc. Each model is tested, independently, off line without controls (open loop). In this stage some simplified controls are included to avoid problems of process instabilities; for example in a tank model it is necessary to control the level to avoid it becomes empty or shedding. The idea of the tests is to reproduce the design data at 100% of capacity. The input variables must have an initial value (initial condition). For the coupling process, the control models are integrated into the MAS without the process models and they are tested to verify its dynamics, including the control screens. The models are added individually according to a predefined order. An algorithm was developed to consider a sequence trying to minimize retarded information to avoid mathematical problems(Mendoza-Alegrı´a and Rolda´n Villasana 2008). For each model addition the initial condition is updated and some tests are performed to assure the coupling is successful. Adjustments are done to fix differences between the simulator and the real plant values. When the last model has been added and the 100% initial condition is ready, all the other initial conditions, down to the cold start initial condition, are obtained by operating the simulated plant. Full factory tests, developed by the final user, are applied to the integrated simulator. Then it is delivered to its final site with the customer.
Fig. 10.4 Gas turbine lube and seal oil systems flows and pressures network
130 E.J. Rolda´n-Villasana et al.
Lube Oil Systems Models for Real Time Execution. . .
Fig. 10.5 Gas turbine lube and seal oil systems control screen
10 131
132
Fig. 10.6 Schematic diagram steam turbine lube oil system
Fig. 10.7 Control screen of the steam turbine lube oil system
E.J. Rolda´n-Villasana et al.
10
Lube Oil Systems Models for Real Time Execution. . .
10.8
133
Results
As an example of the models scope, two scenarios are presented. Both were executed with the entire simulator coupled. No corrective action by the operator was made during the transients. The first one is related to the gas turbine. The customer provided plant data for an automatic start up procedure. The simulator results were compared with these data. In Fig. 10.8 a comparison between the expected temperatures and the simulator results is presented. The x axis is the sum of the gas turbine speed and the load of the gas turbine generator (note that the load is zero while the speed reaches its nominal speed of 377 rad/s). The selected variables were the temperatures of the control oil tank and the thrust bearing metal and oil (at the exit). The good agreement between the simulated results and the plant data is evident. The second transient consists in testing the steam turbine lube oil model under the actions reported in Table 10.1. The reported variables in Fig. 10.9 are the temperatures in the lube oil tank, at the exit of the cooler, of the metal of the third bearing, and the oil at the exit of the third bearing; and the aperture of the cooler control valve (simulated in the cooling water circuit system). From a qualitative point of view, the simulator results are in agreement with the expected behavior the plant would have if the same transients were applied on the real plant.
Temperature (°C )
100
80
60
40
20
0 0
200
Real Plant Brng Thrust
Simulator Brng Thrust
Real Plant Oil Brng Thrust
Simulator Oil Brng Thrust
Real Plant EHC Oil
Simulator EHC Oil
400
600
Speed (rad/s) + Charge (MW) Fig. 10.8 Gas turbine lube oil models results
800
1,000
134
E.J. Rolda´n-Villasana et al.
Table 10.1 Test events of the steam turbine oil models Time (s) Event 0 Simulation starts at 100% of load in steady sate with the lube oil cooler control valve stuck at a position of 33% (a malfunction) 30 The electric oil heater is turned on. The temperatures of the system (oil and metals) increase 830 The electric oil heater trips automatically when the oil temperature in the tank reaches 80 C. The temperatures in the tank and at the exit of the cooler stop augmenting The metal and oil temperatures at the exit of the bearing keep raising until an inflexion point is reached at 830 s 1,200 The steam unit trips due to high temperature in the turbine metals (the 3rd bearing metal temperature reaches 89 C). The metal and oil temperatures continue increasing because the turbine speed, although lowering, continues with a high value 1,290 The cooler control valve is released from its malfunction. The valve opens quickly trying to control the oil temperature (at 40 C). All the metal and oil temperatures descend
Fig. 10.9 Steam turbine lube oil models results
References Hoffman S (1995) A new era for fossil power plant simulators. EPRI J 20(5):20–27 Fray R, Divakaruni M (1995) Compact simulators can improve fossil plant operation. Power Eng 99(1):30–32 (ISSN 0032-5961, United States) Zabre E, Rolda´n-Villasana EJ, Romero-Jime´nez G, Cruz R (2009) Combined cycle power plant simulator for operator’s training. The World Congress on Engineering and Computer Science, San Francisco, pp 20–22 Mendoza-Alegrı´a Y, Rolda´n-Villasana EJ (2010) Oil systems modeling for an operators’ training combined cycle plant simulator. The world congress on engineering and computer science, ISBN: 978-988-17012-0-6, San Francisco, pp 20–22 Vieira L, Matt C, Guedes V, Cruz M, Castelloes F (2008) Optimization of the operation of a complex combined-cycle cogeneration plant using a professional process simulator.
10
Lube Oil Systems Models for Real Time Execution. . .
135
Proceedings of IMECE2008. ASME International mechanical engineering congress and exposition, October 31–November 6, 2008, Boston, pp 787–796 L€ uckmann JA, Vinicius AM, Barbosa RJ (2009) Analysis of oil pumping in a reciprocating compressor. Appl Therm Eng 29(14–15):3118–3123 Flores P, Ambrosio J, Claro JCP, Lankarani HM (2007) Study of the influence of the revolute joint model on the dynamic behavior of multibody mechanical systems modeling and simulation. Proceedings of the ASME 2007 International design engineering technical conferences & computers and information in engineering conference, Las Vegas, pp 367–379 D’Souza RJ, Childs DW (2002) A comparison of rotordynamic-coefficient predictions for annular honeycomb gas seals using three different friction-factor models. J Tribol 124–3:524–529 Wasilczuk Micha, Rotta Grzegorz (2008) Modeling lubricant flow between thrust-bearing pads. Tribol Int 41:908–913 Rolda´n-Villasana EJ, Cardoso Ma J, Mendoza-Alegrı´a Y (2009) Modeling methodology for operators training full scope simulators applied in models of a gas-turbine power plant. Memorias del 9o. Congreso Interamericano de Computacio´n Aplicada a la Industria de Procesos, 25 al 28 de agosto, Montevideo, pp 61–66 Pasquantonio FD, Macchi A (1976) Mathematical model and boundary conditions in stress analysis relating to steam turbine rotors, under transient operating conditions. Int J Numer Meth Eng 10(2):345–360 Rolda´n-Villasana EJ, Mendoza-Alegrı´a Y, Zorrilla-Arena S, Jorge J, Cardoso G, Jesu´s Ma, CruzCruz R (2008) Development of a gas turbine full scope simulator for operators’ training. European Modelling Symposium, Second UKSim European Symposium on Computer Modelling and Simulation, UK Simulation Society, 6–8 September, Liverpool, England, 2008
Chapter 11
Stochastic Analysis and Particle Filtering of the Volatility Hana Baili
11.1
Introduction
Let S ¼ ðSt Þt2IRþ be an IRþ -valued semimartingale based on a filtered probability space ðO; F ; ðF t Þt2IRþ ; IPÞ which is assumed to be continuous. The process S is interpreted to model the price of a stock. A basic problem arising in Mathematical Finance is to estimate the price volatility, i.e., the square of the parameter s in the following stochastic differential equation dSt ¼ mSt dt þ sSt dW t where W ¼ ðW t Þt2IRþ is a Wiener process. It turns out that the assumption of a constant volatility does not hold in practice. Even to the most casual observer of the market, it should be clear that volatility is a random function of time which we denote st2. Itoˆ’s formula for the return yt ¼ logðSt =S0 Þ yields dyt ¼
m
s2t dt þ st dW t 2
y0 ¼ 0
(11.1)
The main objective is to estimate in discrete real-time one particular sample path of the volatility process using one observed sample path of the return. As regards the drift m, it is constant but unknown. Under the so-called risk-neutral measure, the drift is a riskless rate which is well known; actually one finds that m does not cancel out, for instance, when calculating conditional expectations in a filtering problem. For this argument no change of measure is required, we work directly in the original measure IP, and m has to be estimated from the observed sample path of the return as well. H. Baili (*) E´cole Supe´rieure d’E´lectricite´, Department of Signal Processing and Electronic Systems, 3 rue Joliot Curie, Plateau de Moulon, 91192 Gif sur Yvette, France e-mail:
[email protected] S.‐I. Ao et al. (eds.), Intelligent Automation and Systems Engineering, Lecture Notes in Electrical Engineering 103, DOI 10.1007/978-1-4614-0373-9_11, # Springer Science+Business Media, LLC 2011
137
138
11.2
H. Baili
A Model for the Stochastic Volatility
Let ðzt Þt2IRþ be an arbitrary IR-valued process; at the moment, this is not the unknown process st2 of instantaneous volatility. Let us assume prior information about the process zt: wide sense stationarity and a parametric model for its covariance function gðtÞ ¼ D expðajtjÞ
t 2 IR
for some constants D, a > 0. Then the spectral density of zt is given by the formula 1 GðoÞ ¼ 2p
ð gðtÞ expðjotÞdt ¼ IR
1 2D a 2p o2 þ a2
pffiffiffiffiffiffiffi where j ¼ 1. The spectral density G(o) may be rewritten as GðoÞ ¼ where Hð joÞ ¼
2 1 Hð joÞ 2p Fð joÞ
o 2 IR
pffiffiffiffiffiffiffiffiffiffi 2D a and Fð joÞ ¼ jo þ a. Notice now that FðsÞ ¼
HðsÞ FðsÞ
s 2 IR
represents the transfer function of some temporally homogeneous linear filter; this filter is furthermore stable as the root of F(s) is in the left half-plane of the complex variable s. Recalling that 1/2p is the spectral density of a white noise with unit intensity, we come to the conclusion that zt IE½zt may be considered as the response of the filter whose transfer function is F(s), to a zero-mean white noise with unit intensity. The differential equation describing such a filter is _ þ auðtÞ ¼ uðtÞ
pffiffiffiffiffiffiffiffiffiffi 2D a wðtÞ
where w(t) and u(t) are respectively the input and the output of the filter. Setting m ¼ IE½zt and zt m ¼ uðtÞ, the process zt solves the following SDE dzt ¼ aðzt mÞ dt þ
pffiffiffiffiffiffiffiffiffi 2Da dW t
Let the IR-valued process ð~zt Þt2IRþ solve the SDE d~zt ¼ a~zt dt þ
pffiffiffiffiffiffiffiffiffi 2Da dW t
t>0
11
Stochastic Analysis and Particle Filtering of the Volatility
139
with given initial condition ~z0 . ~zt is wide sense stationary with zero mean and correlation function zðtÞ~ zðt tÞ ¼ D expðajtjÞ r ~z ðtÞ ¼ IE½~ Consider now the process j~zt j, written xt. Then xt is a solution of dxt ¼ axt dt þ
pffiffiffiffiffiffiffiffiffi 2Da dW t
t>0
(11.2)
with reflection on the boundary {0} of its state space IRþ . Starting from any fixed point strictly greater than zero, xt reaches this boundary by a predictable stopping time with finite expectation because of the negative sign of the drift. The initial condition x0 is a random variable with known distribution since x0 ¼ j~z0 j. It is worthwhile to note the ergodicity of the Markovian process xt, with stationary distribution density 2 x2 pðxÞ ¼ pffiffiffiffiffiffiffiffiffi exp 2D 2pD
x 2 IRþ
(see pages 55–57 of Skorokhod 1989). This is beyond our expectation in view of the required wide sense stationarity. It follows that for each x 2 IRþ , for any bounded continuous function f on IRþ ð lim IEx ½ f ðxt Þ ¼
t!1
f ðxÞpðxÞ dx IRþ
and in particular 2D lim IEx ½xt ¼ pffiffiffiffiffiffiffiffiffi 2pD
t!1
Obviously, the second order moment for xt coincides with that of ~zt , namely IE x2t ¼ IE ~z2t ¼ D Let us compute rx(t), the correlation function of xt, for t 6¼ 0 IE½ xðtÞxðt tÞ ¼ IE½ j~ zðtÞ~ zðt tÞj ¼ IE½~ zðtÞ~ zðt tÞIPfeven passage number by 0g IE½~ zðtÞ~ zðt tÞIPfodd passage number by 0g ¼ C expðajtjÞ for some constant 0 < C < D. Note the discontinuity of rx(t) at t ¼ 0. We shall freely call the process xt or equivalently the SDE (11.2) our stochastic volatility model. We just
140
H. Baili
~ since it is independent have to denote the Wiener process in (11.2) differently, say W, of the Wiener process in (11.1). It should be noted that this type of correlation function: C expðajtjÞ t 6¼ 0 a>0 0
11.3
Filtering
Now we consider the filtering problem associated to the couple (xt, yt): we have noisy nonlinear observations of xt, the IR-valued discrete-time process of returns (yn)n ¼ 1, 2, . . . indexed at irregularly spaced instants t1, t2, . . . . The observation times are assumed to be rigourously determined. The observations process is related to the state process ðxt Þt2IRþ via the conditional distribution IPfyn 2 Gjy1 ; :::; yn1 ; ðxt : 0 t tn Þg n 1 for G a Borel-measurable set from IR. For homogeneity of notation we set t0 ¼ 0 so that yn¼0 ¼ yt¼t0 ¼ 0. Now look at the distribution above and recall that yn ¼ y(tn) and that the process yt solves the SDE
pffiffiffiffi xt dt þ xt dW t dyt ¼ m 2 This is (11.1) where st is denoted yt ¼ yn1 þ
y0 ¼ 0
(11.3)
pffiffiffiffi xt . For t tn 1
ðt
m
tn1
xs ds þ 2
ðt
pffiffiffiffi xs dW s
(11.4)
tn1
and thus IPfyn 2 Gjy1 ; :::; yn1 ; ðxt : 0 t tn Þg ¼ IPfyn 2 Gjyn1 ; ðxt : tn1 t tn Þg Given a sample path of ðxt Þtn1 ttn and the observation yn1, ðyt Þtn1 ttn is a Markov process with state space IR satisfying (11.4). This leads to the central concept of this section: the Fokker-Planck equation (Gardiner 1985; Risken 1989; Frank 2005; Mahnke et al. 2009). The domain of the Fokker-Planck operator: LFP pðy; tÞ ¼
x
t
2
@p
m
@y
ðy; tÞ þ
xt @ 2 p ðy; tÞ; 2 @y2
11
Stochastic Analysis and Particle Filtering of the Volatility
141
is the set of distribution densities on ðIR; BðIRÞÞ under IP. Given a sample path of ðxt Þtn1 ttn and the observation yn1, the distribution density p(y, t) of yt solves the Fokker-Planck equation @p ðy; tÞ ¼ LFP pðy; tÞ tn1
(11.5)
with the initial condition pðy; tn1 Þ ¼ dðy yn1 Þ. The formal solution of the above partial differential equation is pðy; tÞ ¼ expfðt tn1 ÞLFP gpðy; tn1 Þ Since LFP is a sum of two non commuting operators, the exponential operator expfðt tn1 ÞLFP g cannot be expressed as simple products of terms involving each of these. Nevertheless, the solution of the Fokker-Planck equation is obtained using the Trotter product formula (Valsakumar 1983). For two arbitrary operators A and B
nt o n t on expftðA þ BÞg ¼ lim exp A exp B n!1 n n Then the solution of (11.5) is the limit as n ! 1 of
n rðt tn1 Þ d Rðt tn1 Þ d 2 exp exp dðy yn1 Þ dy2 n dy n
where r¼
xt m 2
R¼
xt 2
For algebraic manipulations we use the integral representation of the delta function and write the solution of (11.5) as 1 pðy; tÞ ¼ lim Y n!1 2p
ð þ1
n
1
expfjzyg expfjzyn1 g dz
where Y ¼ exp
rðt tn1 Þ d Rðt tn1 Þ d 2 exp dy2 n dy n
We claim that
Rðt tn1 Þ d 2 Rðt tn1 Þ 2 z jzy expfjzyg ¼ exp exp n n dy2
142
H. Baili
rðt tn1 Þ d rðt tn1 Þ expfjzyg ¼ exp jz jzy exp n dy n Therefore Rðt tn1 Þ 2 rðt tn1 Þ z jz jzy Y expfjzyg ¼ exp n n
Yn expfjzyg ¼ exp Rðt tn1 Þz2 rðt tn1 Þjz jzy and thus 1 pðy; tÞ ¼ 2p
ð þ1 1
expfRðt tn1 Þz2 þ jz½y þ yn1 rðt tn1 Þg dz
Let Z be a Gaussian random variable and c(u), u 2 IR, be its characteristic function: cðuÞ ¼ IE½expfjuZg
) ðz IE½ZÞ2 expfjuzg exp ¼ ð2pnVar½ZÞ dz 2nVar½Z 1 u2 ¼ exp ju IE½Z nVar½Z 2 12
ð þ1
(
Then 1 pðy; tÞ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi cðy þ yn1 rðt tn1 ÞÞ 2 pRðt tn1 Þ with IE½Z ¼ 0 nVar½Z ¼
1 2Rðt tn1 Þ
and hence we obtain for tn 1 t tn ( ) 2 ½y þ yn1 þ m x2t ðt tn1 Þ 1 pðy; tÞ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi exp 2xt ðt tn1 Þ 2pxt ðt tn1 Þ
11
Stochastic Analysis and Particle Filtering of the Volatility
143
11.3.1 Conditional Density Characterization: The Optimal Filter The optimal estimate – in a sense of the mean square – of f(xt) given the observations y1,...,yn1 up to time t is the conditional expectation IE½ f ðxt Þjy1 ; :::; yn1
tn1 t
n1
for all reasonable functions f on IRþ . We assume that IPfxt xjy1 ; :::; yn1 g possesses a density with respect to the Lebesgue measure l on IRþ : Pxt jy1 ;:::;yn1 ðxÞ ¼
dIPfxt xjy1 ; :::; yn1 g lðdxÞ
Now look at the SDE (11.2), the Fokker-Planck operator for xt is LFP pðxÞ ¼ apðxÞ þ aðx mÞp0 ðxÞ þ Dap00 ðxÞ The domain of this operator is the set of distribution densities p(x) on ðIRþ ; BðIRþ ÞÞ, under IP, satisfying mpð0Þ Dp0 ð0Þ ¼ 0 This is due to the reflection of the process xt on the boundary {0} of its state space IRþ . It follows that the posterior distribution density Pxt jy1 ;:::;yn1 ðxÞ for tn1 t < tn, n 1, solves the Fokker-Planck equation @p ðx; tÞ ¼ LFP pðx; tÞ @t
tn1 < t < tn
i.e., @p @p @ 2p ðx; tÞ ¼ apðx; tÞ þ aðx mÞ ðx; tÞ þ Da 2 ðx; tÞ @t @x @x
(11.6)
with the initial condition pðx; tn1 Þ ¼ Pxðtn1 Þjy1 ;:::;yn1 ðxÞ
(11.7)
and the boundary condition mpð0; tÞ D
@p ð0; tÞ ¼ 0 @x
This is a static relation for x ¼ 0, i.e., it holds for any t ∈ [tn1, tn].
(11.8)
144
H. Baili
At each observation instant tn, n 1, P xðtn Þjy1 ;:::;yn ðxÞ solves the Bayes rule P xðtn Þjy1 ;:::;yn ðxÞ / Pxðtn Þjy1 ;:::;yn1 ðxÞPyn jy1 ;:::;yn1 ;xðtn Þ¼x ðyn Þ
(11.9)
where 1 Pyn jy1 ;:::;yn1 ;xðtn Þ¼x ðyn Þ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2pxðtn tn1 Þ ( 2 ) yn þ yn1 þ ðm 2xÞðtn tn1 Þ exp 2xðtn tn1 Þ and Pxðtn Þjy1 ;:::;yn1 ðxÞ is the solution of (11.6–11.8) as t " tn.
11.4
Identification
It follows from (11.3) that the variation process [y]t of yt is given by ½yt ¼
ðt xs ds 0
thus ½ytn ½ytn1 ¼
ð tn xs ds
n ¼ 1; 2; :::
tn1
On the other hand, so long as every duration between two successive observations is small, the following approximation holds ½ytn
Xn i¼1
ðyi yi1 Þ2
Thus ð tn tn1
xs ds ðyn yn1 Þ2
i.e., the couple of series below coincide approximatively S¼
ð tn
xs ds
tn1
n¼1;2;:::
n o S0 ¼ ðyn yn1 Þ2
n¼1;2;:::
11
Stochastic Analysis and Particle Filtering of the Volatility
145
and so do their first and second order moments. The following is the computation of the mean and the correlation function for the series S of aggregations of the instantaneous volatility on the observation intervals. To do this we need to have tn tn1 ¼ d for each n ¼ 1, 2, . . . and as mentioned above d must be small (we set d ¼ 1 time unit). ð tn
2D d xs ds ¼ pffiffiffiffiffiffiffiffiffi 2pD tn1
IE and for k ¼ 1, 2, . . . ð tn
xu du ;
IE tn1
ð tn ð tnk xv dv ¼ r x ðu vÞ du dv
ð tnk tnk1
tn1 tnk1
If we replace rx(u v) by its expression, we obtain the following formula for k ¼ 1, 2, . . . ð tn ð tnk IE xu du ; xv dv tn1
tnk1
C ¼ 2 ðexpfadðk 1Þg 2 expfadkg þ expfadðk þ 1ÞgÞ a
(11.10)
It follows that C and a may be obtained by least squares of the difference between 0 the correlation function of S , estimated from the observations, and the correlation function given by formula (11.10). The following gives an approximation for the drift parameter m in (11.1). We have yn yn1 ¼
ð tn
tn1
m
xs ds þ 2
ð tn
pffiffiffiffi xs dW s
tn1
Then ð tn 1 xs ds IR 2 tn1 Dd ¼ m d pffiffiffiffiffiffiffiffiffi 2pD
IE½ yn yn1 ¼ m d
and thus 1 D m ¼ IE½yn yn1 þ pffiffiffiffiffiffiffiffiffi d 2pD
146
H. Baili
3.5
x 104
3
Daily Stock Price
2.5
2
1.5
1
0.5
0
500
1000
1500 2000 Trading Days
2500
3000
3500
Fig. 11.1 The observed sample path for the daily price of the Hang Seng index
The Hang Seng index price of the market of Hong Kong is observed during 3,191 successive trading days from 1995 to 2007. This is plotted in Fig. 11.1. Figure 11.2 shows the daily returns yn yn1 ¼ log
Stn Stn1
n ¼ 1; :::; 3; 190
0
The empirical mean of S yields an approximation for the second order moment D of 1. 0977e–007. This approximation together with the empirical mean of the daily returns yield an approximation for the drift m of 5. 4008e–004. The constant C and the rate a that give a good fitting between the correlation function of S and its approximation are 3. 5926e–007 and 0. 0857 respectively. The model for the stochastic volatility of the stock is thus calibrated, and we now go back to filtering.
11.5
A Monte-Carlo Particle Filter
The true filter (11.6–11.9) which is optimal in a mean square sense involves a resolution of the Fokker-Planck equation. Both analytic and numerical solutions for this partial differential equation are computationally intractable. This drives us to
11
Stochastic Analysis and Particle Filtering of the Volatility
147
0.2
0.15
Daily Returns
0.1
0.05
0
−0.05
−0.1
−0.15
0
500
1000
1500 2000 Trading Days
2500
3000
3500
Fig. 11.2 The observed sample path for the daily return
an alternative Monte-Carlo filter (Baili and Snoussi 2009). We wish to approximate the posterior distribution as a weighted sum of random Dirac measures: for G a Borel-measurable set from IRþ IRfxt 2 Gjy1 ; :::; yn1 g
XK k¼1
wk Exk ðGÞ
tn1 t < tn
n1
where the particles xk are independent identically distributed random variables with “the same” law as xt; these particles are indeed samples drawn from the Euler discretization of the SDE (11.2). Here we use the well known Euler scheme since there isn’t a significant gain with more sophisticated discretization schemes. Then, for any function f on IRþ IR½ f ðxt Þjy1 ; :::; yn1
XK k¼1
wk f ðxk Þ
tn1 t < tn
n1
The weights {wk}k ¼ 1, . . . , K are updated only as and when an observation yn proceeds, each one according to the likelihood of its corresponding particle, i.e., at each observation time tn Py jy ;:::;y ;xðt Þ¼x ðyn Þ wk ¼ PK n 1 n1 n k l¼1 Pyn jy1 ;:::;yn1 ;xðtn Þ¼xl ðyn Þ where {xk}k ¼ 1, . . . , K are samples with the same law as x(tn).
148
H. Baili
30
X: 697 Y: 27.44
Square Root Volatility (percent)
25
20
15
10
5
0
0
500
1000
1500 2000 Trading Days
2500
3000
3500
Fig. 11.3 The estimated sample path for the volatility of the Hang Seng index
Besides sampling, there may be (importance) resampling at each observation time: the set of particles is updated for removing particles with small weights and duplicating those with important weights. We simulate K new iid random variables according to the distribution XK k¼1
wk Exk
Obviously, the new particles have new weights and thus give a new approximation for the posterior distribution. On the other hand, these new particles are used to initialize the Euler discretization scheme for the next sampling. The following is the remainder of implementation details of the Monte-Carlo particle filter. – Number of particles: K ¼ 1, 000 – Time step of the Euler discretization: 0.01 time unit – In practice the distribution for the initial volatility x0 is not available, here we take a uniform distribution on [e, 1] (e > 0 must be small); its density satisfies the imposed condition (11.8). The sample path of the square root volatility – in percent – is displayed in Fig. 11.3. This sample path exhibits relatively high volatilities that are clustered together round the 697th trading day; this corresponds to the Asian financial crisis of October 1997.
11
Stochastic Analysis and Particle Filtering of the Volatility
11.6
149
Conclusion
Probabilistic management of uncertainty in dynamical systems can be illustrated with a financial engineering application: volatility estimation (Baili 2010). We treat volatility as a stochastic process and construct a filter that is recursive and pathwise in observations. These two aspects are designated with the term online, or real-time, filtering. The filter output is one particular sample path of the volatility process. The main feature that makes online particle filtering possible is analytic resolution of a Fokker-Planck equation. Our method does not require data transformation, such as removing seasonality. The conformity between the implementation result – within a low simulation cost – and some practical issues prove the performance of the method to my satisfaction.
References Baili H (2010) Online particle filtering of stochastic volatility. In: Lecture notes in engineering and computer science: proceedings of the world congress on engineering and computer science 2010, WCECS 2010, San Francisco, pp 936–941 Baili H, Snoussi H (2009) Stochastic filetring with networked sensing. In: IEEE 70th vehicular technology conference, Anchorage, Alaska, CD-ROM proceedings Frank TD (2005) Nonlinear FokkerPlanck equations: fundamentals and applications. SpringerVerlag, Berlin/New York Gardiner CW (1985) Handbook of stochastic methods for physics, chemistry and the natural sciences, 2nd edn. Springer-Verlag, Berlin/New York Mahnke R, Kaupuzˇs J, Lubashevsky I (2009) Physics of stochastic processes: how randomness acts in time. Wiley-VCH Verlag, Weinheim Risken H (1989) The Fokker-Planck equation: methods of solution and applications, 2nd edn. Springer-Verlag, Berlin Skorokhod AV (1989) Asymptotic methods in the theory of stochastic differential equations. Translations of mathematical monographs, vol 78. American Mathematical Society, Providence Valsakumar MC (1983) Solution of Fokker-Planck equation using Trotter’s formula. J Stat Phys 32(3):545–553
Chapter 12
Numerical Simulation of Finite Slider Bearings Using Control Volume Method Mobolaji Humphrey Oladeinde and John Ajokpaoghene Akpobi
12.1
Introduction
The modified Reynolds equations which accounts for the use of non-Newtonian couple stress fluids and bearings with slip surfaces have no known analytical solution. Consequently, researchers have resorted to the use of approximate numerical methods to obtain solutions to them. A number of researchers have obtained the solution of different finite slider bearing configurations using different numerical schemes. In recent times, most numerical work in hydrodynamic lubrication has involved the use of the Reynolds equation and the finite difference method (Mitidierri 2005). The use of the finite element methods has also been noted in solving problems in hydrodynamic lubrication. A finite difference multigrid approach was used to investigate the squeeze film behavior of porous elastic bearing with couple stress fluid as lubricant by (Bujurke and Kudenati 2007). In (Serangi et al. 2005), the modified Reynolds equation extended to include couple stress effects in lubricants blended with polar additives was solved using the finite difference method with a successive over relaxation scheme. The conjugate Method of iteration was used to build up the pressure generated in a finite journal bearing lubricated with a couple stress fluid in (Linj 1998). Reference (Elsharkawy 2004) provided a numerical solution for a mathematical model for hydrodynamic lubrication of misaligned journal bearings with couple stress fluids as lubricants using the finite difference method. Reference (Lin 2003) calculated the steady and perturbed pressures of a two dimensional plane inclined slider bearing incorporating a couple stress fluids using the conjugate gradient method. In (Nada and Osman 2007), the problem of finite hydrodynamic journal bearing lubricated by magnetic fluids with couple stresses was investigated
M.H. Oladeinde (*) Production Engineering Department University of Benin, Benin City, Nigeria e-mail:
[email protected] S.‐I. Ao et al. (eds.), Intelligent Automation and Systems Engineering, Lecture Notes in Electrical Engineering 103, DOI 10.1007/978-1-4614-0373-9_12, # Springer Science+Business Media, LLC 2011
151
152
M.H. Oladeinde and J.A. Akpobi
using the finite difference method. The finite element method has been used prominently for some years to solve continuum and field problems Reference (Zienkiewicz 1970) presented the finite element solution for incompressible lubrication problems of complex geometries without the loss of accuracy as the finite difference method. In (Kim 1988), a velocity-pressure integrated, mixed interpolation, Galerkin finite element method for the Navier-Stokes equations was reported. A finite element method was used to analyze the electromechanical field of a hydrodynamic-bearing (HDB) spindle motor of computer hard disk drive at elevated temperature in (Jang et al. 2004). The finite element method was used to solve the modified Reynolds equation governing the pressure distribution in a parabolic slider bearing with couple stress fluids in (Oladeinde and Akpobi 2009a). Reference (Oladeinde and Akpobi 2009b) reported the steady state characteristics of an infinitely wide inclined slider bearing obtained using the finite element method. In (Oladeinde and Akpobi 2010a, finite element method was used to obtain the load capacity of finite slider bearings with slip surfaces. Reference (Shakier et al. 1980) solved the generalized Reynolds equation with slip at the bearing surfaces using multilayer lubrication theory. In (Oladeinde and Akpobi 2010b), control volume method was applied to obtain the pressure distribution in finite inclined slider bearing with slip surfaces. The open literature is replete with slider bearing design using finite difference and finite element methods as the numerical tool for analysis as can be deduced from the literature cited above. Previous researchers seem not to have exploited the applicability of control volume methods in slider bearing design. This gap was addressed in (Oladeinde and Akpobi 2010b) and is developed further in the present paper. In particular, this work centers on the use of control volume method for solving the modified Reynolds equation governing the pressure distribution in a finite slider bearing with slip surfaces or couple stress lubricant.
12.2
Governing Equations
The equation governing the dimensionless pressure distribution in a finite slider bearing with slip surfaces is given by (12.1) @ 3A 1 @ 3A 3 @p 3 @p þ 2 H 1þ H 1þ @X @X HþA @Y HþA l @Y @ H H 1þ ¼U @X HþA
ð12:1Þ
When the slip parameter A ¼ 0, the governing equation reduces to the classical Reynolds equation. The dimensionless pressure distribution in the oil film of a finite slider bearing lubricated by a couple stress fluid is given by (12.2) (Oladeinde and Akpobi 2010a).
12
Numerical Simulation of Finite Slider Bearings Using Control Volume Method
@ @p 1 @ @p dH þ 2 ¼ 6U S S @X @X @Y @Y dX l
153
(12.2)
Where S is given by (12.3) and the film thickness of the inclined slider ðHÞ is given by (12.4). In (12.3), Lis the dimensionless couple stress parameter. When L ¼ 0, (12.2) reduces to the classical Reynolds equation for a Newtonian fluid. In (12.1) and (12.2), l ¼ Lx =Ly where Lx and Ly are the lengths of the bearing in the x and y directions respectively. H S ¼ H3 12L2 H 2L tanh 2L
(12.3)
H ¼ 1 þ K KX
(12.4)
The boundary conditions are given by the specification of the pressure at the perimeter of the bearing which is equal to atmospheric pressure. In (12.4), K is the film thickness ratio, defined as the ratio of the maximum to minimum film thickness.
12.3
Control Volume Descritization
Figure 12.1 depicts a typical control volume around a central node P. The control volume consists of a rectangular volume whose sides’ passes through the point’s n, s, e and w. Control volume discretization is used to discretize the governing equations shown in (12.1) and (12.2) respectively. The control volume is used to solve for the pressure at the point P in terms of the pressures at the points N, E, S and W.
Δ N n
W
w
p s S
Fig. 12.1 A control volume around a node P
e
E Δ
154
M.H. Oladeinde and J.A. Akpobi
Integrating (12.1) and (12.2) over the control volume, (12.5) is obtained. ðn ðw s e
dP dP dXdY K dX dX
1 þ 2 l
ðn ðw s e
ðn ðw dP dP d dXdY ¼ U K ðCÞ dXdY dX dX dX
ð12:5Þ
s e
In (12.5), K and C are given by (12.6a) and (12.6b) respectively for finite slider with velocity slip. In the case of slider with couple stress lubricants, K and C are given by S shown in (12.3) and H is shown in (12.4) 3A K ¼H 1þ HþA 3
C¼H 1þ
(12.6a)
A HþA
(12.6b)
Integrating (12.5), we obtain (12.7). K
dP dP 1 dP dP Dy K Dy þ 2 K Dx L2 K Dx dx e dx w dy n dy s l
¼ UðCÞe Dy UðCÞw Dy
ð12:7Þ
It can be observed that (12.7) contain derivatives which can be approximated using the finite difference method. The derivatives can be approximated using the expressions: dP PE PP dP Pp Pw dP PN PP dP PP PS ¼ ¼ ¼ ¼ ; ; ; Dx Dx Dy Dy dx e dx w dy n dy s Substituting the expressions for the pressure gradients into (12.7), the final discretized form of (12.1) and (12.2) is given by (12.8) ke
ð P E PP Þ ðPP PW Þ ðPN PP Þ ðPP PS Þ Dy kw Dy þ kn Dx ks Dx Dx Dx Dy Dy ð12:8Þ ¼ UCe Dy UCw Dy In compact form, (12.8) can be written as shown in (12.9) ap PP ¼ aE PE þ aw PW þ aN PN þ aS PS þ SC
(12.9)
12
Numerical Simulation of Finite Slider Bearings Using Control Volume Method
155
where aE ¼
Ke Dy Dx
aW ¼
Kw Dy Dx
aN ¼
1 Kn Dx l2 Dy
ke ¼
2KE KP KE þ KP
kw ¼
2KW KP K W þ KP
ks ¼
2KS KP K S þ KP
kn ¼
2KN KP K N þ KP
ap ¼ aN þ aS þ aE þ aW
aS ¼ L2
Ks Dx Dy
SC ¼ U ðCw Ce ÞDy
The load capacity of the bearing is computed after the pressure solution has been obtained using the expression shown in (12.10). ð1 ð1 W¼
Pðx; yÞdxdy
(12.10)
0 0
12.4
Numerical Results and Discussion
The methodology developed in the previous section has been used to obtain the pressure distribution at the middle of benchmark finite slider bearing with Newtonian lubricant whose solution have been obtained using finite element method in the literature. The methodology described was implemented using a visual Basic.Net program developed by the authors. The dimensionless bearing length and width are equal to unity. Table 12.1 shows the pressure obtained at the middle of the bearing in comparison with those obtained by (Oladeinde and Akpobi 2010), who solved the same problem using the finite element method. The dimensionless velocity slip (A) used for the simulation equal 100, the dimensionless speed (U) equal unity and the film thickness ratio (K) equal 1.5 Table 12.1 shows that the results obtained by both methods are in good agreement especially at higher mesh densities. The two methods depict a converging Table 12.1 Pressure at the middle of the bearing with slip surfaces using control volume and finite element methods
Mesh 44 88 84 48 168 1616 3232
Finite element 0.063918 0.063416 0.064492 0.062529 0.063416 0.063111 0.063094
Control volume 0.060718 0.062478 0.061376 0.061764 0.062643 0.062936 0.063051
156 Table 12.2 Pressure at the middle of bearing for lubricated with Newtonian fluid using control volume and finite element methods
M.H. Oladeinde and J.A. Akpobi
Mesh 44 88 84 48 168 1616 3232
Finite element 0.128214 0.126775 0.129369 0.125426 0.127211 0.126598 0.126564
Control volume 0.121833 0.125337 0.123125 0.123933 0.125660 0.126248 0.126478
trend, however from different directions towards the exact value. The exact solution at the middle of the bearing obtained using Richchardson extrapolation equal 0.063091. At a discretization level of 961 control volumes corresponding to mesh density of 32 32, it can be seen that the error in the control volume solution is 0.00004 compared to 0.000003 for the finite element results. However using a total of 3,969 control volumes, the control volume solution converges to the exact solution. It can be deduced from Table 12.1 that whereas the solution obtained at the middle of the bearing decreases towards the exact with increase in mesh density, the control volume method shows the opposite trend. As the mesh size decreases, there is progressive increase in the pressure solution towards the exact. Table 12.2 shows the solution obtained using the control volume method and the finite element method for a finite slider bearing with Newtonian lubricant. The table shows that the present method produces reliable results when compared with the finite element results. The converging trend is similar to that observed for a finite slider bearing with slip surfaces. Numerical experiments show that when the control volume is simulated using a total of 3,969 control volumes, the control volume results converges to the exact solution equal to 0.126554. The reliability of the control volume method has been demonstrated since the solution obtained using the method is in good agreement with those obtained using the finite element methods. The control volume method was subsequently used to show the effect of velocity slip and couple stress parameter on the load capacity of finite slider bearing. Figure 12.2 shows the variation of load capacity with dimensionless slip parameter for a finite slider with Newtonian lubricant using the present technique. Figure 12.2 shows the variation of load capacity with slip parameter using the present method for a finite slider bearing with Newtonian lubricant. The load capacity variation obtained using the present method is similar to those obtained by (Oladeinde and Akpobi 2010). The load capacity first decreases with increase in dimensionless slip parameter and attains a minimum at a slip parameter of 5. Further increase in the slip parameter brings about an increase in the load capacity. When a slip parameter of 40 is reached, further increase in the slip parameter does not bring about any significant improvement in the load capacity. Numerical experiments using the present method also indicate that regardless of the film thickness ratio, marginal increase in the load capacity is achieved at slip parameter greater than 40.
12
Numerical Simulation of Finite Slider Bearings Using Control Volume Method
157
Fig. 12.2 Variation of load capacity with Slip parameter for slider with Newtonian lubricant
Fig. 12.3 The variation of load with slip parameter for different couple stress parameters
Figure 12.3 shows the variation of load capacity with slip for two different couple stress parameters. When the dimensionless couple stress parameter equals zero, the problem reduces to a bearing with Newtonian lubricant. The graph shows that increase in couple stress parameter brings about an increase in the load carrying capacity of the finite slider bearing for a given slip
158
M.H. Oladeinde and J.A. Akpobi
parameter. However, as in the case of a Newtonian lubricant in Fig. 12.2, once a slip parameter of 40 is reached, further increase in the slip parameter brings about no significant increase in the load carrying capacity irrespective of the couple stress parameter.
12.5
Conclusion
A control volume method has been used to solve the hydrodynamic lubrication problem of slider bearings. The solution obtained using the present method has been shown to be stable and converges with increase in the number of control volumes in the domain. Having established the reliability of the numerical scheme, parametric study was carried out to how the effects of couple stress and slip parameter on bearing load. Computations reveal that the effect of couple stresses is to increase the load capacity. Also, a dimensionless slip parameter equal to 40 has been established which produces marginal increase in load capacity with increase in couple stress parameter.
References Bujurke NM, Kudenati BR (2007) Multigrid solution of modified Reynold’s equation incorporating poroelasticity and couple Stress. J Porous Media 10(2):125–136 Elsharkawy AA (2004) Effects of misalignment on the performance of finite journal bearings lubricated with couple stress fluids. Int J Comput Appl Tech 21(3):137–146 Jang JY, Park SJ, Lee SH (July 2004) Finite-element analysis of electromechanical field of a HDB spindle motor at elevated temperature. IEEE Trans 40(4):2083–2085 Kim SW (1988) A velocity-pressure integrated, mixed interpolation, Galerkin finite element method for high Reynolds number laminar flows, Final Report Universities Space Research Association, Huntsville, AL. Systems Dynamics Lab Lin JR (2003) Derivation of dynamic couple-stress Reynold’s equation of sliding-squeezing surfaces and numerical solution of plane inclined slider bearings. Tribol Int 36(9):679–685 Linj R (1998) Squeeze film characteristics of finite journal bearings; couple stress fluid model. Tribol Int 31(4):201–207 Mitidierri BP (2005) Advanced modeling of elastohydrodynamic lubrication, Doctoral Thesis, Tribology Section and thermo fluids section, Department of Mechanical Engineering, Imperial College, London, pp 19 Nada GS, Osman TA (2007) Static performance of finite hydrodynamic Journal bearings lubricated by magnetic fluids with couple stresses. Tribol Lett 27(3):261–268 Oladeinde MH, Akpobi JA (2009a) Parametric characterization of load capacity of an infinitely wide parabolic slider bearing with couple stress fluids. Proc World Cong Sci Eng Tech 57:565–569 Oladeinde MH, Akpobi JA (2009b) Finite element simulation of performance characteristics of infinitely wide plane pad slider bearing. Adv Mater Res 62–64:637–642 Oladeinde MH, Akpobi JA (2010a) A study of load capacity of finite slider bearings with slip surfaces and stokesian couple stress fluids. Int J Eng Res Afr 1:57–66
Oladeinde MH, Akpobi JA (2010) Control volume method applied to simulation of hydrodynamic lubrication problem. Lecture Notes in Engineering and Computer Science: Proceedings of the World Congress on Engineering and Computer Science 2010, WCECS 2010, 20–22 Oct 2010, San Francisco, pp 1115–1118
Serangi M, Majumda BC, Sekhar AS (2005) Elastohydrodynamically lubricated ball bearings with couple stress fluids, Part 1: Steady state analysis. Tribol Trans 48(3):404–414
Shakier JB, Kumar S, Chandra P (1980) Generalized Reynolds equation with slip at bearing surfaces: multi-layer lubrication theory. Wear 60(2):253–268
Zienkiewicz OC (1970) The finite element method, 2nd edn. McGraw-Hill, New York
Chapter 13
Automated Production Planning and Scheduling System for Composite Component Manufacture Mei Zhongyi, Muhammad Younus, and Liu Yongjin
13.1
Introduction
In order to gain customer confidence, every manufacturing industry has to improve product quality, reduce production cost, and minimize lead time so that parts are delivered on time (Muhammad Younus et al. 2009a; Muhammad Younus et al. 2009b). The traditional manufacturing approach may not fulfill the needs of the contemporary digitalized manufacturing setup (Muhammad Younus et al. 2010a). Therefore, sophisticated technologies and software solutions are implemented in the manufacturing industries to handle the frequent changes in the product and its demand. The Manufacturing Execution System (MES) is one of the software solutions used to bridge the production planning and equipment control levels (Muhammad Younus et al. 2010b). Most aerospace enterprises currently use Enterprise Resource Planning (ERP) systems. Matching material, predicting the production capacity of the enterprise, and arranging main production plans can be done simultaneously after the production plan is issued. However, most enterprises do not have this kind of control and information management in their workshops. After the production plans are issued to the workshops, information is still transferred manually using traditional paper bills, so it is difficult to keep the workshop production consistent with the enterprise production plans. Due to the rapid development of computer networking and internet technologies, the pace of computerization in manufacturing and process industries has also accelerated. The automation level of these industries is constantly improving, which has led to a closed-loop flow of information at the plant level. The role of MES in linking the two levels of the industry has therefore become more vital, and MES has gradually progressed to the present intelligent, integrated, modular systems (Simao et al. 2008; Qiu and Zhou 2004).
M. Zhongyi (*) Beijing University of Aeronautics and Astronautics, 37 Xueyuan Road, Beijing 100191, China, e-mail: [email protected]
In recent years, many enterprises have adopted the Manufacturing Execution System (MES) to improve the level of information management in workshops. Cheng et al. (1998) propose a systematic approach to develop an open, modularized, distributive, configurable, and integrated MES framework using object-oriented techniques. Chung Sheng-Leun and Jeng MuDER (2002) present an integrated MES for semiconductor manufacturing on an open architecture. Walkden (2006) presents an MES for a papermaking enterprise, composed of reliable delivery time, planning, and production invoicing. Qi et al. (2007) analyze the characteristics of current shop-floor production planning and scheduling and present the idea of research on production planning and scheduling based on an integrated APS and MES system. Huang and Kan (2008) implement the optimization, control and management of the production process of an auto electronic parts enterprise. Cao et al. (2009) introduce scheduling optimization based on CAPP/PPC integration; three core modules, namely operation planning, operation scheduling, and material tracing, are presented. Zhilun et al. (2006) introduce a framework for developing a manufacturing execution system dedicated to the iron and steel industry, in which the Unified Modeling Language (UML) is used to specify the system model and components. Mei Zhongyi et al. (2010) present an MES for the composite component manufacturing workshop of an aerospace enterprise, in which the algorithm of production planning and scheduling is emphatically introduced. The application of the production planning and scheduling system has radically changed the traditional manual production planning and scheduling manner and improved the work efficiency. The composite component manufacturing workshop under study is experiencing problems in inventory control, scheduling, material flow, equipment management, and real-time data collection for decision making. It is extremely complicated to carry out any production statistics analysis on the available data for decision making, because the manual compilation of reports involves several calculations. Real-time data access and information sharing for further planning and rescheduling are unavailable, resulting in low productivity. A suitable MES has therefore been developed to improve the traditional working efficiency and the real-time process monitoring of the products. This paper mainly presents the production planning and scheduling management of the developed MES, with a special focus on the automatic scheduling algorithm.
13.2
The Function of Production Planning and Scheduling Management
MES is an important part of the enterprise information management system. The main functions of the developed MES for the composite component manufacturing workshop are production planning and scheduling management, and production process management. By real-time tracking the production process of the workshop, MES can harmonize all production activities. MES improves the workshop production efficiency and management.
Fig. 13.1 Work content of production planning and scheduling in the composite component manufacturing workshop
The module of planning and scheduling management is the core module of the developed MES. It includes two aspects, namely forming the production planning and production scheduling. Production planning mainly arranges the product variety, product quality, production output, and production value which should be attained in the planning period; it serves as the guidance for the workshop production. Production scheduling is the continuation of production planning and is its detailed execution process. Based on the product variety, product quantity, and product delivery period planned by the production planning, production scheduling draws up the detailed production tasks for every production unit in a given time period and puts the production planning into effect. Therefore, production planning and scheduling is the starting point of the workshop production and the main line of the workshop process flow. Figure 13.1 shows the work content of
production planning and scheduling in the composite component manufacturing workshop. The work flow of the production planning and scheduling is:
1. Importing the product process route from CAPP.
2. Determining the components that will be planned.
3. Working-out the planning task.
4. Arranging the planning task.
5. Querying the planning task.
6. Modifying the planning task.
The first three items constitute the workshop production planning; the latter three are called the workshop production scheduling. The workshop production planning is the subdivision of the enterprise production planning. According to the priority of production, the workshop production planning arranges an appropriate production sequence for the orders obtained from the enterprise production planning. The effect of workshop production scheduling is to assign production tasks reasonably to suitable production teams or workers at the right time. The workshop production scheduling also prepares all the needed tools, materials, fixtures, and equipment in time. This ensures that the production tasks can be completed with high quality and in the right quantity.
13.3
Integrating with Other Systems
The production planning and scheduling system is a subsystem of the MES. When applied in the composite component manufacturing workshop, it must exchange data with other systems. The production planning and scheduling system imports the main production plan of the enterprise from ERP; after completing the production schedule data collection and data statistics, it returns these data and information to ERP. The production planning and scheduling system also needs to import the production process route and BOM from CAPP and to return production plan information to CAPP. In addition, it needs to obtain material information and material inventory information from the MIS that belongs to the material management department. Because the databases of all the systems are ORACLE, the integration between these systems is achieved by accessing the open view of each database. The integration relationship between the four systems is shown in Fig. 13.2.
13.4
Status of the Automatic Scheduling in Production Planning and Scheduling
Production scheduling must consider a lot of factors, such as urgent order, production ability, the peak load of equipment, and delivery period. In actual production, these factors often restrict and conflict each other. It’s impossible to satisfy all
Fig. 13.2 Integration relationship between the four systems: the production planning and scheduling system accesses the MES database through ADO.NET, and the MES database is linked by ORACLE database views to the ERP and CAPP databases of the enterprise and to the MIS database of the material management department
factors to the best. Production scheduling therefore considers all the factors synthetically and makes the outcome of the scheduling as reasonable as possible for the appointed workshop. The scheduling determines the utilization efficiency of the equipment and the working efficiency of the workers; it also determines the product cost and the overall production efficiency of the workshop. With the enlarging production scope and the increasing production tasks, workshop production scheduling becomes more challenging, and traditional manual scheduling can hardly satisfy its demands. The automatic scheduling system is developed to replace traditional manual scheduling, to realize the optimization of the scheduling, and finally to improve the workshop production efficiency. The status of the automatic scheduling in the production planning and scheduling is shown in Fig. 13.3.
13.5
Characteristic of Composite Component Production Scheduling
The aerospace composite component manufacturing workshop differs from other component manufacturing enterprises in that there are more working procedures. The working procedures of composite component manufacturing include three main portions, namely cloth cutting and lay up, curing and heat press for shaping, and painting and assembly. The production characteristics of these three portions are presented in Table 13.1. Comparison and analysis show that curing and heat press are the key procedures in the process of composite component manufacturing: they need more process time, and every composite component must pass through the curing and heat press procedure. The process flow of composite component manufacturing is shown in Fig. 13.4.
Fig. 13.3 Status of the automatic scheduling in the production planning and scheduling

Table 13.1 Characteristic of composite component manufacturing
Classify | Cloth cutting and lay up | Curing and heat press for shaping | Painting and assembly
The key working procedures | Cloth cutting and lay up (necessary procedures) | Curing and heat press (necessary procedures) | Cementing and inspection (optional procedures)
Process time | Process time is short except vacuumizing the component | Curing and heat press needs nearly 4 h | Process time of all procedures is short
Needed equipments | Numerical control equipments of cloth cutting and lay up | Curing stove and heat press pot | Most equipments are numerical control equipments
There are some special characteristics in composite component manufacturing, as follows.
1. The manufacturing sequence of a composite component follows the production sequence, working from the small procedure number to the big procedure number.
2. The production processes of the composite components are generally the same, but the same production procedure of different components has a different process time.
Fig. 13.4 Process flow of composite component manufacturing: Cloth cutting (10) → Honeycomb processing (20) → Lay up (30) → Curing (40) → Trimming and drilling (50) → CMM inspection (60) → NDI inspection (70) → Countersink (80) → Dimension inspection (90) → Painting (100) → Delivery to assembly (110)
3. Some components in the same batch of composite components can have production priority.
On these accounts, the production scheduling of the composite component manufacturing workshop can be approximately regarded as a flow shop scheduling problem (see the sketch below). In a flow shop, one production team can only work on one procedure of one component at a time, and before a new procedure is started, the previous procedure must be finished. In order to satisfy the demands of workshop production planning and scheduling, many optimization algorithms such as Simulated Annealing, Genetic Algorithms, Tabu Search, and Neural Networks have been developed. This paper adopts an improved genetic algorithm to complete the automatic scheduling of composite component manufacturing; some improved methods are applied to the traditional genetic algorithm.
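The flow-shop view above suggests a natural fitness measure for the scheduling problem: the makespan of a given component sequence. The minimal sketch below computes it with the standard completion-time recursion; the component names and process times are hypothetical, not the workshop data of Table 13.3.

```python
def flow_shop_makespan(sequence, proc_time):
    """Completion-time recursion for a permutation flow shop: every component
    visits the working teams in the same order, and a team can process only
    one component at a time."""
    n_teams = len(proc_time[sequence[0]])
    finish = [0.0] * n_teams          # finish[j] = completion time of the previous component on team j
    for comp in sequence:
        prev = 0.0                    # completion time of this component on the previous team
        for j in range(n_teams):
            start = max(prev, finish[j])      # wait for team j and for the previous procedure
            finish[j] = start + proc_time[comp][j]
            prev = finish[j]
    return finish[-1]                 # makespan = completion time on the last team

# Hypothetical process times (hours) of three components on four working teams.
proc_time = {
    "A": [2.0, 4.5, 3.5, 1.5],
    "B": [2.5, 5.0, 6.5, 1.2],
    "C": [3.0, 4.5, 4.0, 1.5],
}
for seq in (["A", "B", "C"], ["B", "C", "A"]):
    print(seq, "makespan =", flow_shop_makespan(seq, proc_time))
```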
13.6
The Work Flow of Improved Genetic Algorithm
Setting up the traditional genetic algorithm for workshop production scheduling requires two steps. The first step is to input the basic data about all machines and working teams, the process procedures of the components, the process time of every procedure, and the machine required for every procedure. Second, the basic parameters of the genetic algorithm are assigned. These parameters include the number of genetic generations N, the crossover probability Pc, the mutation probability Pm, and so on. Based on the above basic information and on biological principles, the genetic algorithm randomly creates chromosome genes and subsequently generates the initial population. By using the selection operator, each individual carrying machine information is judged on whether it adapts to the environment. If it does not adapt to the environment, the individual is discarded; otherwise, it enters the crossover phase. After the population undergoes crossover, it becomes the embryo of the next-generation population. Afterwards, this population is processed by the mutation operator
Fig. 13.5 Work flow of improved genetic algorithm
and the backward operator, and it becomes the real next-generation population. After the population undergoes N generations of heredity, the chromosome genes become stable, and the best result can be obtained. The work flow of the improved genetic algorithm is shown in Fig. 13.5.
When the genetic algorithm is applied to the production scheduling of the composite workshop, some improvements over the traditional genetic algorithm are adopted to adjust the convergence rate. For example, a suitable fitness scope is specified: only a chromosome whose fitness falls within the scope can enter the selection operator, while a chromosome whose fitness does not satisfy the scope is discarded immediately. This is similar to the heredity of a biological population, in which an individual that does not adapt to the environment cannot mate and enter the next generation. It ensures that only relatively excellent individuals achieve heredity and multiply into the next generation. By using these methods, the convergence rate of the algorithm is improved, and the influence of bad individuals on the whole population is eliminated.
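The fitness-scope filter described above can be sketched as a small modification of a conventional roulette-wheel selection step. The fitness function, scope bounds, population encoding and helper names below are placeholders for illustration, not the authors' implementation.

```python
import random

def within_scope(fitness, low, high):
    """Fitness-scope filter: only chromosomes whose fitness lies in the
    specified scope are allowed to enter the selection operator."""
    return low <= fitness <= high

def select_next_parents(population, fitness_fn, scope=(0.2, 1.0), k=None):
    """Discard out-of-scope individuals first, then apply roulette-wheel
    selection to the survivors (a sketch of the improved selection step)."""
    scored = [(ind, fitness_fn(ind)) for ind in population]
    survivors = [(ind, f) for ind, f in scored if within_scope(f, *scope)]
    if not survivors:                      # degenerate case: keep the best individual
        survivors = [max(scored, key=lambda t: t[1])]
    k = k or len(population)
    total = sum(f for _, f in survivors)
    parents = []
    for _ in range(k):
        r, acc = random.uniform(0.0, total), 0.0
        for ind, f in survivors:
            acc += f
            if acc >= r:
                parents.append(ind)
                break
    return parents

# Toy usage: individuals are permutations of components; the fitness function
# here is a placeholder (in practice it would reward short makespans).
population = [random.sample(["A", "B", "C"], 3) for _ in range(6)]
fitness_fn = lambda ind: 1.0 if ind[0] == "A" else 0.5
print(select_next_parents(population, fitness_fn))
```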
13.7
Application of Improved Genetic Algorithm
By using the improved genetic algorithm as its core, the production planning and scheduling system of the composite workshop has been developed. The system design adopts a three-layer structure based on Web technology and the browser/server (B/S) architecture, and the development languages are ASP.NET and C#. Some practical composite components of the workshop were selected to verify the validity of the improved genetic algorithm and the developed system. The parameter values of the improved genetic algorithm are specified as follows: crossover probability Pc = 0.9, mutation probability Pm = 0.1, backward probability Pb = 0.05, a population of 300 individuals, and 150 genetic generations. There are two orders and three composite components. The two orders are named order1 and order2, and the part numbers of the three composite components are 193Z1001-3, 193Z1001-4, and 193Z1002-3. Order1 includes three components and order2 includes two components, as presented in Table 13.2. All three composite components have the same working procedures, but the process time of the same working procedure differs from component to component. The working procedures and process times of the three composite components are presented in Table 13.3.

Table 13.2 Components in the order
Order1 | 193Z1001-3, 193Z1001-4, 193Z1002-3
Order2 | 193Z1001-3, 193Z1001-4

Table 13.3 Working procedures and process time (hour) of the composite components
Working procedure number | Working procedure name | Working team number | Process time of 193Z1001-3 | Process time of 193Z1001-4 | Process time of 193Z1002-3
10 | Cloth cutting | 1 | 2 | 2.5 | 3
20 | Honeycomb | 2 | 2.5 | 2.5 | 2.5
30 | Lay up | 3 | 4.5 | 5 | 4.5
40 | Curing | 4 | 3.5 | 6.5 | 4
50 | Trimming and drilling | 5 | 0.5 | 1.5 | 1.5
60 | CMM inspection | 6 | 1.5 | 1.2 | 1.5
70 | NDI inspection | 7 | 1.5 | 2 | 1
80 | Countersink | 8 | 1 | 2 | 2.5
90 | Dimension inspection | 9 | 0.5 | 1 | 1.5
100 | Painting | 10 | 1 | 1.2 | 1.5
110 | Delivery | 11 | 0.5 | 0.5 | 0.5

The work order information editor provides the detailed information about a work order. Using this editor, the work order number, work order name, completion date, and responsible person can be inputted, and an order can be queried based on the inputted information. In this editor, the composite components in the order can also be selected and issued to the work teams. Figure 13.6 shows the work order information editor.

Fig. 13.6 Work order information editor

Based on the improved genetic algorithm, the optimized process sequence of the composite components can be calculated, and the starting time and end time of every procedure for the three components are obtained at the same time.
Fig. 13.7 Scheduling results information showing interface
Fig. 13.8 Result of the production scheduling
The optimized process sequence, process time, starting time, and end time of the working procedures are shown in Fig. 13.7. The result of the production scheduling is shown by using Gantt chart, as shown in Fig. 13.8. It clearly shows there is no waiting time in working team 1, working team 3, and working team 4. There is only a little waiting time in working team 2.
Because the process time of the later working teams is less than the process time of the former working teams, there is some waiting time in the later seven working teams.
13.8
Conclusion
The improved genetic algorithm is implemented and applied in the production planning and scheduling of the composite manufacturing. By improving traditional genetic algorithm, the convergence rate of the algorithm is increased. The developed system has been applied in the composite component manufacturing workshop of an aerospace enterprise. The application of the production planning and scheduling system has radically changed the traditional manual production planning and scheduling manner and improved the work efficiency. By using this system, the real time monitoring and feeding back of the production activities are ensured. The production planning and scheduling system is an important part of the information system in an enterprise. It provides immense amount of information support to the entire enterprise.
References
Cao Y, Liu N, Yang L, Yang YL (2009) MES scheduling optimization and simulation based on CAPP/PPC integration. Proceedings of the sixth international symposium on neural networks, pp 613–622
Cheng Sheng-Leun, Jeng MuDER (2002) Manufacturing execution system (MES) for semiconductor manufacturing. Proceedings of the IEEE International Conference on Systems, Man, Cybernetics, Oct 6–9, 2002, Yasmine Hammamet, pp 7–11
Cheng FT, Shen E, Deng JY, Nguyen K (1998) Development of distributed object-oriented system framework for the computer-integrated manufacturing execution system. Proceedings of the 1998 IEEE International Conference on Robotics and Automation, Leuven, Belgium, May 1998, pp 2116–2121
Huang ZH, Kan SL (2008) Based on MES for implement optimization of production scheduling of auto electronic parts manufacture. Proceedings of IEEE Congress on Evolutionary Computation 1–8:3992–3997
Mei Zhongyi, Muhammad Younus, Liu Yongjin (2010) Production planning and scheduling for the composite component manufacturing workshop. Lecture Notes in Engineering and Computer Science: Proceedings of the World Congress on Engineering and Computer Science, WCECS 2010, 20–22 Oct 2010, San Francisco, pp 1053–1058
Muhammad Younus, Lu Hu, Fan Yuqing, Cong Peiyong (2009a) Manufacturing execution system for a subsidiary of aerospace manufacturing industry. International conference on computer and automation engineering, March 2009, Bangkok, Thailand, pp 208–212
Muhammad Younus, Lu Hu, Yu Yong, Fan Yuqing (2009b) Realization of manufacturing execution system for batched process manufacturing industry. International conference of engineers and computer scientists, Mar 2009, Hong Kong, China, pp 1337–1341
Muhammad Younus, Cong Peiyong, Lu Hu, Fan Yuqing (2010a) MES development and significant applications in manufacturing – a review. 2nd international conference on education technology and computer, Shanghai, China, pp (V5)97–101
Younus Muhammad, Saleem Waqas, Yong Yu, Yuqing Fan (2010b) Integrated advance manufacturing planning and execution system. IJIMS 2(3/4):330–339
Qi ES, Chen JY, Wang YS (2007) The production planning and scheduling based on APS and MES. Proceedings of the 14th international conference on industrial engineering and engineering management, vol A and B, pp 53–57
Qiu RG, Zhou M (2004) Mighty MESs. IEEE Robotics and Automation Magazine, pp 19–40
Simao JM, Stadzisz PC, Morel G (2008) Manufacturing execution systems for customized production. J Mater Process Tech 179:268–275
Walkden M (2006) Manufacturing execution system enable mill operation efficiency at UPM Changshu. Paper Asia, January/February 2006, vol 22, no 1, pp 21–23
Zhilun Cheng, Yuqing Fan, Riaz Ahmad (2006) Implementation of order oriented MES in iron and steel industry. Proceedings of the IEEE International Conference on Industrial Informatics, 2006, pp 504–508
Chapter 14
Assessment Scenarios of Virtual Prototypes of Mining Machines Teodor Winkler and Jarosław Tokarczyk
14.1
Introduction
Mining machines have large dimensions and weights. Because of that, their relocation, testing and re-manufacturing are difficult. Mining machines often operate within systems. Machines that belong to the same mining system are manufactured by different manufacturers located at distant places, who have no chance to test the cooperation of the machines in conditions similar to the real ones; complete mining systems are assembled and started up only underground. Due to the more and more differentiated requirements of users for mining machines, the machines are manufactured in short series or as single copies. That is why the costs of material prototyping are not distributed over a larger number of manufactured machines (Tokarczyk 2006). Taking into account the decreasing number of longwalls and the still increasing concentration of mining, every new copy of a mining machine is treated as ready to be used without testing the material object. This puts high demands on the designs of new mining machines. So far, virtual prototyping of mining machines has mainly been based on selected, single detailed methods, which most often referred to technical assessment criteria; this especially concerns the FEM (Zienkiewicz et al. 2005). Extending prototyping to the conditions present during the operation of mining machines requires consideration of the human factor and the implementation of multi-criteria assessment of virtual prototypes by means of prototyping scenarios. A given scenario is realized with the use of detailed numerical methods. These methods are implemented in many computer programs used in practice. Integration of these methods, in such a way that prototyping scenarios can be created for different states and in different phases of the life cycle of mining machines, is needed.
T. Winkler (*) Institute of Mining Technology KOMAG, ul. Pszczyńska 37, 44-101 Gliwice, Poland, e-mail: [email protected]
14.2
Components of Virtual Prototyping Scenario
14.2.1 Phases of Life Cycle of Mining Machine
Virtual prototyping includes all phases of machine life. The phases of the life cycle of mining machines can be divided into the following stages:
– designing,
– manufacturing and assembly,
– stand tests,
– installation at workplace,
– operation,
– withdrawal from operation.
Creation of virtual prototype most often takes place at the designing phase. At this stage decisions about future product can be taken freely. Use of virtual prototyping methods aids designing process, deciding about future costs of manufacturing (Barnus´ and Knosala 2004) as well as about the quality of the product (Weiss 2004).
14.2.2 Criteria of Assessment of Virtual Prototype
Criteria that determine requirements and limitations in a systematic way are the basis of assessment of virtual prototypes (Dietrych 1985). Virtual prototypes of mining machines are assessed in the light of the following criteria: strength, functional, ergonomic, safety and the criterion of feasibility of the machine's operations. Strength and functional criteria are defined as technical criteria. Ergonomic and safety criteria and the criterion of feasibility of operations belong to anthropotechnical criteria (Winkler 2001). Analyses in the light of the criterion of operational feasibility of the machine are characteristic for mine conditions; they concern assembly/disassembly operations in confined space in underground conditions.
14.2.2.1
Strength Criterion
Durability (resistance) of virtual prototype for identified future loads, to which real object will be exposed, is assessed in strength criterion. Currently used numerical methods such as FEM enable simulation of many physical phenomena. The following values of: stresses, displacements, velocities, accelerations, strains, constrain forces, contact forces, friction forces, fatigue strength are the measure for assessment of virtual prototype of mining machine. The components of machines, its assemblies or entire machines, are assessed in the light of strength criterion at the KOMAG Institute of Mining Technology, Fig. 14.1.
Fig. 14.1 Assessment of machine in the light of strength criterion – map of reduced stresses [MPa]
Determination of quantitative range in computational model depends on specified objectives. Computational models of entire machines included in virtual prototyping enable obtaining the type of interactions between components of the machine. Powered-roof support, in which forces in pin joints depend on the height and method of support, is the example. The forces can be determined with use of other methods such as MBS. Computational models, which include only machine assembly or machine body, are sufficient in some cases, and interactions between other assemblies are realized according to superposition rule. Such method of building of computational tasks results from hardware or time limitations.
14.2.2.2
Functional Criterion
In mining machines, collisions between cooperating machine components that move against each other are analyzed, as well as the required operational ranges resulting from kinematic relationships. Criterial models that include only external geometrical features are sufficient for such analyses; they can also include geometrical models of deformed structures. Functional tests of a virtual prototype were carried out at KOMAG for the longwall system. The longwall shearer moves along the entire longwall panel, changing direction at the face ends, at the drive end and return end of the Armoured Face Conveyor (AFC). The position of the drive end and return end requires
Fig. 14.2 Collision analysis of longwall shearer arm model and return end of flight-bar conveyor
the use of so-called narrow shearer arms, which enable cutting of coal in the area of the face ends. The operation of passing the arm over the return end is critical, Fig. 14.2. The clearance between the arm and the return end is so small that the absolute displacement of the arm end, which results from deformations of the arm body and the longwall shearer body, can be up to 25% of the initial distance between them. Virtual prototyping consists in modeling physical phenomena. Phenomena can be described by a set of features, and the transient values of these features create transient states of the technical means. These are the so-called criterial states (Winkler 1997).
14.2.2.3
Criterial Conditions
Selected exemplary critical sets of loads or supports can be the criterial states in strength criterion. Identification of criterial states takes place both during machine operation and at stand tests, Fig. 14.3. Mining of hard coal is an example of operational state of longwall shearer. Loss of stability is an example of emergency state of powered-roof support. Other states include e.g. periodical dislocations of machines due to disassembly/assembly of longwall systems. Identification of criterial states is easy in experimental tests. Types of tests, values and types of loads as well as methods of support were strictly specified. For operational conditions identification of loads to the mining machines was realized during in situ measurements (Szweda 2004).
Fig. 14.3 Division of criterial states for assessment of virtual prototype
14.2.2.4
Criterial Models
So-called criterial models are built for the selected criterial states. They represent the real system and include all features which are assessed in the light of the accepted detailed criteria (Winkler 1997). The process of creating a criterial model often starts with building a geometrical model. It is also possible to create a criterial model by modification of an already existing geometrical model; such modification consists in editing the components making up the geometrical model. That is why a suitable method of creating the geometrical model and a parameterization tool extend the area of application of the spatial geometrical model. The degree of precision changes depending on the accepted criterial model. Assessment of the virtual prototype in the light of the accepted criteria can lead to changes in the design. Because of that, the criterial model should be simple enough to allow determining the range of eventual design changes necessary to meet these criteria and to maintain conformity of the model with the design documentation, and at the same time it should explicitly show the place where the changes are to be made. Figure 14.4 presents criterial models of the longwall shearer arm designed for:
– FEM strength calculations of the arm body, Fig. 14.4a,
– simulation of assembly operations or visualization of the operation of the main subsystems of the longwall shearer arm, Fig. 14.4b.
The degrees of precision required for assessment of the design in the light of technical and anthropotechnical criteria are visible. Criterial models built for the purpose of visualization or testing the collision of the machine with its surroundings have a reduced number of internal details; at present this process is highly automated. Criterial models designed for assessment in the light of technical criteria belong to the simplified models, i.e. details that are not significant for a given criterion were removed from the geometrical model or omitted. Extended criterial models, which include geometrical models of technical means and models of anthropotechnical features, are used for the anthropotechnical criteria of assessment.
Fig. 14.4 Criterial models of longwall shearer arm
14.2.2.5
Detailed Methods
A condition for the multi-criteria assessment of a virtual prototype is its realization in a dispersed environment of specialist software based on detailed numerical methods. The following detailed methods are, among others, used at the KOMAG Institute of Mining Technology:
– CAD geometrical modeling,
– FEM,
– MBS analyses,
– modeling of anthropotechnical features,
– Computational Fluid Dynamics (CFD) and Fire Dynamics Simulator (FDS).
14.3
Creation of Virtual Prototyping Scenarios
It was necessary to maintain homogeneity of the data describing the virtual prototype in the dispersed software environment. In that way, the connection between each scenario stage and the source, which is the design documentation, was ensured. Criterial models that include features which should be prototyped in the next phases of the scenario make the realization of the scenario easier, but every time they have to be taken from the same geometrical models. Creation of virtual prototyping scenarios is based on the diagram presented in Table 14.1. These scenarios consist of the following classes of components:
– phases of mining machine's life cycle: ph
– criteria of assessment of virtual prototype: kr
– criterial states of virtual prototype: s
– criterial models: m
– detailed methods: meth
Table 14.1 Components of scenario of virtual prototyping
Phase of machine life cycle | ph1 | ph2 | ph3 | ... | phi
Assessment criterion | kr1 | kr2 | kr3 | ... | krj
Criterial state | s1 | s2 | s3 | ... | sk
Criterial model | m1 | m2 | m3 | ... | ml
Detailed method | meth1 | meth2 | meth3 | ... | methm
Scenarios of virtual prototyping | sc1 | sc2 | sc3 | ... | scn
The possible number of scenarios scn is greater than the number of particular components (phi, krj, sk, ml, methm). The scenarios of prototyping are realized in a heterogeneous software environment. The commercial software can be extended by applications written in the programming languages available in this software; it can also be software written individually. The same scenario can be realized with the use of different software, which is especially noticeable in the case of prototyping for the strength criterion, where a wide range of calculation software is available. Scenarios can have a linear structure or a branched structure. A linear scenario contains only a single component from each class, whereas a branched scenario contains at least two components of the same class, e.g. when the scenario includes prototyping according to several criteria at the same time.
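A scenario can be thought of simply as a selection of components from each class; the sketch below, using invented component names in the ph/kr/s/m/meth convention of Table 14.1, distinguishes a linear scenario from a branched one.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Scenario:
    """A virtual-prototyping scenario: a selection of components from each class.
    A linear scenario holds one component per class; a branched scenario holds
    at least two components of the same class (e.g. several criteria at once)."""
    phases: List[str]
    criteria: List[str]
    states: List[str]
    models: List[str]
    methods: List[str]

    def is_branched(self) -> bool:
        return any(len(c) > 1 for c in
                   (self.phases, self.criteria, self.states, self.models, self.methods))

# Invented example components, following the ph/kr/s/m/meth naming of Table 14.1.
linear = Scenario(["ph_design"], ["kr_strength"], ["s_operational"],
                  ["m_FEM_mesh"], ["meth_FEM"])
branched = Scenario(["ph_stand_test"], ["kr_strength", "kr_ergonomic"],
                    ["s_operational"], ["m_FEM_mesh", "m_anthropometric"],
                    ["meth_FEM", "meth_human_modeling"])
print(linear.is_branched(), branched.is_branched())   # False True
```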
14.4
Example of Scenario of Virtual Prototyping
Exemplary scenario of virtual prototyping for powered-roof support is presented in Table 14.2.
14.4.1 Assessment of Virtual Prototype within the Technical Criteria
The following two criterial states of the powered-roof support were considered at the operational state, Fig. 14.5:
– symmetrical support of the canopy,
– asymmetrical support of the canopy.
The values of the reactions in the pin joints of the lemniscate links were determined at these criterial states, and the maximal reactions in the whole range of support heights were searched for. The values of the reactions correspond to the values of the forces in the lemniscate links. Eighteen elastic–damping elements were used for modeling the pin joints in both states. The method of loading the powered-roof support corresponds to the method accepted
Table 14.2 Scenario of virtual prototyping
Object | Powered-roof support
Case | Identification of the load state of the powered-roof support as a function of height. Strength verification for selected heights. Determination of minimal passages.
Phase of life cycle | Stand tests, exploitation
Assessment criteria | Technical, anthropotechnical
Criterial states | Operational
Criterial models | Geometrical features, anthropometrical models, computational models
Detailed methods | CAD, FEM, MBS, human modeling
Tools | Autodesk Inventor, MSC.ADAMS, MSC.Patran/Nastran, Autodesk 3ds Max
Verification of virtual prototype | Tests at test stand, in-situ
Description of scenario | Values of reactions in pin joints of lemniscate links are read out for seven heights for two variants of support of the powered-roof support. Maximal values of cutting forces in pin joints of lemniscate links show the critical height of support. Comparison of standards requirements and virtual prototyping possibilities.
Fig. 14.5 Boundary conditions of computational model in MBS
at the stand tests: the powered-roof support is set to load between the roof and the floor by the hydraulic legs. The forces in the hydraulic legs and in the tilt cylinder remained unchanged as the height changed. The results obtained for the symmetrical and asymmetrical support loads are included in Table 14.3. For design reasons, it was assumed that the diameters of the pins of the left and right links were the same, so only the maximal values of the forces in the elastic–damping elements at a given height of the powered-roof support, without partition into left and right side, are given in Table 14.3. Identification of the cutting forces for seven heights of the powered-roof support showed that it was necessary to carry out strength tests for at least the three following heights, at which the maximal values of the forces in the links were obtained:
– 2.0 m for asymmetrical support,
– 2.2 m and 2.6 m for symmetrical support.
Table 14.3 Forces in links versus the height of powered-roof support (symmetrical and asymmetrical support, all values in kN)
H [m] | Sym. front left | Sym. front right | Sym. rear left | Sym. rear right | Asym. front left | Asym. front right | Asym. rear left | Asym. rear right
1.4 | 1242.2 | 1247.6 | 768.4 | 757.3 | 1479 | 1384.7 | 644 | 1313.6
1.6 | 949.6 | 939.8 | 706.5 | 710.3 | 1584.6 | 1379.6 | 972 | 1492.8
1.8 | 1154.4 | 1153.9 | 836.9 | 836.3 | 1635.6 | 1382.8 | 1050.4 | 1607.7
2.0 | 1260.7 | 1263.1 | 975.4 | 977.6 | 1668.5 | 1427.8 | 1080 | 1653
2.2 | 1297.4 | 1301.5 | 1047.9 | 1041.9 | 1632 | 1264.9 | 1043.3 | 1608.3
2.4 | 1166.6 | 1146.6 | 986.4 | 998.2 | 1661.9 | 1234.3 | 1018.1 | 1626
2.6 | 1150.5 | 1136.7 | 1062.2 | 1162.7 | 1581.8 | 1254 | 1036.6 | 1637.7
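The critical heights quoted above can be read mechanically off force-versus-height data of the kind in Table 14.3. The sketch below does this for the front links only, so it reproduces the 2.2 m (symmetrical) and 2.0 m (asymmetrical) heights; the 2.6 m case comes from the rear links, which are omitted here for brevity.

```python
# Forces in the front left links [kN] versus support height [m], from Table 14.3.
heights = [1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6]
front_left = {
    "symmetrical":  [1242.2, 949.6, 1154.4, 1260.7, 1297.4, 1166.6, 1150.5],
    "asymmetrical": [1479.0, 1584.6, 1635.6, 1668.5, 1632.0, 1661.9, 1581.8],
}

def critical_heights(heights, force_series):
    """Return, for each load variant, the height at which the link force peaks;
    these are candidate heights for the detailed FEM strength verification."""
    return {variant: heights[max(range(len(forces)), key=forces.__getitem__)]
            for variant, forces in force_series.items()}

print(critical_heights(heights, front_left))
# -> {'symmetrical': 2.2, 'asymmetrical': 2.0}
```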
Only one method of asymmetrical support was considered in the analyzed case. However the standard (PN-EN 2004) requires testing at other methods of asymmetrical support, for which maximal forces can be reached at other heights. Virtual prototype of powered roof support includes, for different heights of support, the same models of systems discretized with 3D mesh, which are grouped in a systematic way. Reuse of the same mesh in the next prototyping procedures requires parameterization of position of mesh for machine assembly. That was done by the developed Support software that uses kinematic relationships of lemniscate system. The software cooperates with MSC.Patran pre-processor. In computational model a content of groups can not be duplicated, i.e. the same geometrical model, points, nodes, finite elements, MPC components included in one group can not be present in any of the rest groups. Groups were defined during the first computational task for specified position of the machine units against each other. So-called characteristic points (A–F), coordinates of which were recorded in an external file, were marked in each next task for current and explicit position of machine units on the geometrical model, Fig. 14.6. Support software was started in the next step, where coordinates of characteristic points for required height of powered roof support were changed, Fig. 14.7. New locations of points are re-entered to the pre-processor where, with use of generally available functions, previously created groups of computational model are dislocated, Fig. 14.8. Through repeated use of verified components of computational task it is possible to analyze higher number of cases of loads and support methods than it would be possible at individual formulation of computational tasks for each case separately.
14.4.2 Assessment of Virtual Prototype within the Anthropotechnical Criteria
The method of determination of minimal passages for the powered-roof support can be assessed in the light of the ergonomic criterion. Virtual prototyping enables showing other difficulties during the movement of the mining team, which differ from those
Fig. 14.6 Boundary conditions of computational model in MBS
Fig. 14.7 Data exchange between preprocessor and Support software
Fig. 14.8 Modification of computational model due to change of height of powered-roof support
Fig. 14.9 Minimal passage for mining teams considers external geometrical dimensions only
Fig. 14.10 Real conditions: dimensions of passage for mining team reduced by powered-roof support components
included in standards (PN-EN 2004), where external geometrical parameters are the basis for determination of minimal passages for the mining team, Fig. 14.9. Components of powered-roof support such as hydraulic hoses, which reduce width of passage for a given roof support, can be noticeable, Fig. 14.10. These components are not taken into account during determination of minimal space, in which mining team will move. So it happens that in mining conditions workers have to move in a danger zone (horizontal line hatch area), Fig. 14.11. By taking anthropotechnical systems into account in the virtual prototyping process in the light of ergonomic criterion, it is possible to assess the conditions which are not included in the standards. Besides, the results can be used for personnel training.
Fig. 14.11 Assessment of virtual prototype in the light of anthropotechnical criteria
14.5
Conclusion and Future Work
Assessment scenarios of virtual prototypes are realized in a dispersed software environment and enable the inclusion of conditions present in operational and emergency states of mining anthropotechnical systems. Their use requires access to specialist knowledge and software from many domains. The use of criterial models makes it possible to keep the number of design features recreated by them as low as possible. Defining criterial states enables modeling of complex processes during operation without the need for unreasonable extension of the criterial models. In mining anthropotechnical systems, the work zone for people is determined by the temporary positions of the cooperating machines. Anthropotechnical systems included in the virtual prototyping process make it possible to assess not only the design features and operational parameters of a machine, but also the future work conditions. The use of extended prototyping scenarios requires access to specialist knowledge from many domains; because of that, it requires a participatory mode of realization of computational tasks, which involves a group of specialists. Scenarios enable a methodological approach to the problems of virtual prototyping, and they are especially useful in the multi-criteria assessment of the designed machine. At the KOMAG Institute of Mining Technology, the suggested method is used during the creation of state-of-the-art, innovative solutions for future mining machines. The permanent increase in the calculation power of state-of-the-art computers and the new calculation software, which enables coupled analyses, require more organized, purposeful and systematized virtual prototyping. Because of that, studies on a method for the creation of extended scenarios of virtual prototyping will be carried out.
References
Barnuś B, Knosala R (2004) Computer aiding of designing process oriented onto minimization of production costs. Transport Przemysłowy 2/2004, pp 31–34 (in Polish)
Dietrych J (1985) System and design. Wydawnictwo Naukowo-Techniczne, Warszawa 1985 (in Polish)
PN-EN 1804–1 (2004) Machines for underground mines – Safety requirements for hydraulic powered roof support – Part 1: Support units and general requirements. Polish Standard (in Polish)
Szweda S (2004) Identification of parameters characterizing powered roof support load caused by dynamic action of rock mass. Zeszyty Naukowe Politechniki Śląskiej: Górnictwo z. 259, Gliwice 2004 (in Polish)
Tokarczyk J (2006) Method of creating virtual prototypes of mining machines. PhD Thesis, The Faculty of Mechanical Engineering, Silesian University of Technology, 2006, Gliwice, Poland (in Polish)
Weiss Z (2004) Integration of designing process as the condition of rapid development of the product – methods and tools. Targi Technologii Przemysłowych i Dóbr Inwestycyjnych, Poznań 2004 (in Polish)
Winkler T (1997) Computer image of the design. CAD CAM computer aiding. Wydawnictwa Naukowo Techniczne, Warszawa 1997 (in Polish)
Winkler T (2001) Methods for computer aided designing of anthropotechnical systems on the example of mining machines. Prace Naukowe Głównego Instytutu Górnictwa, Katowice 2001 (in Polish)
Winkler T, Tokarczyk J (2010) Multi-criteria assessment of virtual prototypes of mining machines. Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering and Computer Science 2010, WCECS 2010, 20–22 October 2010, San Francisco, USA, pp 1149–1153
Zienkiewicz OC, Taylor RL, Zhu JZ (2005) The finite element method, Vol 1: Its basis & fundamentals, 6th edn. Elsevier Butterworth-Heinemann, Oxford 2005
Chapter 15
A Framework for the Selection of Logistic Service Provider Using Fuzzy Delphi and Fuzzy Topsis Rajesh Gupta, Anish Sachdeva, and Arvind Bhardwaj
15.1
Introduction
As the market becomes more global, logistics is now seen as an important area where industries can cut costs and improve their customer service quality (Yan et al. 2003). Companies are outsourcing their logistic activities to organizations that have well-established professional excellence in logistics, widely known as third-party logistics service providers or 3PL. Carbone and Stone (2005) define 3PL as 'activities carried out by an external company on behalf of a shipper and they consist of at least the provision of management of multiple logistics services'. The main benefits of logistics alliances are to allow the outsourcing company to concentrate on its core competence, increase efficiency, improve service, reduce transportation cost, restructure the supply chains, and establish marketplace legitimacy (Skjoett-Larsen 2000; Hertz and Alfredsson 2003; Sohail et al. 2006). Once a 3PL is selected, it affects the business performance to a large extent. Hence, the selection of the 3PL provider is crucial for the growth and competence of an enterprise. Recently, numerous researchers have extensively discussed the relevant topics of 3PL from different perspectives (van Laarhoven et al. 2000; Hertz and Alfredsson 2003; Wilding and Juriado 2004). So far, different types of methods have been designed and developed to address the supplier evaluation or provider selection problem. These methods include data envelopment analysis (Tongzon 2001), the analytic hierarchy process (Sheu 2004; Kreng and Chao-Yi 2007), the TOPSIS approach (Chen and Tzeng 2004; Bottani and Rizzi 2006; Boran et al. 2009), the analytic network process (Huan-Jyh and Hsu-Shih 2006;
R. Gupta (*) Department of Industrial and Production Engineering, Dr. B.R. Ambedkar National Institute of Technology, Jalandhar, Punjab 144001, India, e-mail: [email protected]
Jharkharia and Shankar 2007), principal component analysis and factor analysis (Carr and Pearson 2002), case-based reasoning (Yan et al. 2003), the checklist and interview method (Vaidyanathan 2005), and hybrid decision support systems (Isiklar et al. 2007). For more than a decade, the fuzzy TOPSIS method has been developed and used by many researchers (Bottani and Rizzi 2006; Kahraman et al. 2007; kozkan et al. 2008; Sachdeva et al. 2009; Gupta et al. 2010). TOPSIS is an intuitive, easy to understand and useful tool for dealing with multi-attribute or multi-criteria decision making in the real world. It was first proposed by Chen and Hwang (1992), with reference to Hwang and Yoon (1981). It is based on the logical consideration that the most suitable solution should be the closest to the Positive Ideal Solution (PIS) and the farthest from the Negative Ideal Solution (NIS). An optimization model based on fuzzy logic can take care of the vagueness and imprecision prevailing during judgmental decision making. For fuzzy set operations, one can refer to (Bottani and Rizzi 2006; Liu and Wang 2009; Darbari et al. 2010; Kablan et al. 2010). Thus, the combined fuzzy TOPSIS approach enhances the accuracy of the result and proves to be helpful for the DMs. The DMs are assigned weights according to their qualification, experience and designation. The rest of the paper is organized as follows. Section 2 describes the concepts and the research steps of the proposed fuzzy decision analysis approach. The robustness and effectiveness of the proposed framework is tested in Section 3 by an application to a real case firm of the Indian automobile industry. The last section presents the conclusion and provides future research directions in this area.
15.2
Proposed Fuzzy Approach
The proposed fuzzy approach aims to describe a systematic provider selection process consisting of five main phases, discussed below.
15.2.1 Fuzzy Logic to Assign Weights to the Decision Makers
Step 1: As the DMs have different experience, designations and qualifications, their opinions carry different weights in the decision making, so weights are assigned to the analysts on this basis. The linguistic variables for experience, designation and qualification can be quantified using triangular fuzzy numbers as per Table 15.1. These linguistic variables can be expressed as positive triangular fuzzy numbers, as in Fig. 15.1.
Table 15.1 Linguistic variable for experience, designation and qualification
Experience (years) | Linguistic variable | FTN | Designation | Qualification
0 to <10 | Low | (0.0, 0.2, 0.4) | Up to manager | Under graduate
10 to <20 | Average | (0.2, 0.4, 0.6) | Manager to SM | Graduate
20 to <30 | High | (0.4, 0.6, 0.8) | SM to GM | Specialty graduation
30 and above | Very high | (0.6, 0.8, 1.0) | Sr GM and above | Post graduate
SM: Senior Manager, GM: General Manager

Fig. 15.1 Linguistic variables
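Step 1 leaves open how the three attribute ratings of a DM are merged into a single fuzzy weight; the sketch below assumes a plain component-wise average of the three triangular fuzzy numbers of Table 15.1, which is only one possible choice.

```python
# Triangular fuzzy numbers (TFNs) from Table 15.1 for each attribute level.
TFN = {"low": (0.0, 0.2, 0.4), "average": (0.2, 0.4, 0.6),
       "high": (0.4, 0.6, 0.8), "very high": (0.6, 0.8, 1.0)}

def dm_weight(experience, designation, qualification):
    """Combine the three attribute ratings of a decision maker into a single
    fuzzy weight. The chapter does not state the combination rule, so a plain
    component-wise average of the three TFNs is assumed here."""
    tfns = [TFN[experience], TFN[designation], TFN[qualification]]
    return tuple(sum(t[i] for t in tfns) / 3.0 for i in range(3))

# A hypothetical decision maker: 22 years of experience (high), SM-to-GM grade
# (high), post-graduate qualification (very high).
print(dm_weight("high", "high", "very high"))   # approx. (0.467, 0.667, 0.867)
```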
15.2.2 Fuzzy Delphi (Shortlisting the Criteria)
To shortlist the important criteria for the selection of 3PL service providers, the fuzzy Delphi approach is used. The detailed steps of this preliminary screening phase are described below.
Step 2: The team of experts from industry and academia should determine all possible evaluation criteria specific to the industry prior to provider selection; these may vary dramatically from company to company. The evaluation criteria used for provider selection problems have been widely discussed by many researchers (Razzaque and Sheng 1998; Van Hoek 2001; Lynch 2002; Jharkharia and Shankar 2007). After carefully examining the relevant criteria, we select the criteria for the subsequent evaluation process as shown in Table 15.2.
Step 3: Each DM is asked through a questionnaire to specify the importance of each evaluation criterion on the linguistic scale shown in Fig. 15.2. The goal is to integrate the opinions of all the DMs so as to eliminate the unimportant criteria. The outcome of the questionnaire is the decision matrix (15.1), in which column $j$ corresponds to analyst $D_j$ (with weight $\tilde{X}_j$) and row $i$ to criterion $C_i$:

$$
\begin{bmatrix}
\tilde{L}_{11} & \tilde{L}_{12} & \cdots & \tilde{L}_{1n} \\
\tilde{L}_{21} & \tilde{L}_{22} & \cdots & \tilde{L}_{2n} \\
\vdots & \vdots & & \vdots \\
\tilde{L}_{m1} & \tilde{L}_{m2} & \cdots & \tilde{L}_{mn}
\end{bmatrix}
\tag{15.1}
$$
Table 15.2 List of the criteria shortlisted for the selection of service provider
1. Accessibility
2. Reliability
3. Security
4. Financial strength
5. Management stability
6. Strategic alliances
7. Price
8. Experience in the similar industry
9. Geographic location and spread of services
10. Growth forecasts
11. Optimization capabilities
12. Logistics information system
13. Quality of services
14. Capability to handle specific business requirements
15. Continuous improvement
16. Value-added services
17. Professionalism of salesperson
18. Asset specificity
19. Cultural fit
20. General reputation/carrier prestige
21. Loss and profit sharing clause
22. Facility and technology
23. Responsiveness to customer needs
24. Accessibility of contact persons in urgency
25. Quality of relationship with vendor
26. Safety and insurance
27. Environmental consideration
28. Flexibility of equipment and staff
29. KPI (key performance indicator) measurement and reporting
30. Customized services

Fig. 15.2 Linguistic scale for relative importance
The scale consists of seven linguistic levels (VL, L, ML, M, MH, H, VH) anchored at the points 0.0, 0.1, 0.3, 0.5, 0.7, 0.9 and 1.0.
where $C_i$ is the $i$th evaluation criterion, $i = 1, 2, \ldots, m$; $D_j$ is the $j$th analyst, $j = 1, 2, \ldots, n$; $\tilde{X}_j$ is the weight of the $j$th analyst; and $\tilde{L}_{ij}$ is the linguistic evaluation of criterion $i$ by analyst $j$. Each element $\tilde{L}_{ij}$ of the decision matrix is represented as a triangular fuzzy number $(l^a_{ij}, l^b_{ij}, l^c_{ij})$.
Step 4: By using the appropriate fuzzy operators, weighted average of each criteria is calculated as follows n P
~i ¼ j¼1 W
Xj Lij n
(15.2)
~i ¼ weighted average of the ith criteria and i ¼ 1,2, . . . , m. This value is where W defuzzified using average method by the equation given as: Wi ¼
Wai þ Wbi þ Wci 3
(15.3)
Step 5: Eliminate unimportant criteria: Only the important criteria are considered for the subsequent evaluation. All DMs define a minimum acceptable weight R̃_d for the criteria, which is calculated as

$$\tilde{R}_d = \frac{\sum_{j=1}^{n} \tilde{X}_j \, \tilde{R}_j}{n}$$

where R̃_j is the minimum acceptable weight, defined by the jth analyst, for a criterion to be included in the evaluation of the service provider. This value is defuzzified using the average method:

$$R = \frac{R^a + R^b + R^c}{3} \qquad (15.4)$$

The defuzzified value W_i is compared with the value of R. Any criterion C_i whose W_i is less than R is eliminated. The remaining criteria are used in the final selection phase.
15.2.3 Brainstorming Session (For Shortlisting the Service Providers)

In the initial screening phase, most companies usually consider six to eight potential providers. We propose a brainstorming session of the DMs to efficiently eliminate the unsuitable providers.

Step 6: Select the most probable service providers: At first, the analysts should identify all possible providers for logistics outsourcing from the internet, industrial directories, conferences, journals, self experience, personal rapport, by calling for proposals, or from any other source.

Step 7: Reject the unqualified providers: Once the list of all the probable service providers is prepared, any service provider that is evaluated average or below on the linguistic scale by any of the DMs on any of the following six criteria (experience in the same field, cultural fit, quality of service, financial stability, reputation and price) is rejected.
15.2.4 Fuzzy TOPSIS (For Final Selection of the Service Providers)

We have now obtained the important evaluation criteria and the qualified provider candidates that form the MCDM problem. The ranking of the shortlisted service providers is done using the fuzzy TOPSIS approach.

Step 8: A structured "request for information" is prepared based on the selection criteria listed in Table 15.2 and sent to all the shortlisted service providers.

Step 9: The panel of experts is introduced to the fundamentals of approximate reasoning, fuzzy logic and the TOPSIS methodology to be adopted. All the criteria are monotonic except price, which has been converted into a benefit criterion. DMs are asked to evaluate the average performance of each criterion for all the service providers on the linguistic scale shown in Fig. 15.2. The resulting matrix for the kth service provider is

$$
S_k =
\begin{array}{c@{\;}c}
 & \begin{array}{cccc} D_1 & D_2 & \cdots & D_n \\ \tilde{X}_1 & \tilde{X}_2 & \cdots & \tilde{X}_n \end{array} \\
\begin{array}{c} C_1 \\ C_2 \\ \vdots \\ C_m \end{array} &
\left[ \begin{array}{cccc}
\tilde{O}_{11} & \tilde{O}_{12} & \cdots & \tilde{O}_{1n} \\
\tilde{O}_{21} & \tilde{O}_{22} & \cdots & \tilde{O}_{2n} \\
\vdots & \vdots & & \vdots \\
\tilde{O}_{m1} & \tilde{O}_{m2} & \cdots & \tilde{O}_{mn}
\end{array} \right]
\end{array}
=
\left[ \begin{array}{c} \tilde{C}_1 \\ \tilde{C}_2 \\ \vdots \\ \tilde{C}_m \end{array} \right]
\qquad (15.5)
$$

where S_k is the kth service provider, k = 1 to p, and p is the total number of service providers shortlisted for evaluation; Õ_ij is the linguistic evaluation of the jth DM for the ith criterion of the kth service provider; and C̃_i is the weighted average over all DMs D_j (j = 1 to n), whose respective weights are X̃_j, for the ith criterion (i = 1 to m):

$$\tilde{C}_i = \frac{\sum_{j=1}^{n} \tilde{X}_j \, \tilde{O}_{ij}}{n} \qquad (15.6)$$
Step 10: Normalization of the fuzzy decision matrix for the shipper problem: The different criteria used to select potential 3PL service providers are measured in different units and hence need to be normalized. If R̃ denotes the normalized fuzzy decision matrix, then R̃ = [r̃_ik], where i = 1,2, . . . , m and k = 1,2, . . . , p (p = total number of service providers), with

$$\tilde{r}_{ik} = \left( \frac{a_{ik}}{c_i^+}, \frac{b_{ik}}{c_i^+}, \frac{c_{ik}}{c_i^+} \right), \qquad k = 1,2,\ldots,p \qquad (15.7)$$

for all i = 1,2, . . . , m, where c_i^+ = max_k c_ik is the maximum value of the ith criterion over all the service providers.
Step 11: The weighted normalized decision matrix is computed by multiplying the importance weights of the evaluation criteria with the values in the normalized fuzzy decision matrix: ṽ = [ṽ_ik] with ṽ_ik = r̃_ik ⊗ w̃_i, where w̃_i is the importance weight of criterion C_i obtained through (15.2), r̃_ik denotes the normalized fuzzy decision matrix and ṽ_ik is the weighted normalized decision matrix.
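Steps 10-11 can be sketched as below. The SP1 rating and the FS weight are taken from Tables 15.3-15.4, while the triples for the other two providers are placeholders chosen only so that the example runs; with these inputs the first output should be close to the FS row of Tables 15.6-15.7 for SP1.

```python
# Sketch of Steps 10-11 (normalization and weighting); only SP1's FS rating and the
# FS weight come from the chapter's tables, the other two triples are hypothetical.

def normalize_column(tfns):
    """Divide each TFN by the largest upper bound among the alternatives (eq. 15.7)."""
    c_plus = max(t[2] for t in tfns)
    return [(a / c_plus, b / c_plus, c / c_plus) for (a, b, c) in tfns]

def weight(tfn, w):
    """Component-wise product of a normalized rating with the criterion weight (Step 11)."""
    return (tfn[0] * w[0], tfn[1] * w[1], tfn[2] * w[2])

fs_ratings = [(0.092, 0.277, 0.604),   # SP1, criterion FS (Table 15.5)
              (0.127, 0.344, 0.653),   # placeholder for SP2
              (0.097, 0.293, 0.648)]   # placeholder for SP3
fs_weight = (0.097, 0.293, 0.620)      # fuzzy weight of FS (Table 15.4)

normalized = normalize_column(fs_ratings)
weighted = [weight(r, fs_weight) for r in normalized]
print(normalized[0], weighted[0])      # roughly (0.14, 0.42, 0.92) and (0.013, 0.124, 0.573)
```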
15.2.5 Final Ranking of the Service Providers

Step 12: Determination of the FPIRP and FNIRP: Because the positive triangular fuzzy numbers are included in the interval [0, 1], the fuzzy positive ideal reference point (FPIRP, A+) and the fuzzy negative ideal reference point (FNIRP, A−) can be expressed as

$$A^+ = (\tilde{v}_1^+, \tilde{v}_2^+, \ldots, \tilde{v}_m^+), \qquad A^- = (\tilde{v}_1^-, \tilde{v}_2^-, \ldots, \tilde{v}_m^-)$$

where ṽ_i^+ = (1, 1, 1) and ṽ_i^- = (0, 0, 0) for i = 1,2, . . . , m.

Step 13: Calculation of the distances of each 3PL service provider from the FPIRP and FNIRP: The distance of each 3PL service provider from the fuzzy positive ideal reference point (FPIRP) and the fuzzy negative ideal reference point (FNIRP) can be derived, respectively, as

$$d_k^+ = \sum_{i=1}^{m} d(\tilde{v}_{ik}, \tilde{v}_i^+), \qquad k = 1,2,\ldots,p \qquad (15.8)$$

$$d_k^- = \sum_{i=1}^{m} d(\tilde{v}_{ik}, \tilde{v}_i^-), \qquad k = 1,2,\ldots,p \qquad (15.9)$$
where d(ṽ_a, ṽ_b) denotes the distance measurement between two fuzzy numbers, d_k^+ represents the distance of alternative S_k from the FPIRP, and d_k^- is the distance of alternative S_k from the FNIRP.

Step 14: Obtain the closeness coefficient and rank the order of alternatives: Once the closeness coefficient (CC) is determined, the ranking order of all alternatives can be obtained, allowing the decision-makers to select the most feasible alternative. CC is calculated using (15.10):

$$CC_k = \frac{d_k^-}{d_k^+ + d_k^-}, \qquad k = 1,2,\ldots,p \qquad (15.10)$$

An alternative with index CC_k approaching 1 indicates that the alternative is close to the fuzzy positive ideal reference point and far from the fuzzy negative ideal reference point. A large value of the closeness index indicates a good performance of the alternative.
15.3 Application of the Proposed Methodology in the Case Company
In order to demonstrate the applicability of the proposed fuzzy decision analysis approach, it was tested on a tractor manufacturing company (named ABC to maintain confidentiality) situated in the northern part of India and having nearly four decades of successful operations.

Table 15.3 Weighted aggregate of each criterion
Criteria  DM1 (0.08,0.24,0.48)  DM2 (0.08,0.24,0.48)  DM3 (0.36,0.64,1.0)  W̃_i                 W_i    S/R
A      0.1,0.3,0.5   0.0,0.1,0.3   0.1,0.3,0.5   0.014,0.096,0.294   0.135  R
R      0.5,0.7,0.9   0.7,0.9,1.0   0.7,0.9,1.0   0.116,0.320,0.637   0.358  S
S      0.3,0.5,0.7   0.0,0.1,0.3   0.1,0.3,0.5   0.020,0.112,0.327   0.153  R
FS     0.7,0.9,1.0   0.7,0.9,1.0   0.5,0.7,0.9   0.097,0.293,0.620   0.337  S
MS     0.9,1.0,1.0   0.7,0.9,1.0   0.7,0.9,1.0   0.127,0.344,0.653   0.375  S
SA     0.5,0.7,0.9   0.7,0.9,1.0   0.3,0.5,0.7   0.068,0.234,0.537   0.280  R
P      0.7,0.9,1.0   0.9,1.0,1.0   0.9,1.0,1.0   0.150,0.365,0.653   0.389  S
EXP    0.7,0.9,1.0   0.7,0.9,1.0   0.9,1.0,1.0   0.145,0.357,0.653   0.385  S
GL     0.5,0.7,0.9   0.7,0.9,1.0   0.5,0.7,0.9   0.092,0.277,0.604   0.324  S
GF     0.0,0.1,0.3   0.0,0.1,0.3   0.0,0.0,0.1   0.000,0.016,0.129   0.048  R
OC     0.3,0.5,0.7   0.3,0.5,0.7   0.5,0.7,0.9   0.076,0.229,0.524   0.276  R
IT     0.3,0.5,0.7   0.5,0.7,0.9   0.1,0.3,0.5   0.033,0.160,0.422   0.205  R
QoS    0.9,1.0,1.0   0.7,0.9,1.0   0.9,1.0,1.0   0.150,0.365,0.653   0.389  S
CHB    0.3,0.5,0.7   0.7,0.9,1.0   0.3,0.5,0.7   0.062,0.218,0.505   0.262  R
CI     0.3,0.5,0.7   0.7,0.9,1.0   0.3,0.5,0.7   0.062,0.218,0.505   0.262  R
VAS    0.7,0.9,1.0   0.5,0.7,0.9   0.7,0.9,1.0   0.116,0.320,0.637   0.358  S
KPI    0.7,0.9,1.0   0.5,0.7,0.9   0.7,0.9,1.0   0.116,0.320,0.637   0.358  S
AiE    0.5,0.7,0.9   0.5,0.7,0.9   0.3,0.5,0.7   0.062,0.218,0.521   0.267  R
CF     0.9,1.0,1.0   0.7,0.9,1.0   0.9,1.0,1.0   0.150,0.365,0.653   0.389  S
GR     0.9,1.0,1.0   0.9,1.0,1.0   0.7,0.9,1.0   0.132,0.352,0.653   0.379  S
LPSC   0.3,0.5,0.7   0.5,0.7,0.9   0.5,0.7,0.9   0.081,0.245,0.556   0.294  R
T      0.1,0.3,0.5   0.3,0.5,0.7   0.3,0.5,0.7   0.046,0.170,0.425   0.214  R
Rp     0.7,0.9,1.0   0.5,0.7,0.9   0.3,0.5,0.7   0.068,0.234,0.537   0.280  R
Pf     0.3,0.5,0.7   0.3,0.5,0.7   0.3,0.5,0.7   0.052,0.186,0.457   0.232  R
QoR    0.5,0.7,0.9   0.3,0.5,0.7   0.5,0.7,0.9   0.081,0.245,0.556   0.294  R
SI     0.0,0.0,0.1   0.1,0.3,0.5   0.0,0.0,0.1   0.002,0.024,0.129   0.052  R
EC     0.7,0.9,1.0   0.5,0.7,0.9   0.7,0.9,1.0   0.116,0.320,0.637   0.358  S
Flx    0.9,1.0,1.0   0.7,0.9,1.0   0.7,0.9,1.0   0.126,0.344,0.653   0.375  S
AS     0.1,0.3,0.5   0.3,0.5,0.7   0.3,0.5,0.7   0.046,0.170,0.423   0.214  R
CS     0.0,0.1,0.3   0.3,0.5,0.7   0.3,0.5,0.7   0.044,0.154,0.393   0.197  R
A Accessibility, R Reliability, S Security, FS Financial strength, MS Management stability, SA Strategic alliances, P Price, EXP Experience in the similar industry, GL Geographic location and spread, GF Growth forecasts, OC Optimization capabilities, IT Information technology, QoS Quality of services, CHB Capability to handle business, CI Continuous improvement, VAS Value-added services, KPI Key performance indicator, AiE Accessibility in urgency, CF Cultural fit, GR General reputation, LPSC Loss and profit sharing clause, T Technology, Rp Responsiveness, Pf Professionalism, QoR Quality of relationship, SI Safety and insurance, EC Environmental consideration, Flx Flexibility, AS Asset specificity, CS Customized services
Table 15.4 The selected criteria for further evaluation
Criteria  Fuzzy weight of each criterion
FS    0.097,0.293,0.620
R     0.116,0.320,0.637
MS    0.127,0.344,0.653
P     0.151,0.365,0.653
GL    0.092,0.277,0.604
VAS   0.116,0.320,0.637
GR    0.132,0.352,0.653
KPI   0.116,0.320,0.637
CF    0.151,0.365,0.653
Flx   0.127,0.344,0.653
EC    0.116,0.320,0.637
QoS   0.151,0.365,0.653
EXP   0.145,0.357,0.653
In the following section, we describe the detailed logistics service provider selection process for the case company.

Step 1: Three analysts who hold the right to make the final decision (one each from the logistics, technical and corporate departments, further referred to as DM1, DM2 and DM3 respectively) are chosen from the related industry to form the decision team. Referring to Table 15.1, the weights assigned are DM1 = (0.08,0.24,0.48), DM2 = (0.08,0.24,0.48) and DM3 = (0.36,0.64,1.0).

Step 2: The decision team agreed to adopt the 30 criteria for selection of the logistics provider (as shown in Table 15.2) as the initial evaluation criteria used for the fuzzy Delphi process.

Steps 3 and 4: Each DM is asked through a questionnaire to specify the importance of each evaluation criterion, as shown in Table 15.3.

Step 5: Eliminate unimportant criteria. It was decided to select all the criteria whose weight is more than 0.32 and to eliminate the rest. The selected criteria are shown in Table 15.4.

Steps 6 and 7: The analysts identified all possible providers for logistics outsourcing from the internet, industrial directories, conferences, journals, self experience and personal rapport. Finally, six SPs were shortlisted (further named SP1 to SP6) for further evaluation.

Step 8: A structured "request for information" was prepared based on the selection criteria listed in Table 15.4 and sent to all the shortlisted service providers.

Step 9: All three DMs were asked to evaluate the average performance of each of the thirteen criteria for all the service providers (Table 15.5).

Step 10: Normalization of the fuzzy decision matrix: The above matrix is normalized by dividing each fuzzy number in a criterion row by the maximum element of that row over all the SPs (Table 15.6 shows the result for three SPs).

Step 11: As each criterion has a different weight, the weighted normalized decision matrix is computed; it is shown for three SPs in Table 15.7.
Table 15.5 Result of evaluation of SP1 on each criterion by all DMs (weightage of the decision makers X̃_j: DM1 = (0.08,0.24,0.48), DM2 = (0.08,0.24,0.48), DM3 = (0.36,0.64,1.0))
Criteria  DM1               DM2               DM3               C̃_i
FS    H  (0.7,0.9,1)    MH (0.5,0.7,0.9)  MH (0.5,0.7,0.9)  0.092,0.277,0.604
R     H  (0.7,0.9,1)    MH (0.5,0.7,0.9)  M  (0.3,0.5,0.7)  0.068,0.234,0.537
MS    MH (0.5,0.7,0.9)  MH (0.5,0.7,0.9)  M  (0.3,0.5,0.7)  0.063,0.218,0.521
P     MH (0.5,0.7,0.9)  MH (0.5,0.7,0.9)  M  (0.3,0.5,0.7)  0.063,0.218,0.521
GL    MH (0.5,0.7,0.9)  M  (0.3,0.5,0.7)  M  (0.3,0.5,0.7)  0.057,0.203,0.489
VAS   MH (0.5,0.7,0.9)  M  (0.3,0.5,0.7)  M  (0.3,0.5,0.7)  0.057,0.203,0.489
GR    M  (0.3,0.5,0.7)  M  (0.3,0.5,0.7)  M  (0.3,0.5,0.7)  0.052,0.187,0.457
KPI   M  (0.3,0.5,0.7)  M  (0.3,0.5,0.7)  M  (0.3,0.5,0.7)  0.052,0.187,0.457
CF    H  (0.7,0.9,1)    MH (0.5,0.7,0.9)  MH (0.5,0.7,0.9)  0.092,0.277,0.604
Flx   MH (0.5,0.7,0.9)  M  (0.3,0.5,0.7)  M  (0.3,0.5,0.7)  0.057,0.203,0.489
EC    H  (0.7,0.9,1)    MH (0.5,0.7,0.9)  MH (0.5,0.7,0.9)  0.092,0.277,0.604
QoS   MH (0.5,0.7,0.9)  MH (0.5,0.7,0.9)  MH (0.5,0.7,0.9)  0.087,0.261,0.588
EXP   M  (0.3,0.5,0.7)  M  (0.3,0.5,0.7)  M  (0.3,0.5,0.7)  0.052,0.187,0.457

Table 15.6 Normalized fuzzy decision matrix
Criteria  SP1             SP2             SP3
FS    0.14,0.42,0.92  0.19,0.52,1.00  0.14,0.44,0.99
R     0.10,0.35,0.82  0.18,0.54,1.00  0.18,0.54,1.00
MS    0.09,0.33,0.79  0.18,0.51,1.00  0.14,0.42,0.92
P     0.09,0.33,0.79  0.18,0.51,1.00  0.23,0.57,1.00
GL    0.08,0.31,0.79  0.17,0.49,0.97  0.14,0.42,0.92
VAS   0.08,0.31,0.74  0.13,0.40,0.90  0.19,0.52,1.00
GR    0.08,0.28,0.70  0.18,0.51,1.00  0.14,0.42,0.92
KPI   0.08,0.28,0.70  0.14,0.44,0.94  0.13,0.40,0.90
CF    0.14,0.42,0.92  0.23,0.55,1.00  0.13,0.40,0.90
Flx   0.08,0.31,0.74  0.13,0.40,0.90  0.18,0.51,1.00
EC    0.14,0.42,0.92  0.15,0.46,0.94  0.13,0.40,0.90
QoS   0.13,0.40,0.90  0.18,0.51,1.00  0.18,0.51,1.00
EXP   0.08,0.28,0.70  0.14,0.44,0.94  0.19,0.52,1.00

Table 15.7 The weighted normalized decision matrix
Criteria  SP1                SP2                SP3
FS    0.013,0.124,0.573  0.018,0.154,0.620  0.014,0.131,0.588
R     0.012,0.114,0.524  0.021,0.164,0.637  0.021,0.164,0.637
MS    0.012,0.115,0.521  0.023,0.176,0.653  0.017,0.146,0.604
P     0.014,0.122,0.521  0.028,0.187,0.653  0.036,0.208,0.653
GL    0.008,0.086,0.452  0.016,0.135,0.589  0.013,0.117,0.558
VAS   0.010,0.099,0.477  0.015,0.128,0.573  0.022,0.168,0.637
GR    0.010,0.100,0.457  0.024,0.181,0.653  0.018,0.149,0.604
KPI   0.009,0.091,0.446  0.017,0.143,0.604  0.015,0.128,0.573
CF    0.021,0.155,0.604  0.034,0.204,0.653  0.020,0.146,0.588
Flx   0.011,0.106,0.489  0.016,0.137,0.588  0.023,0.176,0.653
EC    0.016,0.135,0.589  0.018,0.147,0.604  0.015,0.128,0.573
QoS   0.020,0.146,0.588  0.028,0.187,0.653  0.028,0.187,0.653
EXP   0.011,0.100,0.457  0.021,0.160,0.620  0.028,0.188,0.653
Steps 12 and 13: Determination of the FPIRP and FNIRP and calculation of the distances of each 3PL service provider from them: The distance of each SP from the fuzzy positive ideal reference point (1,1,1) and the fuzzy negative ideal reference point (0,0,0) is calculated as per (15.8) and (15.9). The values of d_k^+ and d_k^- are calculated for each SP. The calculations for SP1 are as follows:

$$
\begin{aligned}
d_1^+ ={}& \sqrt{\tfrac{1}{3}\big[(1-0.0137)^2+(1-0.1245)^2+(1-0.5732)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(1-0.0122)^2+(1-0.1151)^2+(1-0.5213)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(1-0.0121)^2+(1-0.1149)^2+(1-0.5242)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(1-0.0145)^2+(1-0.1223)^2+(1-0.5213)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(1-0.0081)^2+(1-0.0860)^2+(1-0.4524)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(1-0.0102)^2+(1-0.0993)^2+(1-0.4773)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(1-0.0105)^2+(1-0.1006)^2+(1-0.4573)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(1-0.0092)^2+(1-0.0914)^2+(1-0.4461)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(1-0.0212)^2+(1-0.1550)^2+(1-0.6040)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(1-0.0111)^2+(1-0.1006)^2+(1-0.4893)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(1-0.0163)^2+(1-0.1358)^2+(1-0.5892)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(1-0.0200)^2+(1-0.1461)^2+(1-0.5880)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(1-0.0116)^2+(1-0.1021)^2+(1-0.4573)^2\big]} \\
={}& 10.5974
\end{aligned}
$$

$$
\begin{aligned}
d_1^- ={}& \sqrt{\tfrac{1}{3}\big[(0-0.0137)^2+(0-0.1245)^2+(0-0.5732)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(0-0.0122)^2+(0-0.1151)^2+(0-0.5213)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(0-0.0121)^2+(0-0.1149)^2+(0-0.5242)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(0-0.0145)^2+(0-0.1223)^2+(0-0.5213)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(0-0.0081)^2+(0-0.0860)^2+(0-0.4524)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(0-0.0102)^2+(0-0.0993)^2+(0-0.4773)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(0-0.0105)^2+(0-0.1006)^2+(0-0.4573)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(0-0.0092)^2+(0-0.0914)^2+(0-0.4461)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(0-0.0212)^2+(0-0.1550)^2+(0-0.6040)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(0-0.0111)^2+(0-0.1006)^2+(0-0.4893)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(0-0.0163)^2+(0-0.1358)^2+(0-0.5892)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(0-0.0200)^2+(0-0.1461)^2+(0-0.5880)^2\big]} \\
&+ \sqrt{\tfrac{1}{3}\big[(0-0.0116)^2+(0-0.1021)^2+(0-0.4573)^2\big]} \\
={}& 3.9664
\end{aligned}
$$
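The same calculation can be reproduced programmatically. This is a small check sketch: the triples are the weighted normalized values of SP1 copied from the worked calculation above, and the printed values should come out close to the 10.5974 and 3.9664 reported there.

```python
# Numerical check of the SP1 distances from the FPIRP and FNIRP.
from math import sqrt

sp1 = [(0.0137, 0.1245, 0.5732), (0.0122, 0.1151, 0.5213), (0.0121, 0.1149, 0.5242),
       (0.0145, 0.1223, 0.5213), (0.0081, 0.0860, 0.4524), (0.0102, 0.0993, 0.4773),
       (0.0105, 0.1006, 0.4573), (0.0092, 0.0914, 0.4461), (0.0212, 0.1550, 0.6040),
       (0.0111, 0.1006, 0.4893), (0.0163, 0.1358, 0.5892), (0.0200, 0.1461, 0.5880),
       (0.0116, 0.1021, 0.4573)]

d_plus = sum(sqrt(((1 - a) ** 2 + (1 - b) ** 2 + (1 - c) ** 2) / 3) for a, b, c in sp1)
d_minus = sum(sqrt((a ** 2 + b ** 2 + c ** 2) / 3) for a, b, c in sp1)
print(round(d_plus, 4), round(d_minus, 4))   # expected to be close to 10.5974 and 3.9664
```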
Step 14: The closeness coefficient (CC) of each SP is calculated using (15.10), and the values are shown in Table 15.8. An alternative with a higher CC value is closer to the fuzzy positive ideal reference point and farther from the fuzzy negative ideal reference point; it therefore performs better and is ranked higher, and the SPs are ranked accordingly. Hence SP2 is recommended for selection to carry out the logistics activities on behalf of the shipper company. The company administration was quite satisfied with outsourcing its logistics activities to the selected 3PL service provider.
Table 15.8 Final ranking of service providers
SP     d+       d-      CC_k = d-/(d+ + d-)  Rank
SP 1   10.5974  3.9664  0.2723               6
SP 2   10.0721  4.8394  0.3245               1
SP 3   10.1228  4.7592  0.3197               3
SP 4   10.0773  4.8198  0.3235               2
SP 5   10.2464  4.5375  0.3069               5
SP 6   10.1811  4.6453  0.3133               4

15.4 Conclusion and Future Work
In this chapter, a framework for ranking and selecting the most suitable service provider (SP) has been presented. The proposed methodology is easy to implement and quite reliable for ranking the alternatives. The applicability of the proposed approach has been demonstrated in an automobile company for the selection of a third party logistics provider. We have seen that even though the price quoted by SP3 was lower than the price quoted by SP2, SP2 has been ranked above SP4 and SP3, because SP2 has a favorable cultural fit with the outsourcing organization. This approach can easily be used for other applications as well, e.g., selecting contractors for construction work, selecting vendors to supply components, or selecting a partner for any service that is to be outsourced by an organization.
References

Boran FE, Genç S, Kurt M, Akay D (2009) A multi-criteria intuitionistic fuzzy group decision making for supplier selection with TOPSIS method. Expert Syst Appl 36(8):11363–11368
Bottani E, Rizzi A (2006) A fuzzy TOPSIS methodology to support outsourcing of logistics services. Supply Chain Management: An International Journal 11(4):294–308
Büyüközkan G, Feyzioglu O, Nebol E (2008) Selection of the strategic alliance partner in logistics value chain. Int J Prod Econ 113:148–158
Carbone V, Stone MA (2005) Growth and relational strategies used by the European logistics service providers: rationale and outcomes. Transportation Research Part E: Logistics and Transportation Review 41(6):495–510
Carr AS, Pearson JN (2002) The impact of purchasing and supplier involvement on strategic purchasing and its impact on firm's performance. Int J Oper Prod Manag 22(9):1032–1055
Chen MF, Tzeng GH (2004) Combining grey relation and TOPSIS concepts for selecting an expatriate host country. Mathematical and Computer Modeling 40(13):1473–1490
Chen SJ, Hwang CL (1992) Fuzzy multiple attribute decision making – methods and applications. Lecture Notes in Economics and Mathematical Systems. Springer, Berlin Heidelberg New York
Darbari M, Srivastava N, Lavania S, Bansal S (2010) Information modeling of urban traffic system using fuzzy stochastic approach. In: Proceedings of the World Congress on Engineering 2010, vol 1, June 30–July 2, 2010, London, UK
Gupta R, Sachdeva A, Bhardwaj A (2010) Selection of 3PL service provider using integrated fuzzy Delphi and fuzzy TOPSIS. In: Proceedings of the World Congress on Engineering and Computer Science 2010, vol II, WCECS 2010, October 20–22, 2010, San Francisco, USA, pp 1092–1097
Hertz S, Alfredsson M (2003) Strategic development of third party logistics providers. Ind Market Manag 32:139–149
Hwang CL, Yoon K (1981) Multiple attributes decision making methods and applications, a state-of-the-art survey. Springer-Verlag, New York
Isiklar G, Alptekin E, Büyüközkan G (2007) Application of a hybrid intelligent decision support model in logistics outsourcing. Computers and Operations Research 34(12):3701–3714
Jharkharia S, Shankar R (2007) Selection of logistics service provider: an analytic network process (ANP) approach. Omega 35(3):274–289
Sheu JB (2004) A hybrid fuzzy-based approach for identifying global logistics strategies. Transportation Research Part E 40:39–61
Kablan A, Ng WL (2010) High frequency trading using fuzzy momentum analysis. In: Proceedings of the World Congress on Engineering 2010, vol 1, June 30–July 2, 2010, London, UK
Kahraman C, Yasin AN, Cevik S, Gulbay M, Ayca ES (2007) Hierarchical fuzzy TOPSIS model for selection among logistics information technologies. J Enterprise Inform Manag 20(2):143–168
Liu HT, Wang WK (2009) An integrated fuzzy approach for provider evaluation and selection in third-party logistics. Expert Syst Appl 36(3):4387–4398
Lynch CF (2002) 3PLs: the state of outsourcing. Logist Manag 41(6):T47–T50
Razzaque MA, Sheng CC (1998) Outsourcing of logistics functions: a literature survey. Int J Phys Distrib Logist Manag 26(2):89–107
Sachdeva A, Kumar D, Kumar P (2009) Multi-factor failure mode criticality analysis using TOPSIS. J Ind Eng Int 5(8):1–9
Shyur HJ, Shih HS (2006) A hybrid MCDM model for strategic vendor selection. Mathematical and Computer Modelling 44:749–761
Skjoett-Larsen T (2000) Third party logistics – from an interorganisational point of view. Int J Phys Distrib Logist Manag 30(2):112–127
Sohail MS, Bhatnagar R, Sohal AS (2006) A comparative study on the use of third party logistics services by Singaporean and Malaysian firms. Int J Phys Distrib Logist Manag 36(9):690–701
Tongzon J (2001) Efficiency measurement of selected Australian and other international ports using data envelopment analysis. Transportation Research A 35:113–128
Vaidyanathan G (2005) A framework for evaluating third-party logistics. Communications of the ACM 48(1):89–94
van Hoek RI (2001) The contribution of performance measurement to the expansion of third party logistics alliances in the supply chain. Int J Oper Prod Manag 21(1/2):15–29
Van Laarhoven P, Berglund M, Peters M (2000) Third-party logistics in Europe – five years later. Int J Phys Distrib Logist Manag 30(5):425–442
Kreng VB, Wu CY (2007) Evaluation of knowledge portal development tools using a fuzzy AHP approach: the case of Taiwanese stone industry. Eur J Oper Res 176:1795–1810
Wilding R, Juriado R (2004) Customer perceptions on logistics outsourcing in the European consumer goods industry. Int J Phys Distrib Logist Manag 34(8):628–644
Yan J, Chaudhry PE, Chaudhry SS (2003) A model of a decision support system based on case-based reasoning for third-party logistics evaluation. Expert Systems 20(4):196–207
Chapter 16
Bee Algorithm for Solving Yield Optimization Problem for Hard Disk Drive Component under Budget and Supplier's Rating Constraints and Heuristic Performance Comparison
Wuthichai Wongthatsanekorn and Nuntana Matheekrieangkrai
16.1 Introduction
A hard disk drive (HDD) is a non-volatile storage device which stores digitally encoded data on rapidly rotating platters with magnetic surfaces. For HDD manufacturers, it is very important to stay competitive; therefore, each manufacturer must manage the manufacturing process such that it is flexible enough to quickly support new product models in a short period. Furthermore, minimizing HDD production cost and maximizing production yield are very critical. In this paper, the focus is on the product yield and the material, because the product yield is highly affected by the yield of the raw materials. Since there are many available suppliers for each component, an economical strategy is required to identify the cost-effective configuration of HDD components. This research work considers only the raw materials, i.e., the material factor in the 4M theory (man, machine, material and method), before they are loaded and proceed into mass production. Each supplier offers a different quality for its own component, and the component yield for each supplier can be collected during a prototype or pilot run. The goal of this research is to determine the best selection of materials from multiple suppliers for all of the HDD's components so that the total product yield is maximized, judging from the raw material defect rates. There are two crucial constraints in this problem. The first constraint is that there is a limited budget on the components selected from each supplier; sometimes choosing the best-quality component may not be affordable. The second constraint deals with the supplier's rating. Hence, this problem is called HDD component yield optimization under budget and supplier's rating constraints (YOBS), as defined in (Wongthatsanekorn and Matheekriengkrai 2010). This problem is based on a series system with
multiple-choice constraints incorporated at each subsystem. The problem can be formulated as a nonlinear binary integer programming problem and is characterized as NP-hard. In the past decade, many heuristics have been proposed and successfully applied to many combinatorial optimization problems such as the traveling salesman problem (TSP), the quadratic assignment problem, the vehicle routing problem, and the job-shop scheduling problem. In this research, three heuristics are proposed to solve the YOBS problem: the Genetic Algorithm (GA), Ant Colony Optimization (ACO), and the Bee Algorithm (BA). In particular, the focus is on using conventional heuristics for solving the YOBS problem and comparing their performance in terms of the percentage of runs that reach the optimum and the computational time. The numerical examples are based on actual data of a selected HDD module from an undisclosed HDD manufacturer. The results are shown in terms of solution quality and computational efficiency.
16.2 Problem Formulation

The YOBS problem can be formulated with the following indices, parameters and decision variables.
16.2.1 Indices

i: Index of components, i = 1, . . . , n
j: Index of suppliers, j = 1, . . . , ns_i
l: Index of main components, l ∈ {1, . . . , n}

16.2.2 Parameters

n: Number of components
ns_i: Number of suppliers available for component i
C_ij: Unit cost of component i from supplier j
Y_ij: Yield of component i from supplier j
SR_ij: Supplier's rating of component i from supplier j
minR: Minimum supplier rating
minS: Minimum average supplier rating
B: Total available budget
16.2.3 Decision Variables
$$
x_{ij} = \begin{cases} 1 & \text{if component } i \text{ is supplied by supplier } j \\ 0 & \text{otherwise} \end{cases}
\qquad i = 1,2,\ldots,n; \; j = 1,2,\ldots,ns_i
$$

With these notations, the YOBS problem can be formulated as the following nonlinear binary integer programming (NBIP) problem:

(YOBS)

$$\text{Maximize } Z = \prod_{i=1}^{n} \left( \sum_{j=1}^{ns_i} Y_{ij} x_{ij} \right) \qquad (16.1)$$

Subject to

$$\sum_{i=1}^{n} \sum_{j=1}^{ns_i} C_{ij} x_{ij} \le B \qquad (16.2)$$

$$\sum_{j=1}^{ns_l} SR_{lj} x_{lj} \ge minR \qquad \forall l \qquad (16.3)$$

$$\frac{\sum_{i=1}^{n} \sum_{j=1}^{ns_i} SR_{ij} x_{ij}}{n} \ge minS \qquad (16.4)$$

$$\sum_{j=1}^{ns_i} x_{ij} = 1 \qquad \forall i \qquad (16.5)$$

$$x_{ij} \in \{0, 1\} \qquad \forall i, j \qquad (16.6)$$
In this formulation, there are n components to be considered, and each component i can be purchased from ns_i different suppliers with different cost and yield. The goal is to select the component configuration such that the total product yield is maximized under a limited budget. The objective function (16.1) is nonlinear because it is a product of the component yields. Constraint (16.2) guarantees that the limited budget is not exceeded. Constraint (16.3) enforces that each main component must pass the minimum rating from the selected supplier. Constraint (16.4) requires that the average supplier rating of all selected components is higher than the minimum requirement. Constraint (16.5) ensures that each component is supplied by only one supplier. Finally, constraint (16.6) defines the decision variable x_ij as binary.

The example of assessment criteria for the supplier's rating is shown in Table 16.1.

Table 16.1 Assessment criteria for supplier's rating (Chin et al. 2005)
Rating >= 90          Outstanding        A
70 <= Rating < 90     Acceptable         B
50 <= Rating < 70     Need improvement   C
Rating < 50           Unacceptable       F

For the studied manufacturer, only suppliers with outstanding and acceptable (level A and B) ratings are considered. For the case study in Sect. 16.4, the minR and minS values are set to 90. Since there are a lot of suppliers in the HDD industry and one HDD model contains a lot of components, this combinatorial optimization problem has an exponential number of solutions. Solving this NBIP problem to obtain the optimal solution using a commercial solver can still be very time consuming. Therefore, three heuristics are explored to find an efficient solution under reasonable computation time.
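The objective (16.1) and constraints (16.2)-(16.6) translate directly into a short evaluation routine, sketched below in Python. The data structures, function names and the tiny two-component instance are hypothetical illustrations, not the authors' implementation.

```python
# Sketch of evaluating a YOBS candidate solution against (16.1)-(16.6).
from math import prod

def evaluate(selection, yield_, cost, rating, budget, min_r, min_s, main_components):
    """selection[i] = index of the chosen supplier for component i (one supplier each, 16.5/16.6)."""
    n = len(selection)
    total_yield = prod(yield_[i][selection[i]] for i in range(n))              # objective (16.1)
    within_budget = sum(cost[i][selection[i]] for i in range(n)) <= budget     # constraint (16.2)
    main_ok = all(rating[l][selection[l]] >= min_r for l in main_components)   # constraint (16.3)
    avg_ok = sum(rating[i][selection[i]] for i in range(n)) / n >= min_s       # constraint (16.4)
    return total_yield, within_budget and main_ok and avg_ok

# Tiny hypothetical instance with two components and two suppliers each.
Y = [[0.987, 0.992], [0.978, 0.982]]
C = [[4.03, 4.52], [4.06, 4.34]]
SR = [[84, 85], [80, 83]]
print(evaluate([1, 1], Y, C, SR, budget=9.0, min_r=80, min_s=80, main_components=[0]))
```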
16.3 Heuristic Algorithm

The fundamental concepts of many optimization techniques are based on the idea of local search. It involves searching through the solution space by moving around the neighbors of a known solution. This process is repeated to obtain new solutions, and the algorithm continues until the stopping criterion is reached. For several hard problems in combinatorial optimization and global optimization, heuristic search plays an important role. A heuristic is a technique that seeks a good or near-optimal solution at a reasonable computational cost, without being able to guarantee either feasibility or optimality, or even, in many cases, to state how close to optimality a particular feasible solution is. Most modern heuristic search strategies are based on some biological metaphor. In this study, three heuristics are applied to solve the YOBS problem. The details of each algorithm are presented next.
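The local-search idea described above can be expressed as a minimal skeleton; the three heuristics below can be seen as different strategies for generating and accepting such neighbor moves. In the sketch, objective() and neighbour() are problem-specific placeholders (hypothetical names).

```python
# A minimal local-search skeleton of the kind the following heuristics build on.
def local_search(initial, objective, neighbour, max_iter=1000):
    current = best = initial
    best_val = objective(initial)
    for _ in range(max_iter):              # stopping criterion: fixed iteration budget
        candidate = neighbour(current)     # examine a neighbouring solution
        if objective(candidate) >= objective(current):
            current = candidate            # keep improving (or equally good) moves
            if objective(current) > best_val:
                best, best_val = current, objective(current)
    return best, best_val
```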
16.3.1 Genetic Algorithm (GA)

Holland (1975) proposed the GA, which is considered an adaptive learning heuristic. He incorporated features of natural evolution to propose a robust, simple, but powerful technique to solve difficult optimization problems. GA is inspired by natural selection and natural genetics. In nature, weak and unfit species within their environment are faced with extinction by natural selection. The strong ones have a greater opportunity to pass their genes to future generations via reproduction. In GA terminology, a solution vector is called a chromosome. Chromosomes are made of discrete units called genes, and each gene controls one or more features of the chromosome. Originally, genes were assumed to be binary digits; in later implementations, more varied gene types have been introduced. Normally, a chromosome corresponds to a unique solution in the solution space. This requires
a mapping mechanism between the solution space and the chromosomes. This mapping is called an encoding. GA operates on a collection of chromosomes, called a population, which is randomly initialized. Generally, the components of GA are as follows:

1. Representation and initialization: In GA, the design variables or features that characterize an individual are represented in an ordered list called a string. Each design variable corresponds to a gene, and the string of design variables forms the chromosome. These strings represent solutions in the search space. The initial population is chosen at random. For each iteration or generation, a new set of strings, the offspring, is created by crossing some of the solutions in the current generation; sometimes the offspring is mutated to add diversity. GA combines information exchange with survival of the fittest (best) among the population to conduct the search. The authors of (Coit and Smith 1996; Painton and Campbell 1995) applied GA in the reliability field, which has a similar objective function in the form of a product.

2. Fitness function: The fitness function is used to select the best chromosomes in the first generation. New offspring chromosomes are then created through the crossover operation, depending on the crossover rate, and mutated according to the mutation rate. The offspring are then compared with the previous generation; the ones with lower fitness values do not survive. The operation then repeats itself.

3. Genetic operators: Genetic operators are the stochastic transition rules employed by GA. These operators are applied to each string during each generation to generate a new, improved population from the old one. GA consists of the following three basic operators.

Reproduction involves the selection of chromosomes for the next generation. In the most general case, the fitness of an individual determines the probability of its survival for the next generation. There are different selection procedures in GA depending on how the fitness values are used; proportional selection, ranking, and tournament selection are the most popular.

The crossover operator is the most important operator of GA. In crossover, generally two chromosomes, called parents, are combined to form new chromosomes, called offspring. An example of one-point crossover is shown in Fig. 16.1.

Fig. 16.1 Example of one-point crossover

The parents are selected among existing chromosomes in the population
with preference towards fitness, so that the offspring are expected to inherit good genes which make the parents fitter. The crossover rate is the parameter that controls the rate at which the crossover operator is applied. A high crossover rate introduces new strings more quickly into the population; if the crossover rate is too high, high-performance strings are eliminated faster than selection can produce improvements. A low crossover rate may cause stagnation due to the lower exploration rate. An operation rate (pc) with a typical value between 0.6 and 1 is normally used as the probability of crossover.

The mutation operator introduces random changes into the characteristics of chromosomes. Mutation is generally applied at the gene level. In typical GA implementations, the mutation rate (the probability of changing the properties of a gene) is very small and depends on the length of the chromosome. Therefore, the new chromosome produced by mutation will not be very different from the original one. Mutation plays a critical role in GA. As discussed earlier, crossover leads the population to converge by making the chromosomes in the population alike; mutation reintroduces genetic diversity back into the population and assists the search in escaping from local optima.

For the YOBS problem (Coit and Smith 1996; Painton and Campbell 1995), a string of numbers is used to encode the solution; each number represents the selected supplier for the corresponding component. The fitness function is obtained by calculating the total yield. For the crossover and mutation operations, the default settings of the Matlab Genetic Algorithm Toolbox are used.
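The chromosome encoding just described can be sketched as below. Since the chapter relies on the Matlab GA Toolbox for the operators, the penalized fitness shown here is only one plausible way to discourage budget violations, not necessarily the authors' choice; the instance data is hypothetical.

```python
# Sketch of the GA encoding for YOBS and one possible penalized fitness function.
import random
from math import prod

def random_chromosome(num_suppliers):
    """One gene per component = index of the selected supplier."""
    return [random.randrange(ns) for ns in num_suppliers]

def fitness(chrom, Y, C, budget, penalty=0.5):
    total_yield = prod(Y[i][j] for i, j in enumerate(chrom))
    over_budget = max(0.0, sum(C[i][j] for i, j in enumerate(chrom)) - budget)
    return total_yield - penalty * over_budget    # infeasible strings are penalized

# Hypothetical two-component instance with three suppliers each.
Y = [[0.987, 0.992, 0.994], [0.978, 0.982, 0.987]]
C = [[4.03, 4.52, 4.63], [4.06, 4.34, 4.78]]
chrom = random_chromosome([3, 3])
print(chrom, fitness(chrom, Y, C, budget=9.0))
```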
16.3.2 Ant Colony Optimization

In ACO (Dorigo et al. 1996), each ant represents a solution, i.e., a combination of selected suppliers, one per component. The following ACO procedures are adapted from (Nourelfath and Nahas 2003; Nahas and Nourelfath 2005). First, m ants (solutions) are constructed based on the state transition rule. Next, the amount of pheromone is updated by following the global updating rule. The solutions are guided by both heuristic information and pheromone information to yield the best solution. These steps are then repeated many times, based on the setting of the cycle counter.
16.3.2.1 State Transition Rule

The state transition rule is applied for each ant and is represented by p_ij^k(t). The probability that ant k selects supplier j for component i is given by (16.7):

$$p_{ij}^{k}(t) = \frac{[\tau_{ij}(t)]^{\alpha} [\eta_{ij}(t)]^{\beta}}{\sum_{m=1}^{M_i} [\tau_{im}(t)]^{\alpha} [\eta_{im}(t)]^{\beta}} \qquad (16.7)$$
where τ_ij is the pheromone intensity and η_ij is the heuristic information between component i and supplier j, respectively. In addition, α is the relative importance of the trail and β is the relative importance of the heuristic information. In this heuristic, the problem-specific heuristic information can be obtained by using η_ij = Y_ij / C_ij, where C_ij and Y_ij represent the associated cost and yield of component i from supplier j, respectively. Therefore, a supplier whose component has a smaller cost and a higher yield has a higher chance of being selected.
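The state transition rule (16.7) can be sketched for a single component as below. The α and β values follow the parameter settings reported later in the chapter; the pheromone values and the supplier data are hypothetical.

```python
# Sketch of the state transition rule (16.7) for one component.
import random

def transition_probabilities(tau_i, eta_i, alpha=0.1, beta=0.8):
    """Selection probabilities over the suppliers of one component."""
    scores = [(t ** alpha) * (e ** beta) for t, e in zip(tau_i, eta_i)]
    total = sum(scores)
    return [s / total for s in scores]

# Heuristic information eta_ij = Y_ij / C_ij for one component with three suppliers.
yields, costs = [0.987, 0.992, 0.994], [4.03, 4.52, 4.63]
eta = [y / c for y, c in zip(yields, costs)]
tau = [1.0, 1.0, 1.0]                            # uniform initial pheromone (assumed)
probs = transition_probabilities(tau, eta)
choice = random.choices(range(3), weights=probs)[0]
print(probs, choice)
```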
16.3.2.2 Global Updating Rule

While constructing feasible solutions, it is possible that an ant produces an infeasible solution which violates the budget constraint (16.2). To resolve this problem, a high amount of pheromone is deposited if the constructed solution is feasible, and a low amount if it is infeasible. Since these amounts affect the search, they are made dependent on the solution quality; this can be handled by assigning penalties proportional to the amount of budget violation. With these arguments, the trail intensity is updated as follows:

$$\tau_{ij}(\text{new}) = \rho\, \tau_{ij}(\text{old}) + \Delta\tau_{ij} \qquad (16.8)$$

where ρ is a coefficient such that (1 − ρ) represents the evaporation of the trail, and

$$\Delta\tau_{ij} = \sum_{k=1}^{m} \Delta\tau_{ij}^{k} \qquad (16.9)$$

where m is the number of ants and Δτ_ij^k is given by

$$\Delta\tau_{ij}^{k} = \begin{cases} 1 & \text{if the } k\text{th ant chooses supplier } j \text{ for component } i \\ 0 & \text{otherwise} \end{cases} \qquad (16.10)$$
The algorithm starts with m ants initially assigned. The details of the ACO algorithm can be described in the following five steps.

ACO procedure
Step 1 Initialization
  Set NC = 0
  For every combination (i,j): set τ_ij(0) = τ_0 and Δτ_ij = 0
Step 2 Construct feasible solutions
  For k = 1 to m
    For i = 1 to n
      Choose a supplier for component i with the probability given by (16.7)
    End For
    Calculate yield Y_k and cost C_k
  End For
  Update the best solution Y*
Step 3 Global updating rule
  For every combination (i,j)
    For k = 1 to m: find Δτ_ij^k according to (16.10)
    Update Δτ_ij according to (16.9)
  End For
  Update the trail values according to (16.8)
  Update the transition probabilities according to (16.7)
Step 4 Next search
  Set NC = NC + 1
  For every combination (i,j): Δτ_ij = 0
Step 5 Termination
  If NC < NC_max, go to Step 2
  Else print the best feasible solution and stop
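The global updating of Step 3 can be sketched for one cycle as below. The sketch applies (16.8)-(16.10) literally with a unit deposit per ant; the feasibility-dependent (penalized) deposit amounts mentioned above are omitted, and the ant selections are hypothetical.

```python
# Sketch of the global updating rule (16.8)-(16.10) for one ACO cycle.
def update_pheromone(tau, ant_solutions, rho=0.01):
    """tau[i][j] is the trail on (component i, supplier j); rho is the persistence in (16.8)."""
    new_tau = [[rho * t for t in row] for row in tau]      # evaporation part of (16.8)
    for solution in ant_solutions:                         # accumulate delta-tau, (16.9)-(16.10)
        for i, j in enumerate(solution):
            new_tau[i][j] += 1.0
    return new_tau

tau = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]   # two components, three suppliers each (assumed)
ants = [[0, 2], [1, 2], [0, 1]]            # three ants' supplier selections in this cycle
print(update_pheromone(tau, ants))
```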
16.3.3 Bee Algorithm

The BA was proposed by Karaboga (Karaboga and Basturk 2008) to optimize numerical problems. The algorithm mimics the food foraging behavior of swarms of honey bees. Honey bees use several mechanisms, like the waggle dance, to optimally locate food sources and to search for new ones. In the BA, the colony of artificial bees contains two groups of bees, scout bees and employed bees. The scout bees' task is to find new food sources, while the employed bees explore food sources within the neighborhood of the food sources in their memory and share their information with other bees in the hive. The algorithm requires a number of parameters to be set, namely: NC, the number of iterations; ns, the number of scout bees; m, the number of sites selected out of the ns visited sites; e, the number of best sites out of the m selected sites; nep, the number of bees recruited for the best e sites; nsp, the number of bees recruited for the other (m − e) selected
sites; ngh, the initial size of the patches, which includes a site and its neighborhood; and the stopping criterion. The process of the BA can be summarized as follows:

Step 1: Randomly generate the initial population of ns scout bees. These initial solutions must be feasible candidate solutions that satisfy the constraints. Set NC = 0.
Step 2: Evaluate the fitness value of the initial population.
Step 3: Select the m best solutions for neighborhood search.
Step 4: Separate the m best solutions into two groups: the first group contains the e best solutions and the other group contains the remaining m − e solutions.
Step 5: Determine the size of the neighborhood search around each best solution (ngh).
Step 6: Generate solutions around the selected solutions within the neighborhood size.
Step 7: Select the fittest solution from each patch.
Step 8: Check the stopping criterion. If satisfied, terminate the search; else set NC = NC + 1.
Step 9: Assign the remaining ns − m bees to generate new random solutions. Go to Step 2.

In Steps 3 and 4, the bees that have the highest fitness values are chosen as "selected bees", and the sites visited by them are chosen for neighborhood search. Then, in Steps 5 and 6, the algorithm conducts searches in the neighborhood of the selected sites, assigning more bees to search near the best e sites. The bees can be chosen directly according to the fitness values associated with the sites they are visiting; alternatively, the fitness values are used to determine the probability of the bees being selected. Searching in the neighborhood of the best e sites, which represent more promising solutions, is made more detailed by recruiting more bees to follow them than the other selected bees. Together with scouting, this differential recruitment is a key operation of the BA. In Step 7, for each patch only the bee with the highest fitness value is selected to form the next bee population; in nature there is no such restriction, but it is introduced here to reduce the number of points to be explored. In Step 9, the remaining bees in the population are assigned randomly around the search space, scouting for new potential solutions. These steps are repeated until a stopping criterion is met.
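The BA loop in Steps 1-9 can be sketched as below. The random_solution(), neighbour() and fitness() functions are problem-specific placeholders, and the default parameter values are taken from the settings reported later in Sect. 16.4 (ns = 20, e = 2, three other selected sites, nep = 10, nsp = 5); the overall structure is an illustrative sketch, not the authors' code.

```python
# Sketch of the Bee Algorithm (Steps 1-9) for a maximization problem.
def bee_algorithm(random_solution, neighbour, fitness,
                  ns=20, m=5, e=2, nep=10, nsp=5, max_iter=100):
    population = [random_solution() for _ in range(ns)]            # Step 1
    best = max(population, key=fitness)
    for _ in range(max_iter):                                      # Step 8 stopping criterion
        ranked = sorted(population, key=fitness, reverse=True)     # Steps 2-3
        new_population = []
        for rank, site in enumerate(ranked[:m]):                   # Steps 4-7
            recruits = nep if rank < e else nsp                    # more bees on elite sites
            candidates = [neighbour(site) for _ in range(recruits)] + [site]
            new_population.append(max(candidates, key=fitness))    # fittest bee per patch
        new_population += [random_solution() for _ in range(ns - m)]  # Step 9 scouts
        population = new_population
        best = max([best] + population, key=fitness)
    return best
```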
16.4 Numerical Simulation and Results

To assess feasibility, the YOBS problem is solved using GA, ACO and BA. Every heuristic is implemented in the MATLAB® package, and the simulation cases are run on an Intel® Core 2 Duo 1.66 GHz laptop with 2 GB RAM under Windows XP. Each case was run 30 times with different random initial solutions. In order to evaluate performance, the maximum, minimum, average and standard deviation of the yields and the average computational time to reach a near-optimum solution are used. The numerical examples are based on the real data of one HDD model from an undisclosed HDD manufacturer. The selected HDD model consists of 15 components; the cost and yield of all components are shown in Table 16.2.
Table 16.2 Data of test problem
Component  Data     Supplier 1  Supplier 2  Supplier 3  Supplier 4  Supplier 5  Supplier 6  Supplier 7  Supplier 8
1   Yield    0.9870  0.9920  0.9937  0.9941  0.9950  0.9961  –       –
    Cost($)  4.0286  4.5246  4.6338  4.6793  4.7472  4.7676  –       –
    Rating   84      85      92      92      93      94      –       –
2   Yield    0.9780  0.9820  0.9874  0.9910  0.9923  0.9931  0.9950  0.9961
    Cost($)  4.0637  4.3424  4.7845  5.3110  5.3575  5.4121  5.4863  5.5155
    Rating   80      83      94      94      93      95      95      96
3   Yield    0.9740  0.9760  0.9816  0.9916  0.9947  –       –       –
    Cost($)  7.0462  7.0971  7.2769  7.6948  7.8235  –       –       –
    Rating   85      93      94      95      96      –       –       –
4   Yield    0.9850  0.9899  0.9910  0.9935  0.9978  0.9986  0.9995  –
    Cost($)  8.0689  8.6342  8.7631  8.8775  9.5310  9.5904  9.9004  –
    Rating   73      85      91      92      90      94      93      –
5   Yield    0.9970  0.9980  0.9978  0.9987  0.9995  –       –       –
    Cost($)  0.3561  0.4337  0.4469  0.4939  0.5128  –       –       –
    Rating   92      92      93      94      93      –       –       –
6   Yield    0.9979  0.9982  0.9987  0.9991  –       –       –       –
    Cost($)  0.7400  0.7520  0.7600  0.7800  –       –       –       –
    Rating   85      90      96      98      –       –       –       –
7   Yield    0.9983  0.9987  0.9993  0.9999  –       –       –       –
    Cost($)  0.1590  0.1660  0.2197  0.2936  –       –       –       –
    Rating   85      88      90      92      –       –       –       –
8   Yield    0.9957  0.9989  0.9999  –       –       –       –       –
    Cost($)  0.1000  0.1500  0.1600  –       –       –       –       –
    Rating   88      90      91      –       –       –       –       –
9   Yield    0.9947  0.9951  0.9973  0.9993  –       –       –       –
    Cost($)  0.2448  0.2681  0.3171  0.3503  –       –       –       –
    Rating   87      88      90      92      –       –       –       –
10  Yield    0.9983  0.9998  0.9999  –       –       –       –       –
    Cost($)  0.1400  0.1900  0.2000  –       –       –       –       –
    Rating   87      90      92      –       –       –       –       –
11  Yield    0.9970  0.9989  0.9999  –       –       –       –       –
    Cost($)  0.2000  0.2400  0.2500  –       –       –       –       –
    Rating   88      91      93      –       –       –       –       –
12  Yield    0.9914  0.9990  0.9999  –       –       –       –       –
    Cost($)  0.0400  0.0500  0.0530  –       –       –       –       –
    Rating   85      92      93      –       –       –       –       –
13  Yield    0.9982  0.9997  0.9999  –       –       –       –       –
    Cost($)  0.1883  0.2259  0.2549  –       –       –       –       –
    Rating   86      88      92      –       –       –       –       –
14  Yield    0.9554  0.9882  0.9998  0.9999  –       –       –       –
    Cost($)  0.2277  0.3094  0.3782  0.3919  –       –       –       –
    Rating   75      85      89      92      –       –       –       –
15  Yield    0.9840  0.9931  0.9975  0.9982  –       –       –       –
    Cost($)  1.0746  1.1271  1.1288  1.4204  –       –       –       –
    Rating   81      83      88      90      –       –       –       –
Table 16.3 Simulation results for GA, ACO and BA
Budget ($)  Method  Solution             Total cost ($)  Max yield  Average yield  Min yield   Standard deviation  CPU time (s)  % Obtain optimal
28          –       Infeasible solution  –               –          –              –           –                   –             –
29          BA      33231-32111-12133    28.990          0.926006   0.926006       0.926006    0.000000            7.07          100.00
            ACO                                          0.926006   0.925762       0.919125    0.001256            20.18         93.33
            GA                                           0.926006   0.924343       0.919125    0.002797            44.85         50.00
30          BA      33441-31342-32233    29.998          0.957044   0.957044       0.9570443   0.000000            5.10          100.00
            ACO                                          0.957044   0.956978       0.9564250   0.000163            23.58         83.33
            GA                                           0.957044   0.956842       0.9564250   0.000408            41.57         50.00
31          BA      68541-32342-32233    30.999          0.971222   0.971222       0.9712223   0.000000            8.59          100.00
            ACO                                          0.971222   0.971040       0.9703511   0.000335            28.68         76.67
            GA                                           0.971222   0.970904       0.9703497   0.000376            44.30         56.67
32          BA      68565-43343-32243    31.995          0.979939   0.979939       0.9799394   0.000000            5.44          100.00
            ACO                                          0.979939   0.979919       0.9797921   0.000050            18.96         86.67
            GA                                           0.979939   0.979889       0.9797921   0.000066            39.52         63.33
The data originate from (Wongthatsanekorn and Matheekriengkrai 2010). Following the reference papers (Coit and Smith 1996; Painton and Campbell 1995; Kennedy and Eberhart 1995; Dorigo et al. 1996; Nourelfath and Nahas 2003; Nahas and Nourelfath 2005; Karaboga and Basturk 2008; Lee and El-Sharkawi 2007) and the experience gained from many trial experiments, the following parameters are used for GA, ACO and BA. For GA, the population size, crossover rate, mutation rate and crossover parameter are set to 200, [0.5, 1], 0.01 and 0.5, respectively. For ACO, the values of α, β and ρ are 0.1, 0.8 and 0.01, respectively. For BA, the population of bees, the number of elite bees, the number of other selected bees, the number of bees around elite bees, the number of bees around other selected bees and the patch size are set to 20, 2, 3, 10, 5 and 0.1, respectively.
16.4.1 Simulation Results

For this study, four different available budgets ($29, $30, $31 and $32) are considered. These were found by relaxing the NBIP formulation of YOBS such that (16.11) is used as the objective function instead, and resolving the problem for decreasing budget values until the NBIP returns an infeasible solution. We start at $32 and decrease the budget by $1 at a time; when the available budget is $28, no solution exists that satisfies all the constraints. The sizes of the search space for these budgets are 1.11 × 10^7, 11.19 × 10^6, 44.79 × 10^7 and 44.79 × 10^7 for budgets $29, $30, $31 and $32, respectively, and each case was simulated using GA, ACO and BA to compare the best solutions. The simulation results are shown in Table 16.3.

$$Z = \sum_{i=1}^{n} \sum_{j=1}^{ns_i} Y_{ij} x_{ij} \qquad (16.11)$$
Table 16.4 Optimal iteration number for all budget settings using BA, ACO and GA
Budget ($)  Algorithm  Optimal iteration number
29          BA         4
            ACO        13
            GA         58
30          BA         11
            ACO        22
            GA         57
31          BA         9
            ACO        12
            GA         57
32          BA         9
            ACO        32
            GA         74
For all available budget settings ($29, $30, $31 and $32), the best solutions are given in Table 16.3. The results of each method show the best configurations, i.e., the best supplier of each component. These are 33231-32111-12133, 33441-31342-32233, 68541-32342-32233 and 68565-43343-32343, with yield values of 0.926006, 0.957043, 0.971222 and 0.9799394 and total component costs of $28.9907, $29.9989, $30.9994 and $31.9954 under the budget constraints $29, $30, $31 and $32, respectively. The solution 33231-32111-12133 means that the first component comes from supplier 3, the second component from supplier 3, the third component from supplier 2, and so on. In addition, the percentage of obtaining the optimal solution is 100% for BA under all budget constraints, while for ACO and GA it is 93.33% and 50.00% for budget $29, 83.33% and 50.00% for budget $30, 76.67% and 56.67% for budget $31, and 86.67% and 63.33% for budget $32, respectively.
16.4.2 Performance Comparison

16.4.2.1 Solution Quality

According to the simulation results, the best solutions of the three methods after 30 trials are given in Table 16.3. Obviously, all methods succeed in finding the optimum solution under the different budget constraints. The effectiveness of the algorithms is assessed in terms of accuracy and robustness, based on the minimum, maximum, average and standard deviation of the obtained solutions, the percentage of runs reaching the optimum, and the average computation time; by these measures, the algorithms rank BA, ACO and GA, in that order. In addition, the optimal iteration number of the three algorithms for all budget settings is shown in Table 16.4. This number represents the number of iterations required to obtain the optimal solution. For the budget of $29, BA, ACO and GA achieve the optimal solution in 4, 13 and 58 iterations,
respectively. The results for the other budget settings show the same trend. Hence, BA is the most effective method and converges to the optimal solution the fastest. For ACO and GA, the number of iterations required to obtain the optimal solution tends to increase as the budget setting increases.
16.4.2.2 Computation Efficiency

The comparisons of the computational times of the three methods are shown in Table 16.3. The average computational time of BA is the lowest, at 7.07, 5.10, 8.59 and 5.44 s for the $29, $30, $31 and $32 budget settings; the corresponding times are 20.18, 23.58, 28.68 and 18.96 s for ACO and 44.85, 41.57, 44.30 and 39.52 s for GA. This is consistent with the convergence characteristics of the three methods: a lower number of convergence cycles implies a shorter computation time. In this research, the convergence of BA to the optimum solution is the fastest.
16.5 Conclusions
In this paper, three heuristics are proposed to solve HDD component yield optimization under budget and supplier's rating constraints (the YOBS problem). The problem is based on a series system (one piece per component) with multiple-choice constraints incorporated at each subsystem. It can be cast as a nonlinear binary integer programming problem and is characterized as an NP-hard problem. Based on the design for assembly of each product, the formulation can be applied to different systems. The results show that the Bee Algorithm heuristic solves the realistically sized problem in the case study most efficiently, outperforming Ant Colony Optimization and the Genetic Algorithm in terms of accuracy, robustness and computational effort. However, in practice, heuristic performance also depends on the characteristics of the problem, such as the objective, the constraints, and the problem size or search space, so the most suitable algorithm may vary.
Acknowledgment This work was supported by the National Research University Project of the Thailand Office of the Higher Education Commission.
References

Chin KS, Yueng I-K, Pan KF (2005) Development of an assessment system for supplier quality management. City University of Hong Kong, Kowloon, and The University of the West Indies, St Augustine
Coit DW, Smith AE (1996) Reliability optimization of series-parallel systems using a genetic algorithm. IEEE Trans Reliab 45(2):254–260
Dorigo M, Maniezzo V, Colorni A (1996) Ant system: optimization by a colony of cooperating agents. IEEE Trans Syst Man Cybern B 26(1):29–41
Holland JH (1975) Adaptation in natural and artificial systems. The University of Michigan Press, Ann Arbor
Karaboga D, Basturk B (2008) On the performance of Artificial Bee Colony (ABC) algorithm. Appl Soft Comput 8:687–697
Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of the IEEE international conference on neural networks, Perth, vol 3, pp 1942–1948
Lee KY, El-Sharkawi MA (2007) Modern heuristic optimization techniques: theory and applications to power systems. Wiley, Hoboken, p 38
Nahas N, Nourelfath M (2005) Ant system for reliability optimization of a series system with multiple-choice and budget constraints. Reliab Eng Syst Saf 87:1–12
Nourelfath M, Nahas N (2003) Quantized hopfield networks for reliability optimization. Reliab Eng Syst Saf 81:191–196
Painton L, Campbell J (1995) Genetic algorithms in optimization of system reliability. IEEE Trans Reliab 44(2):172–178
Wongthatsanekorn W, Matheekriengkrai N (2010) Heuristics for hard disk drive component's yield optimization problem under budget and supplier's rating constraint. In: Lecture notes in engineering and computer science: Proceedings of The World Congress on Engineering and Computer Science 2010, WCECS 2010, 20–22 Oct 2010, San Francisco, pp 1047–1052
Chapter 17
Integrated Modeling of Functions and Requirements in Product Design and Factory Planning D.P. Politze, J.P. Bathelt, K. Wegener, and D.H. Bergsjö
17.1 Introduction and Related Work
A mature management of requirements is seen as a crucial activity within the development of high-quality mechatronic products (Heumesser and Houdek 2003; Houdek 2003). Market demands and customer needs have to be captured (Akao 1992). In addition, the company strategy should be considered while designing new products. Moreover, further stakeholders provide additional constraints, such as legal aspects and production limitations. A holistic approach is desired that covers the requirements of every stakeholder, spanning from high-level strategic goals to low-level elementary product functions. In the past decades, functions and the modeling of functions became part of some well-known product design methodologies such as those of Roth (1982), Pahl and Beitz (2007), Hubka and Eder (1988), Ulrich and Eppinger (1995), Suh (1998) or Ullman (1997). In order to support a common understanding of functions between all stakeholders in the design process, formal function representations and vocabularies have been defined; most of them assume an abstract verb-object representation of functions, influenced by earlier work on Value Engineering (Miles 1972; VAI 1993). Unfortunately, these methodologies mainly cope with mechanical main functions and corresponding solution principles and do not properly address the rise of electronics, control and optimization functions. In order to provide high-quality product functions, the individual optimization of mechanical, electric/electronic and software components is not sufficient; fundamental adaptations of the traditional approaches are needed. Most importantly, the components' functional interactions and their contributions to product functions should be considered very early in the design process.
D.P. Politze (*) Institute for Machine Tools and Manufacturing, ETH Zurich, Switzerland; e-mail: [email protected]
Fig. 17.1 Unified modeling in the context of product design (Adapted from VDI 2206 (2004))
In today's product development processes a methodical gap can be detected. As can be inferred from Fig. 17.1, Pahl and Beitz (2007) and VDI 2206 (2004), there is no clear and persistent linkage between the requirements and the product development process itself. Traditional approaches like the one proposed in VDI 2206 (2004) use a list of requirements to translate them into a clear development task. This list is not well integrated into the subsequent design process; it is reused only at the end, in order to check the final product against the requirements from the list. Within this work a different viewpoint is pursued, proposing a unified requirement and function modeling that embeds the demands of all stakeholders while enabling an assessment of the final product properties, as shown in Fig. 17.1. Recently the paradigm "factory as product" (Jovane et al. 2008) emerged, which regards factories as a new kind of highly complex product and thus envisions the transfer of established methods from product development to factory planning. In this context, the description of the intended functionalities of a new factory is also seen as beneficial and shall serve as a basis for the succeeding planning steps. The design of transformable factories is advocated in Wiendahl et al. (2009) in order to master the implications of ongoing globalization. According to this approach, workshops are foreseen to model the vision, mission and strategic goals of the factory to be built. In addition, the context, which includes market, customers, products, competitors and suppliers, has to be analyzed. Both the context and the strategic decisions should feed the list of requirements. Subsequently, the necessary processes, also from a functional point of view, are modeled according to Wiendahl et al. (2009). Furthermore, an assessment of the different factory concepts is addressed. However, the continuous and final assurance of the required properties is not as prominent as in VDI 2206 (2004). Analogously, Grundig (2009) describes the design of factories. In Schenk and Wirth (2004), goals are treated as already available data. The key entry point is the so-called production program describing the product type, envisioned lot size, cost
and time constraints. Similar to the other approaches, the respective functions and processes are modeled subsequently. An approach for the holistic planning of factories is presented in Pawellek (2008). Initially, the question of where to go has to be answered by a dynamic network of goals considering economic, human, social and functional aspects. Secondly, the question of what to change in the factory has an impact on the operational systems (product, technology, organization, equipment, personnel, finances) and the corresponding orthogonal functional systems (such as logistics, production and assembly). The bi-directional interplay between the goals and the operational systems while elaborating solutions is regarded as a challenge and thus remains an open question. In sum, a seamless integration and modeling of goals, requirements, functions and principle solutions is desired but not yet present. Moreover, performance indicators (PIs) are already well established in Grundig (2009) to assess the elaborated concepts and the final plant. With this in view, this work presents an integration approach based on Function-Oriented Product Descriptions while exploiting appropriate performance indicators for the assessment.
17.2 Function-Oriented Product Descriptions
Function-Oriented Product Descriptions (FOPD), as proposed by Politze et al. (2010), constitute an approach combining a requirements model and a functional model. This model can be refined by more detailed descriptions, which have to be addressed by the product or factory to be developed. By linking stakeholder demands, requirements, functions, solutions and performance measures, the FOPD approach aims at a better propagation, improved integration and enhanced traceability of the needs throughout the design process. Additionally, a FOPD is suitable to support the development of highly complex and high-variant products such as factories or cars while maintaining a function-oriented point of view. In Fig. 17.2 the FOPD approach is presented in its application context. The stakeholder needs are one of the input sources for the FOPD. Internal requirements (production constraints, financial resources, etc.) and external demands (from the market, customers, law, etc.) are captured and formalized as functional requirements. According to traditional product development methodologies, input can be derived from market studies, interviews, workshops or with the help of setting up a feature list as described in Pahl and Beitz (2007). The collected features may then be reformulated into product functions that constitute functional requirements which have to be fulfilled by the product to be developed. The same approach may also be used in the case of factory planning. The mission of a company represents its external goal and the vision its internal goal on a very broad and general level of abstraction. From the mission and vision, the main strategic targets may be formulated in a functional way, and from these, more specific functional requirements may be derived by step-by-step refinement.
Fig. 17.2 Derivation, management and exploitation related to FOPDs
The well-known technique of asking "how and what" may be applied and supports this activity. Especially in the case of factory planning, the design of the product to be produced also has to be taken into account. Therefore the product documentation serves as an important input for the derivation of a FOPD. Typically this documentation consists of assembly or part drawings and a BOM (Bill of Materials) and allows the retrieval of production functions and assembly functions such as those described in Schenk et al. (2010). Initially, FOPDs were designed to foster the subsequent steps in product development. The contents of a FOPD (i.e. the functional descriptions and their dependencies) can be exploited for the development of highly complex and high-variant products, since it connects to common product development methodologies by allowing the derivation of an extended function structure. A FOPD is also exploitable towards factory planning and offers useful inputs to performance planning, process design, equipment selection and layout planning. The selection of PIs from analyzing a FOPD may not only be used for monitoring and controlling, but also for planning the performance of a new factory. Thus, target values may be defined that determine the expected performance. In addition, the functional requirements and the descriptions regarding the manufacturing process may be used as input for process definition and equipment selection. The former may be achieved by interpreting the functional requirements as process activities. The latter can be realized by grouping and filtering specific functional requirements, i.e. those that start with "to weld something", "to mill something" or "to transport something". Similarly, an input for layout planning can be generated, since each of these process activities needs to be allocated to a location in the factory layout. Besides the above discussed derivation and exploitation of a FOPD, management issues are indicated in Fig. 17.2. A FOPD enables the management of variability
information. The existing product functions are made explicit in a FOPD. Both can be used to define products or even specific product variants in the design phase. Appropriate tool support and a versioning system may enhance the degree of reuse of existing functional descriptions, which in turn allows the descriptions to evolve into a mature and reliable functional specification of the product. This makes it possible to capture the lessons learned from past development projects, supports the idea of continuous improvement and promotes a higher specification quality in new projects. Thus the application of a FOPD helps to reduce development time while supporting the realization of high-quality product functions. A FOPD contains descriptions of functional requirements, ranging from a very abstract level to a very detailed and specific view. In this way the product designer's knowledge of the solution and its rationale is captured and documented. Thus, all product functions and their technical solutions become traceable, which in turn allows the detection of redundant developments and provides a basis for arguing about specific solutions and alternatives. In order to enable the smooth interplay of all elements discussed in this section, a seamless data model for a FOPD is needed; Sect. 17.4 presents such a data model. Prior to that, context-dependent artifacts which have to be considered in this data model are discussed in the next section.
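Referring back to the grouping and filtering of functional requirements described above (collecting all requirements that start with "to weld", "to mill" or "to transport" as input for process definition, equipment selection and layout planning), a minimal sketch of such a filter could look as follows; the requirement strings and the verb list are purely illustrative and not taken from the chapter's case material.

```python
from collections import defaultdict

def group_by_verb(functional_requirements, verbs=("weld", "mill", "transport")):
    """Group verb-object style requirements ('to <verb> <object>') by their verb,
    as a simple input for process definition and equipment selection."""
    groups = defaultdict(list)
    for req in functional_requirements:
        words = req.lower().split()
        if len(words) >= 2 and words[0] == "to" and words[1] in verbs:
            groups[words[1]].append(req)
    return dict(groups)

reqs = ["to weld the housing", "to transport pallets",
        "to mill the base plate", "to inspect the final assembly"]
print(group_by_verb(reqs))
# {'weld': ['to weld the housing'], 'transport': ['to transport pallets'],
#  'mill': ['to mill the base plate']}
```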
17.3 Activities, Artifacts and Inheritance Related to Function Oriented Product Descriptions
Based on the considerations that have been laid out in Sect. 17.2, a FOPD is intended to be used in the very early phases of product design and factory planning in order to overcome the methodological gap identified in Sect. 17.1. In this section, three activities are introduced that are seen as essential in the context of using a FOPD. As depicted in Fig. 17.3, the three activities are iteratively connected with each other and several people may be involved in the procedure. As can be inferred from the figure, one of the activities is goal modeling, which aims at a formalization of the needs and goals of different stakeholders. In this context, a need can be expressed as a desired function that originates directly from a stakeholder's intention. The description of this function is then enhanced by providing additional information about desired properties and hence represents a description of a functional requirement. In a next activity an intended solution for each functional requirement can be described – again in functional terms. In this way the functional requirement is decomposed into sub-functions and the dependencies between them are annotated (e.g. sequences or processes). With the help of variability parameters, alternative solutions can also be described and formalized in this activity. In case there already exists a concrete solution for a functional requirement, a specific entity or resource (e.g.
Fig. 17.3 Activities related to function oriented product descriptions
component, machine or workplace) can be mapped to it. Moreover, the mapping to specific people automatically defines the responsibility of these people towards the fulfillment of the functional requirement. Each described solution also has consequences, i.e. the identification and fixation of solution-specific parameters. These are referred to as solution properties and become part of the description of functional requirements. The remaining activity is the assessment preparation, which mainly deals with the mapping to PIs in order to reflect and trace the fulfillment of a functional requirement in an objective way. Besides this mapping, the goal modeling is completed in this activity by setting target or reference values for the chosen indicators in order to objectify the stakeholder's intention. This step is seen as closing the loop presented in Fig. 17.3, since it is assumed that specific performance demands may also lead to new goals. As described above, goal modeling, solution description and assessment preparation are the three main activities in the context of using a FOPD. Besides being iterative, the activities may occur on three different levels of abstraction. This is mainly due to the functional decomposition that takes place while describing solutions. As a consequence, new needs in the form of desired functions arise and corresponding solutions have to be found in later stages. Thus, a whole network of functional requirements is created, and with every newly described solution it becomes more concrete and the degree of abstraction is lowered. Depending on the level of abstraction, the involved artifacts related to the three previously introduced activities change, as summarized in Table 17.1. Roughly, the following levels of abstraction can be distinguished. The highest level of abstraction is mostly related to aspects of the whole company (C). On this level the mission and vision are translated into strategic goals, and the desired properties often state the performance goals of the company. In order to reach the strategic goals, a strategy consisting of several strategic measures is defined, and the main business processes can also be defined on that level.
Table 17.1 Context-dependent artifacts on different levels of abstraction

Lvl | Goal modeling | Solution description | Assessment preparation
C | Mission and vision; strategic goals; desired properties | Strategic measures; main business processes; solution properties | Responsibilities; key performance indicators
P | Stakeholders; project goals; product functions; desired properties | Project activities (processes); function specification; solution properties | Responsibilities; performance indicators
E | Task-related goals; elementary functions; desired properties | Task descriptions; technical solutions; solution properties | Entity; resource; measurable values
Finally, responsibilities can be assigned, and the success of the company is assessed with the help of the performance indicators. On the project or product (P) level, the needs and goals of several stakeholders are addressed and result in new products or dedicated projects. Related to that, the project goals are defined and the demanded functions of the new product are denoted. Accordingly, the project activities or a workflow that contribute to the project goals can be described. Also, a detailed specification of how to realize the intended product functions can be given by describing a certain functionality (behavior) in terms of other functions. Since this level of abstraction often involves the middle management of a company, responsibilities and performance indicators in general play a role. The most concrete level is related to specific entities (E) that are intended to implement the described functional requirement. This can either be done by specific persons that have to work on a task or by elementary functions that are typically realized by an existing standard part (e.g. a screw). Therefore task descriptions and detailed technical solutions are described on that level of abstraction. Moreover, a mapping to concrete entities (e.g. machines) or to specific sensed or measured values is performed on that level. As shown in Table 17.1, desired properties and solution properties can be found on every level of abstraction. While the desired properties often have their rationale on the C-level, it is expected that most of the solution properties are related to concrete solutions and thus to be found on the E-level. Since it is very important to know about the consequences of a decision, an inheritance mechanism for responsibilities, entities, desired properties and solution properties is part of the FOPD approach. In order to support the solution-finding process, the desired properties are inherited top-down in a FOPD, which reflects the idea of a multi-dimensional goal modeling for later utility analyses as proposed in Zangemeister (1970). The solution properties, on the other hand, are inherited bottom-up. In this way the consequences of lower-level solutions are communicated to the management, which in turn enables a transparent and objective reporting that may lead to the detection of inconsistencies. Entities and responsibilities are also inherited bottom-up in order to provide information on the involved persons and the needed equipment on a higher level of abstraction.
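The inheritance mechanism described above can be sketched in a few lines of code. This is only a schematic interpretation of the stated behavior (desired properties flow top-down; solution properties, entities and responsibilities are rolled up bottom-up); the class and attribute names are ours and not part of the FOPD specification.

```python
class RequirementNode:
    """One functional requirement in the FOPD network (schematic only)."""

    def __init__(self, name, desired=(), solution=()):
        self.name = name
        self.desired = set(desired)    # desired properties (e.g. performance goals)
        self.solution = set(solution)  # consequences of a chosen solution
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

    def inherited_desired(self, from_parent=frozenset()):
        """Top-down: desired properties are pushed to all lower levels."""
        own = self.desired | set(from_parent)
        result = {self.name: own}
        for child in self.children:
            result.update(child.inherited_desired(own))
        return result

    def rolled_up_solution(self):
        """Bottom-up: solution properties (and, analogously, entities and
        responsibilities) are aggregated towards the higher levels."""
        rolled = set(self.solution)
        for child in self.children:
            rolled |= child.rolled_up_solution()
        return rolled
```

With such a structure, a C-level performance goal attached to the root appears in the inherited set of every lower node, while an E-level solution property surfaces at the top when the leaves are rolled up — the kind of transparent reporting the text refers to.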
17.4 Data Model for Function Oriented Product Descriptions
As motivated in Sect. 17.2, a data model for managing the related activities and artifacts (see Sect. 17.3) is envisioned. Following the FOPD concept, such a data model aims at supporting the very early phases of product design and factory planning by combining the documentation of requirements with a function-oriented point of view. A suitable model for a FOPD that responds to specific modeling demands of the automotive industry (e.g. a notion for pre-conditions, variability aspects, activities and sequences) has been proposed by Politze and Dierssen (2008) and recently extended by some attributes from the context of factory planning. The corresponding UML class diagram is depicted in Fig. 17.4. In contrast to traditional approaches, where a hierarchy of abstract functions operates on flows, the inputs and outputs of a function are modeled explicitly in this approach. Thus, the depicted UML class diagram shows a clear distinction between function, flow and corresponding descriptors for the flow-based inputs and outputs of a function. Through specific subclasses, the dedicated assignment of types for functions and flows is possible, which connects to the conventional understanding of functions. These subclasses reflect well-known vocabularies such as the ones proposed by Pahl and Beitz (2007) or Hirtz et al. (2002). For the functions related to factory planning, the vocabulary according to Schenk et al. (2010)
Fig. 17.4 Partial UML class diagram of a seamless data model for FOPDs
can be used. Also, the diagram shows explicit classes for mapping entities (e.g. components, areas, resources or indicators) to the functions and in addition to that, classes for representing design properties and solution properties are provided. Besides the classes, the given attributes in the UML diagram enable the usage of the model as described in the previous sections. The name attribute allows the assignment of an identifier and especially in the case of naming a function (i.e. a functional requirement), a pragmatic naming is encouraged as suggested in Eisenhut (1999). var and cond are the attributes that enable the modeling of variabilities and states. sub is to refer to other functions from a functional decomposition perspective and it should be noted that the presented scheme is meant to be used recursively, which means that every function-object in the model can be re-used as a partial solution for another. in and out are references to the input and output descriptors. The alt attribute refers to functions that are seen as alternatives. Analogously, the cmpl attribute can be filled with functionalities that naturally go along with each other or that together form some kind of a whole. The tms attribute offers a way of providing time-related information about a functionality and additionally ulo is meant for specifying the existence of a location where the functionality shall be implemented and used. Finally the map attribute enables a mapping to an entity. The props attribute allows a reference to desired properties or to solution properties as they have been introduced in Sect. 17.3. Moreover, domain or application related information can be attached in this way (e.g. images, page references or contact details). In addition to the above named attributes, the descriptor class has some special attributes. Its connection to a specific flow can be expressed with the corresponding attribute. before and after are intended for defining a chronological order between the descriptors and thus to define sequences. The act attribute is for specifying the type of activity that shall happen when the referenced flow is detected. From current experience “starts”, “stops”, “pauses” and “resumes” are proposed as being useful. Finally it should be noted that the cardinality restrictions in the model have been relaxed for better exploitation and stepwise refinement in subsequent design steps.
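To make the attribute semantics of Fig. 17.4 more tangible, the sketch below mirrors the described classes as plain data structures. It is not the authors' UML model itself: the attribute names follow the text (var, cond, sub, in/out, alt, cmpl, tms, ulo, map, props on the function; flow, before, after, act on the descriptor), but the chosen types and defaults are our own simplifying assumptions, and in/out are renamed because `in` is a reserved word in Python.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Descriptor:
    """Flow-based input/output descriptor of a function."""
    flow: str                                                  # the described flow
    before: List["Descriptor"] = field(default_factory=list)  # chronological order
    after: List["Descriptor"] = field(default_factory=list)
    act: Optional[str] = None        # "starts", "stops", "pauses" or "resumes"

@dataclass
class Function:
    """A functional requirement as sketched from the described data model."""
    name: str                                                # pragmatic identifier
    var: Optional[str] = None                                # variability modeling
    cond: Optional[str] = None                               # states / conditions
    sub: List["Function"] = field(default_factory=list)      # functional decomposition
    inputs: List[Descriptor] = field(default_factory=list)   # 'in' in the model
    outputs: List[Descriptor] = field(default_factory=list)  # 'out' in the model
    alt: List["Function"] = field(default_factory=list)      # alternative functions
    cmpl: List["Function"] = field(default_factory=list)     # complementary functions
    tms: Optional[str] = None                                # time-related information
    ulo: Optional[str] = None                                # usage location
    map: List[str] = field(default_factory=list)             # mapped entities
    props: List[str] = field(default_factory=list)           # desired/solution properties

# The scheme is meant to be used recursively: a function object can be reused
# as a partial solution for another function.
weld = Function(name="to weld the frame", map=["welding robot"])
assemble = Function(name="to assemble the product", sub=[weld])
```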
17.5 Supporting Functional Modeling with PLM Systems
The PLM system is meant to work as the hub for product development systems and tools, increasing reliability and facilitating the exchange of product data. Support for functional development is essential in industry, especially in the automotive and aerospace industries, where more and more functions are realized across different disciplines and where mapping one function to one component or system is difficult. To gain the full benefits of the FOPD framework, an integration with the company's bill of materials (BOM) system and product design structure is essential. Both the BOM and the product structure can be housed within a PLM system. Under the PLM umbrella,
several engineering tools and information management systems are included, both for managing and for authoring product data. Requirement management (RM), Product Data Management (PDM), and CAx are only a few of these. Traditionally, the integration between CAD and PDM has been an area of research. This area has, however, matured and the research has now continued into other areas such as mechatronic integration, supplier integration, and complete lifecycle traceability, where the ability to model and trace functions is more important. The biggest concerns in this field are believed to be the configuration complexity and the traceability issues regarding engineering changes, connected to both product systems and product functions. In the following, a PDM system will be used for modeling the functional integration into corporate systems. A PDM system has been chosen since it is the natural hub of product data during a product's (or a factory's) design phase. According to Amann (2002) there are five basic functionalities in a PDM system:
Information warehouse/vault: a place where product data is securely stored.
Document management: to manage and use documents in a structured manner, including document control capabilities such as versioning.
Product structure management: the arrangement of product definition information as created and changed during the product's lifecycle. It facilitates the management of configurations and the Bill of Materials (BOM).
Workflow management: the set-up of rules and procedures for work on the product definition and associated documents, managing engineering changes and revisions.
Classification management: allows similar or standard parts, processes, and other design information to be grouped by common attributes and retrieved for reuse.
Previous research has targeted the expansion of PDM into PLM and how the information model can be extended over the product lifecycle to house different types of new data objects. One example of this is the expansion of a PDM system to incorporate requirement management (Malmqvist 2001). To support working with functional models, two principal features that often exist in PLM/PDM systems can be used. One of them is the configuration capability often integrated in high-end PLM systems, where standard Boolean operators can be used to configure a product or a component (Bergsjö et al. 2006). The second option is to use the object and relationship features used for the product structure, which form the basic part of PDM systems (both high-end and low-end systems), and build separated and linked tree structures. In both high-end and low-end systems the functional structure has to be built as a structure separate from the product structure and linked to it using associative links. The basic structures of the FOPD framework can then be integrated (or combined) with the product structure and the BOM in order to better manage the functional aspects of modern product development (or factory design). By using a simple example from Politze et al. (2010), the FOPD framework will be implemented in a standard commercial PDM system to show this potential.
Fig. 17.5 Screenshot of a PLM system showing links between functional and items structure
In the demonstrator, the product structure functionality has been used to model functions and functional relationships. This means that documents and other types of information can easily be connected to a function. It is possible to select a wide array of predefined behaviors for the functional component, including versioning and processes for change management and configuration. A set of new relationships was defined in order to connect the functional structure to the product structure using associative links. The functional structure was also designed to incorporate a variety of different functions defined in the FOPD framework. To illustrate the main functionality of the demonstrator, a screenshot is depicted in Fig. 17.5. In its top window, the BOM of an LED light is depicted. The BOM consists of several items and assemblies of items that will be manufactured and assembled in the factory, including different wires, a battery and the LED light assembly. In the figure the 9V battery item is selected, which shows a functional relationship with the "To buy 9V battery" function. The BOM is in turn connected to the product structure and the CAD vault housed in the same system. The functional structure depicted in the bottom window of Fig. 17.5 shows separate trees for "Company Strategy" functions and "Product functions". This illustrates the possibility of working with different kinds of functions within the same hierarchical tree structure. In the figure, the "Manufacture LED" function is selected and shows a relationship to the BOM and the "LED Light full assembly" item. The implementation of the FOPD framework in a PDM system shows that it is possible, with relatively small alterations, to work with functions integrated with the product structure and BOM in commercially available PLM systems. This is a great advantage since it offers superior traceability between the function and the
realization of that function in the final product, without investing in new and complicated software, but rather by updating the existing company systems with a few new objects and attributes.
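A schematic of the associative linking visible in Fig. 17.5 can be given in a few lines. The item and function names follow the LED-light example from the text, but the data structures are our own illustration and not the API of any particular PLM/PDM product.

```python
# Two separate tree structures -- a BOM and a functional structure -- connected
# by associative links, as in the demonstrator (schematic only).

bom = {"LED Light full assembly": ["9V battery", "LED", "wires"]}

functional_structure = {
    "Company Strategy": ["Manufacture LED"],
    "Product functions": ["To buy 9V battery"],
}

associative_links = [
    ("To buy 9V battery", "9V battery"),
    ("Manufacture LED", "LED Light full assembly"),
]

def items_realizing(function_name):
    """Trace from a function to the BOM items it is linked to."""
    return [item for func, item in associative_links if func == function_name]

print(items_realizing("Manufacture LED"))  # ['LED Light full assembly']
```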
17.6 Conclusion and Future Work
This paper has presented the concept of Function Oriented Product Descriptions for product development and factory planning. The approach was introduced, related concepts were explained and the need for an integrated data model was motivated. Such a model was presented and the integration with PDM/PLM was exemplified. Generally, it can be concluded that the FOPD approach is a promising way to find and maintain the right balance between strategic company goals, customer orientation and production, while supporting mature development processes and contributing to efficient controlling. Furthermore, the application of a FOPD contributes to a better propagation and traceability of customer needs in product development and factory planning. The approach presented in this paper has been implemented in parts and its applicability was shown in several case studies. It is further assumed that better propagation, improved integration and enhanced traceability of the customer needs will also lead to a higher degree of knowledge about these needs throughout the development process. For the future, it is envisioned to validate this assumption with a study.
References

Amann K (2002) Product lifecycle management – empowering the future of business. CIMdata, Ann Arbor
Akao Y (1992) QFD – quality function deployment. Verlag Moderne Industrie, Landsberg
Bergsjö D, Malmqvist J, Ström M (2006) Implementing support for management of mechatronic product data in PLM systems? Two case studies. In: Proceedings of IMECE2006, Chicago, 2006
Eisenhut A (1999) Service driven design: Konzepte und Hilfsmittel zur informationstechnischen Kopplung von Service und Entwicklung auf der Basis moderner Kommunikations-Technologie. ETH, Zürich
Grundig C-G (2009) Fabrikplanung: Planungssystematik – Methoden – Anwendungen. Hanser, München
Heumesser N, Houdek F (2003) Towards systematic recycling of systems requirements. In: Proceedings of the 25th international conference on software engineering, Portland. IEEE Computer Society, Long Beach, pp 512–519
Hirtz J, Stone R, McAdams D, Szykman S, Wood K (2002) A functional basis for engineering design: reconciling and evolving previous efforts. Res Eng Des 13(2):65–82
Houdek F (2003) Requirements Engineering Erfahrungen in Projekten der Automobilindustrie. Softwaretechnik-Trends 23(1)
Hubka V, Eder WE (1988) Theory of technical systems: a total concept theory for engineering design. Springer, New York
Jovane F, Westkämper E, Williams D (2008) The manufuture road: towards competitive and sustainable high-adding-value manufacturing. Springer, Heidelberg
Malmqvist J (2001) Implementing requirements management: a task for specialized software tools or PDM systems? Syst Eng 4(1):49–57
Miles L (1972) Techniques of value analysis and engineering. McGraw-Hill, New York
Pahl G, Beitz W (2007) Konstruktionslehre: Methoden und Anwendung. Springer, Heidelberg
Pawellek G (2008) Ganzheitliche Fabrikplanung: Grundlagen, Vorgehensweise, EDV-Unterstützung. Springer, Berlin
Politze DP, Dierssen S (2008) A functional model for the function oriented description of customer-related functions of high variant products. In: Proceedings of NordDesign'08, Tallinn, Estonia, 2008
Politze DP, Bathelt JP, Wegener K (2010) Function-oriented product descriptions in product development and factory planning. In: Lecture notes in engineering and computer science: Proceedings of The World Congress on Engineering and Computer Science (WCECS) 2010, San Francisco, 20–22 October 2010. Newswood Ltd, International Association of Engineers, Hong Kong, pp 1168–1172
Roth K (1982) Konstruieren mit Konstruktionskatalogen. Springer, Berlin
Schenk M, Wirth S (2004) Fabrikplanung und Fabrikbetrieb: Methoden für die wandlungsfähige und vernetzte Fabrik. Springer, Berlin
Schenk M, Wirth S, Müller E (2010) Factory planning manual: situation-driven production facility planning. Springer, Berlin
Suh NP (1998) Axiomatic design theory for systems. Res Eng Des 10(4):189–209
Ullman D (1997) The mechanical design process. McGraw-Hill, New York
Ulrich KT, Eppinger SD (1995) Product design and development. McGraw-Hill, New York
Value Analysis Incorporated (1993) Value analysis, value engineering and value management. Clifton Park, New York
VDI 2206 (2004) Entwicklungsmethodik für mechatronische Systeme (Design methodology for mechatronic systems). Beuth Verlag, Heidelberg
Wiendahl H-P, Reichardt J, Nyhuis P (2009) Handbuch Fabrikplanung: Konzept, Gestaltung und Umsetzung wandlungsfähiger Produktionsstätten. Hanser, München
Zangemeister C (1970) Nutzwertanalyse in der Systemtechnik: eine Methodik zur multidimensionalen Bewertung und Auswahl von Projektalternativen. Wittemann, München
Chapter 18
Towards a Strategy to Fight the Computer Science (CS) Declining Phenomenon Marcela Porta, Katherine Maillet, Marta Mas, and Carmen Martinez
18.1 Introduction
The results of previous investigations have shown a declining interest in CS studies in Europe and presented the consequences of this decline (Maillet and Porta 2010). Furthermore, other investigations have identified the reasons that keep students from enrolling in CS (Porta et al. 2010). Among these reasons are: the degree of difficulty of the subjects required to master CS (fear of mathematics), a lack of understanding of the career at the moment of choice, and misconceptions about the role played by a computer scientist in society and the economic sector. In order to clarify these ideas, this study examines the generic content of European CS programs, aiming to show that this factor should not represent a reason to reject the career. The comparison of the CS programs of several European universities makes it possible to identify the knowledge and skills needed to become a computer scientist. These skills, or common fields of study, are then compared to same-level programs in other areas of study to identify which of them belong exclusively to CS. The results of this investigation are compared with job offers in the market. Future work will suggest the use of TEL tools and solutions to acquire the knowledge and skills needed to become a computer scientist and therefore attract and retain people in this field of study.
M. Porta (*) Telecom & Management SudParis, 9 rue Charles Fourier, 91011 Evry, France; e-mail: [email protected]
18.2 Background and Related Work
Official European statistics (Eurostat) and the Computing Research Association (CRA) indicated in 2008 that the number of students with access to tertiary education in Europe is increasing in a sustainable way (Eurostat Official European Statistics (Research report number 200–6.97) 2007; Computer Science Teachers Association and Computer Research Association, High school computer science survey 2010). Unfortunately, the increase in tertiary studies is not reflected in all disciplines. Graduates in science, mathematics and computing have decreased each year, reflecting the lack of motivation on the part of students to follow scientific careers. Some countries with an economic development similar to Europe's (like the United States and Canada) have shown similar tendencies in the number of students enrolled in technology domains. As a result, decreasing numbers of student enrolments, graduates and CS courses offered in curricula have set off a general alarm (Computer Science Teachers Association and Computer Research Association, High school computer science survey 2010). Some experts have called this situation "a serious warning sign" (Nagel 2009), as they discovered that fewer schools are offering CS classes, which means fewer students are being trained in CS skills. Other related concerns are teacher certification levels and the lack of solid information to help understand and fight this problem (Computer Science Teachers Association (CSTA) and Computer Research Association (CRA) 2010). In order to understand the importance of the CS decline in Europe, it is essential to know what consequences can take place if the decline continues. Previous studies have identified economic and educational consequences as their main concern (Maillet and Porta 2010). From the economic point of view, a rise in the cost of local CS development can be expected. As explained by the economic model of elasticity, the price of a product is determined by its supply and demand in the market (Henderson HD, The theory of elasticity in price, 1946). When the number of CS developers decreases, their products become more expensive and scarce. Thus, it is crucial to maintain a large number of students and experts in CS in order to keep technological development costs down and to sustain the growth of this industry in Europe. Another economic impact is the migration of the industry to countries outside Europe. The increasing prices of local CS development and the decreasing numbers of professionals to satisfy the demands from industry have driven European companies to redirect or migrate their labor forces to foreign countries where CS development work is not only as efficient as in Europe, but where there is also a greater available production capacity at a lower cost (Lohr 2006). Predictions about job losses related to shifting high-technology work to low-wage nations with strong education systems, like India and China, were greatly exaggerated. As remarked by Lohr (2006), "The concern is that misplaced pessimism will deter bright young people from pursuing careers in computing, and, in turn, would erode the skills in a field that is crucial to the nation's economic competitiveness".
From the educational point of view, there is an increasing need to learn other languages in order to succeed in negotiations with other countries. However, the cultures and languages of the countries involved in technology development are not related to the European ones. Therefore, European languages such as English, French, German and Spanish (the most widely spoken languages in Europe and in the world) may become less useful to European industry than Chinese or Russian, and possibly no longer be needed. Many studies have pointed out the importance of learning languages to facilitate business migration. "As access widens, unique educational modules, courses and programs are being designed and evaluated throughout other regions, evidencing issues, challenges, opportunities and initiatives related to this education" (Bonk and Reynolds 2009). Other educational changes are reflected in CS curricula, which should be adapted to offer courses of study that train students in the kind of language and management skills that will be needed by competitive European industries playing in global markets (Williams 2009).
18.3 Understanding the Rejection of CS Studies in European Countries
The World Congress on Engineering and Computer Science (WCECS) published in 2010 an investigation to identify the motivations of students to choose or reject CS as a career at university (Porta et al. 2010). The study, called "Dec-CS: The Computer Science Declining Phenomenon", was designed to better understand the social perception of CS and to identify how these perceptions influence a student's choice. An analysis of the responses made it possible to list the reasons that motivate students to reject CS. Among the obtained answers: the perceived degree of difficulty represents a barrier, misconceptions about the social function of the profession also matter, and the domain itself is perceived as unclear. The reasons for rejecting CS at university are represented in Fig. 18.1 and detailed below (Porta et al. 2010).
Wrong image perception of the career: One of the main reasons that keeps students from following technology studies is the image projected by the CS program. Among the answers, words like "geek", "nerd" or "lab rat" were used to describe the image perceived by students, meaning that a person who follows these studies is identified as not very successful in social life. This stereotype is not always true, but it still changes the minds of high school students, who as teenagers will discard the idea of becoming socially rejected at university. Another related complaint is the difficulty of identifying the role of a computer scientist in society, or real or fictional characters as examples. "Other careers like medicine, military and even a builder are more clear to us as they are represented with lots of examples and defend their role in society in a better way" (Cussó et al. 2009).
Fig. 18.1 Dec-CS: The computer science declining phenomenon; “Reasons to reject computer science as a career in university”
High degree of difficulty: The image of CS as requiring a strong background in mathematics and algorithms is a major fear which prevents students from enrolling. However, the professionals in this field suggested during this investigation (Dec-CS) that this difficulty is easy to overcome and that resources are provided to improve the student's level in these subjects and help them confront their fear of CS. It can be concluded that the difficulty of the CS program might be overestimated by students before making the career choice, but for people who have chosen CS it does not represent a justifiable reason to reject the career. Some references agree that the reason why a student does not choose CS or does not feel attracted to technology as a field of study is related to the degree of difficulty these careers reflect, presuming that the decision is determined by the content and quality of the discipline (Koppi et al., The crisis in ICT education: an academic perspective, paper presented at the meeting of Ascilite 2008; OCDE, Evolution of the young interest for the scientific and technological studies – evolution 2006).
Gender gap: Women still feel that some scientific careers related to CS present gender issues. They pointed out that the time requirements, the lack of other women to accompany them and the negative image the career has when it comes to feminine motivation are some reasons to take into consideration at the moment of choosing a career. It is important to remark that strategies designed to attract women to CS represent a great opportunity to increase the total number of students in this field. Other investigations are making an effort to increase gender diversity in engineering degrees, because they estimate that the number of women enrolled is still very low (Gil et al. 2006).
Underestimation of the formation skills: New technologies allow a person to easily acquire CS knowledge such as HTML or JavaScript thanks to useful autodidactic tools. These skills are recognized as important by the investigation's participants, e.g. as proficiencies that count for much in their curriculum vitae. However, they think it is not necessary to dedicate a complete degree to learning them. Experts in CS faculties are concerned about this attitude on the part of students, who perceive that "computer science is just programming"; "Faculty must consider ways to move students toward the idea that the work you do in computer science in the real world requires a lot of creativity, not only programming and that it can be dynamic" (Lewis et al. 2008).
18.4 Towards a Strategy to Fight the CS Declining Phenomenon
As presented before, students have several reasons to reject CS at university. Some of these reasons are related to:
• The fear of confronting the degree of difficulty reflected in the CS program,
• The difficulty of identifying themselves with the CS engineer stereotype.
It is hard to determine how students reach this conclusion, when they point out other facts like:
• The misunderstanding of the computer scientist's role,
• The difficulty of identifying the job they will perform after pursuing these studies.
This contradiction leads us to conclude that they are overestimating the degree of difficulty (they might be fearing what they do not understand), and that the image of the career can be improved to provide some solutions. As a strategy to help solve this problem, we propose identifying the CS knowledge and skills in a European CS program. By identifying these common subjects, it will be possible to:
• Determine the degree of difficulty of the program, observing whether or not it should represent a general reason for rejection.
• Provide the description of a computer scientist, clarifying his or her role as a professional.
To confirm the validity of these common subjects, they can be matched to the skills required in job descriptions corresponding to CS demand.
18.5 Discovering the Needed Skills to Become a Computer Scientist
The following information represents the analysis of different curricula in several European universities. Since CS has many roots and each university and country has modified programs and content according to different needs, we
took System Engineering (SE) as the basis of this investigation. This analysis identifies the common skills that represent the real challenge for a student who follows CS at university. The comparison between curricula has been done for six universities in different European countries, listed below:
1. Universitat Politècnica de Catalunya, Spain (http://www.upc.edu/)
2. Université d'Evry Val d'Essonne, France (http://www.univ-evry.fr/fr/index.html)
3. Tallinn University of Estonia (http://www.ttu.ee/en)
4. Catholic University of Leuven, Belgium (http://www.uclouvain.be/)
5. University of Reading, United Kingdom (http://www.reading.ac.uk/)
6. Heidelberg University, Germany (http://www.uni-heidelberg.de/index_e.html)
Furthermore, the list of skills was also compared to curricula in other fields of study in order to determine which competencies belong exclusively to CS. The compared domains were:
Management: as a career that has a great impact nowadays, to observe whether the CS curricula propose some management skills.
Graphic design: as a domain that can sometimes include computing systems and programs related to CS, e.g. animation and design tools.
Construction engineering (civil engineering): to take a popular engineering discipline as a reference.
Electronic engineering: because people who are not familiar with CS often find this domain confusing and compare the two; in other words, it is difficult to identify where the system engineer's work starts or finishes when he or she has worked in collaboration with an electronic engineer.
The objective analysis of this comparison made it possible to list the unique knowledge and skills required to become a computer scientist. By "unique" we understand that these skills can only be obtained by pursuing a CS degree, and that without these competencies a professional will not be able to perform CS work. The identified knowledge and skills resulting from this investigation are explained below.
Mathematical logic: This is the most common domain across CS curricula and represents a competency that needs to be mastered by the student in order to become a computer scientist. In general, this knowledge teaches a student the bases
Fig. 18.2 Representation of a simple algorithm
of programming and also the understanding of logic structures (Oxford notes in computing 1996). Mathematical logic is a subfield of mathematics with close connections to CS and philosophical logic (Burali-Forti 1897). The field includes both the mathematical study of logic and the applications of formal logic to other areas of mathematics. The unifying themes in mathematical logic include the study of the expressive power of formal systems and the deductive power of formal proof systems. In conclusion, the mathematical content of the CS program is different from the one learned in high school; though it is an advantage to have a good degree of mathematical understanding, it is not necessary to have a superior score in math during school to succeed as a computer scientist.
Algorithms: The second important content in the CS curricula is directed at understanding the usage of algorithms. Contrary to an erroneous perception, this scientific area is more related to logic than to mathematics itself. An algorithm can be an effective method for solving a problem expressed as a finite sequence of steps. Algorithms are used for calculation, data processing and many other fields (Blass and Gurevich 1998). This competency can be very simple or very complex, depending on the teaching methods and objectives of the university program. Figure 18.2 shows the representation of a simple algorithm that allows a student to understand the concept: an algorithm that tries to figure out why a lamp does not turn on and tries to fix it using different steps. A more specific aspect of this skill is being able to measure program execution, performance and the time required to run a program. Each algorithm represents a list of well-defined instructions for completing a task.
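Figure 18.2 is described as a simple troubleshooting algorithm for a lamp that does not turn on. One possible rendering of such a decision sequence in code, purely illustrative since the figure itself is not reproduced here, is:

```python
def fix_lamp(plugged_in, bulb_burned_out):
    """Toy version of the classic 'lamp does not work' flowchart: each step
    checks one possible cause and proposes a fix."""
    if not plugged_in:
        return "Plug in the lamp"
    if bulb_burned_out:
        return "Replace the bulb"
    return "Repair or replace the lamp"

print(fix_lamp(plugged_in=True, bulb_burned_out=True))  # Replace the bulb
```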
Programming: In simple words, programming is the process of designing, writing, testing and maintaining the source code of computer programs. This source code is written in programming languages and may be a modification of an existing source or something completely new. The purpose of programming is to create a program that exhibits a certain desired behaviour (customization). The process of writing source code often requires expertise in many different subjects, including knowledge of the application domain, specialized algorithms and formal logic (Oxford notes in computing 1996). With good programming bases, a student is able not only to create his or her own programs, but also to understand others, written in various programming languages. After acquiring this skill, the computer scientist can adapt the knowledge to all kinds of progress in technology; for example, he or she is capable of understanding Java without necessarily having learned it in a classroom (Burali-Forti 1897). Another example of the advantage of good programming bases is being able to use new technology integrated with older technology learned before (e.g. CSS in HTML) without having followed a dedicated course. When students have learned to program in a tool such as Pascal, that knowledge can be transferred, as done at the Massachusetts Institute of Technology (MIT), to another tool called SKIM. This fact supports the theory that knowing the basics of programming is better than learning how to use a specific programming tool. Like SKIM, there exist other tools created with the objective of programming software, e.g. C++, JavaScript, C#, etc.
Databases (DB): This skill represents a complex part of CS programs. A database consists of an organized collection of large amounts of data for one or more uses, typically in digital form (Blass and Gurevich 1998). Databases in CS can be taught in two different forms: the first consists in understanding one or several example databases, e.g. Oracle, Access, etc.; the second, more complete, consists in understanding the mathematical bases and logic of the system, giving the capacity to analyze any database afterwards (Burali-Forti 1897).
Data modelling: Data modelling is a method used to define and analyze the data requirements needed to support the business processes of an organization. The data requirements are recorded as a conceptual data model with associated data definitions. The actual implementation of the conceptual model is called a logical data model (Ling and Tamer 2009).
Operating systems: This is another subject that can be taught in different ways (depending on each university system). It consists in learning one or several operating systems like Windows, Linux or UNIX. However, a university course cannot teach a student to use every operating system on the market. Therefore, a good preparation consists in helping the students understand how such systems are conceived by creating their own operating systems.
Other information: After analyzing the curricula related to CS and other domains (management, graphic design, construction engineering, and electronic engineering), we conclude that mathematics, as an exact science, is not especially required to become a computer scientist. The part of the curricula related to mathematics (mathematical logic) and the part related to logic (algorithms) can be learned during the university degree. However, a basic level of understanding is needed in order to learn these areas in an effective way. Some universities do not
take mathematics into consideration while designing their curricula; others include a mathematics course as an introduction to the career that is not necessarily related to the mathematical logic or algorithms in the program (Wing 2006). Another subject that might create controversy during the training to become a computer scientist is the learning of English as a second language (if it is not the native language). Some CS experts affirm that, in addition to English courses to learn the spoken language, they would like to have an engineering English course introducing the subjects and abbreviations for terms written in English (Bruce 2008). Other CS experts affirm that learning English is important because a good percentage of books and forums (important to the domain) are available only in this language. We assume that this is a skill needed for any career and that its usage in jobs will depend on the type of activity the person performs.
18.6 Testing the Information: Confirming the Profile
This part of the investigation aims to prove that the information given before is correct. As an example, four profiles of system engineering job offers were taken from a public and well-known search system. This online system permits finding and applying for current job offers in almost any European country. These examples represent opportunities offered in France no more than one month before the study.
18.6.1 Profile 1: System Manager “Knowledge of C, C++ and experience with UNIX and/or LINUX platforms is necessary. Experience working with large existing software systems or development of C++ libraries is highly recommended. Experience in Place and Route Optimization, Timing Analysis or Logic Synthesis is considered a plus. You will need excellent programming and software engineering skills and preferably previous experience in the EDA industry. Experience with multithreaded and/or distributed programming is a big plus. Self-motivation, self-discipline and the ability to set personal goals and work consistently towards them in a dynamic environment will go far towards contributing to your success”.
18.6.2 Profile 2: IT-Project leader “Object orientated programming, Oracle database (basic knowledge), SQL database, XML language. Operating systems and tools: UNIX (Linux/AIX - basic knowledge), Windows, Eclipse, Telnet, FTP, VPN, SQL Developer (or comparable DB tools)”.
18.6.3 Profile 3: IT-Developer “Object orientated programming - design patterns, Oracle Database (basic knowledge), PLSQL Database (advanced knowledge) and SQL Database (advanced knowledge). Operating systems and tools: UNIX (Linux/AIX - advanced knowledge), Windows, Eclipse, Telnet, FTP, VPN, SQL Developer (or comparable DB tools)”.
18.6.4 Profile 4: IT-Application Engineer (Put into Operation) JAVA (basic knowledge), object orientated programming, Oracle DB (basic knowledge), Perl, CVS, PLSQL (basic knowledge), SQL (basic knowledge), XML (basic knowledge). Operating systems and tools: UNIX (Linux/AIX - basic knowledge), Windows, Eclipse, Telnet, FTP, VPN, SQLDeveloper (or comparable DB tools). It is important to highlight that these offers concern French territory but are originally written in English; although English is not asked for as a required skill, it is needed in order to understand the offers.
18.7 Profile Description
Thanks to the identification of the knowledge and skills necessary to become a computer scientist (system engineer) and to the comparison with job offers related to CS, it is possible to design a profile that better represents the experts in this domain. A system engineer is a person who understands, develops, maintains and uses computing systems. Thanks to competencies such as the understanding of mathematical logic, algorithms, programming, databases, data modeling and operating systems, this person is capable of answering a specific demand in the market: the creation, use and maintenance of software. Other authors affirm that in order to achieve qualifications and employability in CS, one may seek technical or professional certification (a diploma that testifies to the person's level in each skill). "To become certified as a database management, for example, is one standard step on the path to becoming a database administrator" (Computer et al. 2010).
18.8 Conclusions and Future Work
This investigation presented the different knowledge and skills needed to become a computer scientist, more specifically a system engineer. As an important part of a larger research work (Porta et al. 2010), this research aims to reduce the
negative consequences in the CS field due to the low number of students enrolling in this domain. By highlighting students' misconceptions of the competencies in the CS program, this study shows how the low number of students is affecting the development of CS and the European technology industry. The study determined that mathematical logic, algorithms, programming, databases, data modeling and operating systems represent the necessary competencies to carry out CS work. Other subjects such as English and a good mathematical basis are also to be taken into consideration. Random profiles of job offers confirm that companies are looking for these competencies in the profile of people applying for a CS job. The information presented in this paper leads us to ask whether technology can provide help that enhances the teaching and learning methods of the skills needed to become a computer scientist. A longer investigation will then analyze how technology contributes to the teaching and learning of these skills by providing dynamic learning software. The usage of this software can help a student lose the fear of mathematical logic, algorithms, etc., and therefore attract him or her to follow CS studies at university, contributing to the development of the technology industry in European countries. The idea of using technology as a conduit to learn these skills is due to two reasons. The first one is that the usage of technology can motivate a student to lose the fear of technology by having personal experiences with computers. The second one is that nowadays technology is being used to teach almost any domain in an effective and dynamic way (Windschitl and Andre 1998). In this direction, techniques such as Technology Enhanced Learning (TEL), representing teaching and learning methods supported by technology, are being developed by European communities in the name of education. Other studies suggest the use of technological environments to learn CS and assume that the problem of the low number of students enrolled in these studies can be addressed by creating interfaces and different kinds of devices (Mantey and Nolan 1996). Initiatives already aim to attract and retain talent in CS by sharing technology experiences (Cussó et al. 2009) and by creating teaching and learning methods to support students with the help of information technologies, such as effective video clips for learning Web languages like JavaScript (Kobayashi et al. 2009), or using 3D animation environments to teach introductory CS courses (Williams 2009).
References

Porta M, Maillet K, Gil M (2010) "Dec-CS": the CS declining phenomenon. In: Proceedings of the World Congress on Engineering and Computer Science 2010, Lecture Notes in Engineering and Computer Science, WCECS 2010, 20–22 Oct 2010, San Francisco, pp 1173–1178
Maillet K, Porta M (2010) Consequences of the declining interest in computer science studies in Europe. In: Educon conference 2010, IEEE Engineering Education, Madrid
Karat J, Karat CM, Ukelson J (2010) Motivation and the design of user interfaces. Commun ACM 43(8):49–52
Eurostat Official European Statistics (2007) Research report number 200–6.97. http://epp.eurostat.ec.europa.eu/portal/page/portal/eurostat/home/
Computer Science Teachers Association & Computer Research Association (2010) High school computer science survey (Research report no. 47B32). Retrieved from CSTA website http://csta.acm.org/Research/sub/CSTAResearch.html
Nagel D (2009) Computer science courses show steep decline. Retrieved from THE Journal 2009. http://thejournal.com/articles/2009/09/09/in-brief.aspx
Computer Science Teachers Association (CSTA) and Computer Research Association (CRA) (2010) High school computer science survey
Henderson HD (1946) The theory of elasticity in price. Hancourt Editorial
Lohr S (2006) Study plays down export of computer jobs. NYT Harvard Business Review 2006. http://www.nytimes.com/2006/02/23/technology/23outsource.html
Bonk CJ, Reynolds T (2009) A special passage through Asia. Paper presented at the meeting of AACE conference 2009, Vancouver
Williams A (2009) Lessons in Mandarin Chinese-language school near reality. New York Times, 12 Aug 2009, p 15
Cussó R et al (2009) Sharing technology experiences of information from high school to university. Int Rev Eng Educ 2:1173–1178
Koppi T et al (2008) The crisis in ICT education: an academic perspective. Paper presented at the meeting of Ascilite 2008, Melbourne
OCDE (2006) Evolution of the young interest for the scientific and technological studies. http://www.oecd.org/home/0,3675,fr_2649_201185_1_1_1_1_1,00.html
Gil M et al (2006) Real projects to involve undergraduate students in CS degrees. Paper presented at the meeting of the annual global engineering education conference 2006, Madrid
Lewis A, Lichele H, Waite W (2008) Student and faculty attitudes and beliefs about computer science. Contributed Articles, Communications of the ACM 53(5)
Oxford notes in computing, fourth edition (1996) Library of Congress Cataloguing in Publication Data available, ISBN 01928
Burali-Forti C (1897) A question on transfinite numbers. Reprinted in van Heijenoort (1976), pp 104–111
Blass A, Gurevich Y (1998) Algorithms: a quest for absolute definitions. Bulletin of the European Association for Theoretical Computer Science 81. Dickson, California
Ling L, Tamer M (2009) Concepts and relations. Encyclopaedia of Database Systems, pp 3–60, ISBN 978-0-387-49616-0
Wing J (2006) Computational thinking. Commun ACM 49(3):33–35
Bruce W (2008) Making IT work since 1974. Online report retrieved from Oxford notes in computing, fourth edition 2008, ISBN 01928
Computer Science, Technology and Database Administration Careers and Jobs (2010) Bureau of Labor Statistics, U.S. Department of Labor. Career overviews
Windschitl M, Andre T (1998) Using computer simulations to enhance conceptual change: the roles of constructivist instruction and student epistemological beliefs. J Res Sci Teach 35:145–160
Mantey PE, Nolan R (1996) Computer music to attract students to computer science engineering. Presented at the Frontiers in Education conference, Salt Lake City
Kobayashi T et al (2009) Effective video clips for web language. Paper presented at the meeting of AACE conference 2009, Vancouver
Chapter 19
Development of Power Plant Simulators and Their Application in an Operators Training Center

José Tavira-Mondragón and Rafael Cruz-Cruz
19.1 Introduction
One of the most important parts of the training programs of power plant operators is carried out through simulators; a large number of these simulators are of the type called full-scope. Full-scope simulators incorporate detailed modeling of those systems of the referenced plant with which the operator interacts in the actual control room environment. Usually, replica control room operating consoles are included (International Atomic Energy Agency 2004). In these simulators, the responses of the simulated unit are identical in time and indication to the responses received in the actual plant control room under similar conditions. A significant portion of the expense encountered with this type of simulator is the high-fidelity simulation software that must be developed to drive it. The completeness of training using a full-scope simulator is much greater than that available on other simulator types, since the operator is performing in an environment that is identical to that of the control room. Experienced operators can be effectively retrained on these simulators because the variety of conditions, malfunctions, and situations offered do not cause the operator to become bored with the training or to learn it by rote (Instrument Society of America 1993). In recent years, the increasing power of computers, their reliability and their variety of graphical interfaces, together with the continued search to cut costs, have caused a new technological trend. In this trend, power plants have replaced their former control boards with a local area network of Personal Computers (PCs) with graphical user interfaces (Pevneva et al. 2007). In this way, new or modernized power plants have a Human Machine Interface (HMI), where all the supervising and operation actions are carried out through interactive process diagrams, and
other auxiliary functions, such as graphical trends and alarm displays, are also included. Naturally, the operators of these plants need suitable training because they face a complete change in their operation paradigm, and because of this, the training simulators also require an HMI like the ones in the actual plants (Tavira-Mondragón and Cruz-Cruz 2010). The Federal Electricity Commission (CFE) is a company created and owned by the Mexican government. It generates, distributes and markets electric power for almost 34.2 million customers, a figure that represents almost 100 million people. The CFE incorporates more than a million new customers every year. The infrastructure to generate electric power is made up of 177 generating plants with an installed capacity of 51,571.1 megawatts (MW), where 23.1% of the installed capacity stems from 22 plants which were built with private capital. The CFE produces electric power by means of various technologies and various primary energy sources. It has thermoelectric, hydroelectric, coal-fired, geothermal and wind-powered plants and facilities, as well as one nuclear power plant (CFE). The Ixtapantongo National Training Center for Operators (CENAC-I) of the CFE is devoted to training fossil-fuel power plant operators in order to satisfy the CFE requirements for highly qualified operation personnel. Each year, the CENAC-I receives about one thousand workers from the thermoelectric generation process; this includes power plants which utilize fossil fuels like coal, natural gas, diesel and oil. Among the main training courses offered by the CENAC-I are those related to normal operation and the occurrence of malfunctions. In the courses on normal operation, the trainee carries out the start-up of each one of the systems of the simulated plant, from the condition of cold shutdown to nominal power, according to the scope of the specific course. For instance, when the trainee takes a course about steam generator operation, one of the first operation actions is to start the boiler burners to increase the steam pressure; during this time, the trainee practices the right way to control the warming of the metallic elements of the simulated equipment, with the aim of decreasing the thermal stress of such elements. In the course related to turning and acceleration of the steam turbine, the trainee increases the speed of the turbine from 0 to 3,600 rpm, but during this process the trainee must control the temperature and pressure of the main steam, keeping at the same time the temperatures of the casings, stages and rotor of the turbine in suitable conditions. On the other hand, there are the training courses related to the occurrence of malfunctions in the steam generator, turbo-generator and auxiliary equipment. In these courses the trainee learns to carry out suitable corrective actions when some malfunction (e.g., feed-water pump trip, rupture of super-heater pipes, etc.) is activated during the simulation session. With these courses the trainees increase their decision-making power and analysis capacity; hence, they improve their operation skills when they face problems in the actual power plant. All of the courses are directed by the specialized instructors of the CENAC-I, supported by a complete group of power plant simulators. The main features of a hardware-software platform for training simulators are presented, and its application to upgrade two simulators and to build another two
simulators is described. All these developments were done with the aim of modernizing and increasing the infrastructure of training simulators of the CENAC-I.
19.2 Simulator Architecture
Currently the CENAC-I has simulators based on control boards, classroom simulators and portable simulators, and recently, as a consequence of the new HMI of the power plants, the Electric Research Institute (IIE) developed simulators with interactive process diagrams. These simulators have a modern hardware-software platform; given the computing power and low cost of personal computers, their selection as the computing platform was a logical choice. Regarding the operating system, Windows XP was selected based on aspects of portability, ease of coding and available software to develop graphical interfaces. Currently, the platform is being tested under Windows 7.
19.2.1 Hardware Architecture

The basic computer platform consists of at least three PCs interconnected through a Fast Ethernet local area network (a typical configuration has five). Each PC has one processor, at least one GB of memory and the Windows XP operating system. Figure 19.1 shows a diagram of this architecture. In this figure, the IC station is an instructor console with two 19″ monitors; OC1 and OC2 are the operator consoles, each one with two 19″ monitors and one 50″ monitor. The trainee can use either OC1 or OC2 to supervise and control any process of the simulated plant. Additionally, and depending on the selected configuration, the simulator can include a station to observe the boiler flames and an additional PC acting as a maintenance station, which serves as a backup if the IC is out of service, or as a test station. This means that any software modification is tested and validated in this station before the change is implanted in the simulator.
19.2.2 Software Architecture

The required features for the new simulators involved the development of original simulation software (instructor console, HMI and real-time executive). For the upgraded simulators, the mathematical models corresponding to the process and electrical areas were simply migrated to the new platform, while the mathematical models required to integrate the other two simulators were fully developed. The software architecture of the simulation environment has three main parts: the real-time executive and the modules of the operator and the instructor console
Fig. 19.1 Computer platform
(Tavira et al. 2010). Figure 19.2 shows the general structure of the software architecture. Each one of these modules can be hosted on a different PC, and they are connected through the TCP/IP protocol. All the modules of the simulation environment are programmed in C#, while the electrical and process mathematical models are programmed in Fortran. The control models are coded according to the simulator: the upgraded simulators have their models in Fortran, and the new ones have their models in C#. A brief description of each module is given in the following paragraphs.

(a) Real-time executive: This module coordinates all simulation functions and its main parts are: (a) the mathematical model launcher; (b) the manager module for interactive process diagrams; (c) the manager module for the global area of mathematical models; (d) the manager module for the instructor console; (e) database drivers.

(b) Sequencer: It is in charge of sequencing in real time all the functions which require cyclic execution; these are the mathematical models, the control models and additional functions like historical trends.

(c) Mathematical models: Almost all these models are formulated on the basis of lumped parameters (one exception is the modeling of the warming process of the turbine metals). The mathematical models include electrical and process areas. In the first group are the models of the electrical generator and the electrical grid; their mathematical formulation is based on Park's theory and Kirchhoff's laws, and the main variables calculated by these models are power generation, turbine-generator speed and power plant voltages. The process models consist of the water cycle and its auxiliary services; their main components are the boiler, combustion process, main and reheated steam, turbine, main condenser and feedwater system (naturally, the water cycle and its services are specific to each one of the simulators). All these models are formulated on the basis of momentum, heat and mass conservation principles. To customize the models to the actual power plant, each one of the equipment items (tanks, valves,
Fig. 19.2 Software architecture
pumps, fans, heat exchangers, etc.) is characterized with design information and operation data.

(d) Control models: They simulate the digital and analog control loops of the actual plant. The digital loops deal with all the required conditions to turn any equipment such as pumps, fans and valves on or off. The analog control, on the other hand, is devoted to maintaining process variables (pressures, temperatures, etc.) at pre-set values. Examples of the major control loops simulated are boiler level, main steam temperature and combustion control. In the case of the simulators of the coal-fired and combined cycle power plants, the control models were integrated through the dynamic assembly of predesigned components; in the case of the other two simulators the controls were migrated from their original computer platform (a simplified sketch of a lumped-parameter model closed by such a control loop follows this list).

(e) Transducer: This module adapts the information flow between the mathematical models and the control models.

(f) Operator: The operator module is in charge of the operator HMI and manages the information flow with the executive system. The HMI consists of interactive process diagrams, which are Flash movies; the Flash movies have static and dynamic parts. The static part is constituted by a drawing of a particular flow diagram, whereas the dynamic part is configured with graphic components stored in a library which are related to each one of the plant's equipment items, e.g., pumps, valves, motors, etc. These components have their own properties, which are established during the simulation.

(g) Instructor console: This module is the instructor HMI and consists of five parts: (a) a main module to carry out all the tasks related to the graphical
interface of the instructor; (b) a module to retrieve the static information of the simulation session, e.g., malfunctions, internal parameters, etc.; (c) a module to store information in a database using SQL; (d) a module to dynamically update the instructor console with the simulation information; (e) a module to communicate the instructor console with the real-time executive.

(h) Control editor: This module provides a graphical interface to model modern control systems. In these systems, the control algorithms are organized in basic components with a very specific function (PID, Set/Reset, Dead Band, Limiters, etc.), and they are represented through a hierarchical network of components. This module is required only when it is necessary to develop new control models, as for the coal-fired and combined cycle power plant simulators; for the other two simulators this module was not utilized because their controls were migrated from their original computer platform.
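As a purely illustrative sketch of the lumped-parameter modelling in item (c) and the component-based control loops in item (d), the following toy example shows a single tank mass balance closed by a basic PID component and stepped by a cyclic loop. All class names, coefficients and the tank process itself are invented for illustration and do not reflect the IIE models or their Fortran/C# implementation.

```python
class TankModel:
    """Lumped-parameter mass balance: d(level)/dt = (q_in - q_out) / area."""
    def __init__(self, area_m2=2.0, level_m=1.0):
        self.area = area_m2
        self.level = level_m

    def step(self, q_in, q_out, dt):
        self.level = max(self.level + (q_in - q_out) / self.area * dt, 0.0)
        return self.level


class PID:
    """Basic PID block, analogous to a predesigned control component."""
    def __init__(self, kp, ki, kd, out_min=0.0, out_max=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, set_point, measurement, dt):
        error = set_point - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(max(out, self.out_min), self.out_max)


def run_cycle(steps=100, dt=0.25):
    """Toy sequencer loop: control model first, then process model, every cycle."""
    tank = TankModel()
    level_controller = PID(kp=2.0, ki=0.1, kd=0.0)
    q_out = 0.05                      # fixed demand, m^3/s (invented)
    for _ in range(steps):
        valve = level_controller.step(set_point=1.5, measurement=tank.level, dt=dt)
        tank.step(q_in=0.1 * valve, q_out=q_out, dt=dt)
        # a real-time sequencer would also pace the loop (e.g. sleep for dt seconds)
    return tank.level


if __name__ == "__main__":
    print(f"final level: {run_cycle():.3f} m")
```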
19.2.3 The Human Machine Interfaces

With the continuing progress of personal computer technology in recent years, graphical user interfaces have become an indispensable tool in day-to-day business; thanks to their multi-window environment, these interfaces provide an easy-to-understand and easy-to-use tool for power plant operators (Yamamori et al. 2000). A typical training session with a simulator is guided by a qualified instructor of the CENAC-I, who is in charge of establishing the initial condition and directing the simulation session of the trainees from the IC. A partial view of the HMI of the IC is shown in Fig. 19.3; this HMI is a windows-friendly application with pull-down menus and icons. The main functions of the IC are:

(a) Run/Freeze. The instructor starts or freezes a dynamic simulation.

(b) Initial condition. The main options are selecting an initial condition for beginning the simulation session, recording a new initial condition (snapshot) and erasing an initial condition.

(c) Malfunctions. With this function, the instructor introduces/removes an equipment malfunction at any time during the simulation session, for instance pump trips, heat exchanger tube breaks, and control valve obstructions. All the malfunctions are grouped into systems and subsystems for easy location.

(d) The instructor has the option of internal parameters for simulating the operative actions not related to automated equipment. These operative actions are associated with the local actions performed in the actual plant by an auxiliary operator, for instance opening valves and turning pumps/fans on.

(e) The option of external parameters allows the instructor to modify the external conditions of the process. These conditions are: atmospheric pressure, room
Fig. 19.3 HMI of the instructor console
temperature, voltage and frequency of the external electric system, and fuel composition.

(f) The instructor can create automatic training exercises. Each one of these exercises can include initial conditions, malfunctions, local actions, and a time sequence. The exercises are stored for subsequent use.

(g) In its default mode, the simulator is executed in real time, but the instructor can execute the simulator up to ten times faster or up to ten times slower than real time.

On the other hand, the trainee HMI is also completely graphical and based on a multi-window environment with interactive process diagrams to operate the simulated unit; these diagrams are organized in hierarchical levels following the organization of the power plant systems, i.e., boiler, turbine, electric generator, etc. There are two main types of diagrams: information diagrams and operation diagrams. The first ones show the values of variables selected by the trainee, or a predefined set of variables; the values are presented as bar or trend graphs. The trainee utilizes the operation diagrams to control and supervise the whole process: with this HMI he turns pumps, fans and compressors on, opens valves, modifies set points of automatic controls and carries out any feasible operation in a similar way as he would do in the actual power plant. When the trainee needs to perform an action, he selects the suitable pictogram with the cursor, and then a pop-up window appears with the corresponding operation buttons. This window can be
moved anywhere on the screen. At any one time the trainee can open as many windows as he wants, and can do this on any operation console. In the operation diagrams, the trainee easily visualizes the off-service equipment because it is shown in white, while the on-service equipment has a specific color depending on its working fluid: green equipment handles water, blue equipment handles air, red equipment handles steam, and so on. The main features of these diagrams are very similar for each one of the simulators, but there are some differences due to the customization carried out for each of them as a result of the CFE requirements.
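The working-fluid colour convention can be expressed as a simple lookup. The sketch below is illustrative only; the exact colour names, the fluid set and the fallback colour are assumptions rather than the CENAC-I specification.

```python
# Assumed mapping from working fluid to display colour for on-service equipment.
FLUID_COLOURS = {
    "water": "green",
    "air": "blue",
    "steam": "red",
}

def equipment_colour(on_service: bool, working_fluid: str) -> str:
    """Off-service equipment is drawn white; on-service equipment uses its fluid colour."""
    if not on_service:
        return "white"
    return FLUID_COLOURS.get(working_fluid, "grey")  # fallback colour is an assumption

print(equipment_colour(True, "steam"))   # -> red
print(equipment_colour(False, "water"))  # -> white
```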
19.3 The Simulators
The upgraded fossil-fuel simulators correspond to fossil-fuel power plants of 300 MW (Fig. 19.4) and 350 MW. The first one handles 1,879 communication signals between the control boards and the mathematical models, and the second one has 4,539 signals. The big difference is that the latter has a distributed control system, which is completely simulated. Each one of the control board simulators was hosted on Compaq workstations with the UNIX Tru64 operating system. The 300-MW simulator had three control boards and the 350-MW simulator had five control boards; these boards were connected to their corresponding workstation through an input/output system based on RTP controllers. The mathematical models of each one of these simulators were utilized to integrate two new simulators on the hardware-software platform previously described. Therefore, instead of control boards, these simulators have a completely redesigned hardware-software platform and a new HMI for the operation of the simulated process. The tasks carried out to upgrade each one of these simulators can be summarized as:

(a) Migrating the mathematical models of process and control. These models are written in Fortran.
(b) Migrating the databases (mathematical models, instructor console, input/output signals, etc.).
(c) Customizing the simulation environment to the particular features of each simulator.
(d) Designing and developing the interactive process diagrams.
(e) Integrating all simulator components and validating their behavior, instruction functions and real-time operation.

Additionally, in the case of the 300-MW simulator, the mathematical models of the turbine and the turbine control were replaced by models with a wider scope (Tavira et al. 2009). Figure 19.5 shows the final architecture for the new 300-MW simulator; the architecture utilized for the 350-MW simulator is very similar. The simulator developed for a coal-fired unit has as reference a 350-MW power plant; therefore, this simulator and the 350-MW fossil-fuel simulator have some
Fig. 19.4 Control board simulator
Fig. 19.5 Upgraded simulator
similar systems. In this way, the tasks carried out to build this simulator can be summarized as:

(a) Developing the mathematical models related to the coal systems (process and control).
(b) Adapting the current mathematical models of the 350-MW simulator to the requirements of the coal-fired simulator.
Fig. 19.6 Interactive process diagram
(c) Adapting the databases of the 350-MW simulator to the requirements of the coal-fired simulator.
(d) Designing and developing the interactive process diagrams.
(e) Integrating all simulator components and validating their behavior, instruction functions and real-time operation.

Finally, the simulator for a combined cycle power plant has as reference a 450-MW power plant, and it has as a starting point an existing 150-MW gas turbine full-scope simulator (Roldán et al. 2008); therefore the main activities carried out to integrate the combined cycle power plant simulator were:

(a) Developing the mathematical models (process and control) of the steam turbine, heat recovery steam generator, auxiliary systems (feedwater, lubricating oil, etc.), and electric systems (main generator and electric grid).
(b) Integrating the former gas turbine model with the heat recovery steam generator.
(c) Developing the databases required for the combined cycle power plant simulator.
(d) Designing and developing the interactive process diagrams.
(e) Integrating all simulator components and validating their behavior, instruction functions and real-time operation.

For each one of the simulators, the design of the interactive process diagrams was very important because they are the trainees' interface during the courses on the simulator; therefore, such design was done using the actual process diagrams of the reference power plant as a guide, so the simulator diagrams have a similar appearance to the actual diagrams, and this enhances the training quality. Figure 19.6 shows a partial view of an interactive process diagram for the feedwater system of the upgraded 300-MW simulator; this figure shows two pop-up windows which are opened on trainee request to carry out the required operation action. With these windows the trainee can operate the simulated devices as he usually does in the actual plant.
19.4 Results
In order to test and validate the correct operation of each simulator, each one of them was exhaustively tested; the tests were carried out by qualified personnel of the CENAC-I with technical support from the IIE. In brief, these tests were:

(a) Carrying out a complete installation of all the required software.
(b) Verifying correct communication among the stations of the local area network.
(c) Validating each one of the functions of the instructor console and the operation consoles, according to the expected effects.
(d) Carrying out availability tests with no aborts in any simulator task.
(e) Carrying out operative tests from cold iron to full-load generation, shutdown and malfunctions.

Currently the simulators are utilized as a part of the training courses for power plant operators.
19.5 Conclusion
A redesigned hardware-software platform to host training simulators, and a methodology to upgrade and to integrate new simulators, have been successfully tested in four different cases. Because of the computer platform, which is based on PCs and the Windows XP operating system, it is expected that these simulators will have lower operation and maintenance costs compared to control board simulators. The simulators keep their full-scope and real-time features, and provide an HMI suitable for modern power plant operators, who no longer use control
boards. Additionally, the CENAC-I instructors have a user-friendly HMI, with all the required functions to lead and track the training sessions. The hardware-software architecture described in this work was customized to the requirements of particular projects, but the simulation environment is flexible enough to be adapted to a stand-alone simulator, a multi-session simulator or any other simulator. Thanks to the training programs based on full-scope simulators, the benefits provided by the CENAC-I to the power plants are related to having better-trained staff, which means greater reliability in the operation of the generation units, greater security of the facilities and personnel, and the achievement of better efficiencies. Moreover, the operation staff is continuously trained to assure the useful life of the equipment. Finally, according to the CENAC-I instructors' experience, the main challenge for the simulator users (operators) is the cultural change, because now operators have to utilize a modern tool like a PC instead of a control board, and therefore the operators must forget their former operation habits and adopt novel operation techniques for fluent and safe navigation in a different HMI.

Acknowledgment The authors would like to thank all the personnel of the IIE and CENAC-I who participated in these projects. In particular, the authors would like to acknowledge Luis Jiménez, Fernando Jiménez, Rogelio Martínez and Iliana Parra, who developed the simulation environment, the instructor console and the execution software for the control systems; Jorge García, Victor Jiménez, José Melgar, Yadira Mendoza, José Montoya, Saúl Rodríguez, Edgardo Roldán, Guillermo Romero, Mayolo Salinas and Ana Vazquez, who participated in migrating, updating and developing the majority of the mathematical models of process and control; and Dionisio Mascote, Alejandro Matías, Juan Mezo, Roni Orozco and José Téllez for their support during the acceptance tests of the simulators.
References

CFE, Comisión Federal de Electricidad. http://www.cfe.gob.mx
Instrument Society of America (1993) Fossil-fuel power plant simulators – functional requirements, ISA-S77.20-1993, p 23
International Atomic Energy Agency (2004) Use of control room simulators for training of nuclear power plant personnel, IAEA-TECDOC-1411, p 2
Pevneva NY, Piskov VN, Zenkov AN (2007) An integrated computer-based training simulator for the operative personnel of the 800-MW power-generating unit at the Perm district power station. Therm Eng 7:542–547
Roldán E, Mendoza Y, Zorrilla J, Cardoso M, Cruz R (2008) Development of a gas turbine full scope simulator for operator's training. In: Proceedings of the European modeling symposium, EMS 2008, 8–10 Sept 2008, Liverpool
Tavira J, Melgar J, García J, Cruz R (2009) Upgrade of a full-scope simulator for fossil-fuel power plants. In: Proceedings of the spring simulation conference 2009, SpringSim'09, 22–27 Mar, San Diego
Tavira J, Jiménez L, Romero G (2010) A simulator for training operators of fossil-fuel power plants with an HMI based on a multi-window system. Int J Comp Aided Eng Tech 1:30–40
Tavira-Mondragón J, Cruz-Cruz R (2010) Development of modern power plant simulators for an operators training center. In: Proceedings of the World congress on engineering and computer science, WCECS 2010, 20–22 Oct 2010, San Francisco
Yamamori T, Ichikawa T, Kawaguchi S, Honma H (2000) Recent technologies in nuclear power plant supervisory and control systems. Hitachi Rev 2:61–65
Chapter 20
Benefits of Unstructured Data for Industrial Quality Analysis

Christian Hänig, Martin Schierle, and Daniel Trabold
20.1 Introduction
Although Natural Language Processing (NLP) methods have gained a lot of scientific interest over the past few decades, industrial use cases are still rare. Companies used to have mainly structured data (if there were data warehouses at all), and NLP methods were often complicated, unstandardized or just too slow. Nowadays even small companies keep track of lots of textual data, for example call center reports, e-mail correspondence or repair order texts. In fact, with the rise of the internet and especially the Web 2.0, textual content seems to explode. Luckily, NLP methods have also evolved: besides lexical resources like WordNet (see Miller 1995), there are more than sufficient software resources available for use. Often pre-trained and standardized, it is easy to engineer systems using these modules and a suitable framework like UIMA (see Ferrucci and Lally 2004) or GATE (see Cunningham 2000). Furthermore, the current methods are (with respect to modern computer technology) fast enough to use, and there are also unsupervised methods available to keep the amount of manual annotation of training sets down. But there is still one question left: what exactly is the additional value of textual analysis compared to structured information? Considering the internet and the Web 2.0, the added value is obvious, as there is no similar amount of information available in structured form. But with respect to internal data sources of a company and a specific industrial task, there is not much research dealing with the comparison between structured and unstructured data. Admittedly, this comparison is hard to perform, requiring large datasets of comparable data in structured and unstructured form.
This paper will try to answer this question for the specific task of quality analysis in the automotive domain. This task is not only of major importance for every manufacturing company, but it also requires methods of high accuracy. We will apply two different subtasks to a very large dataset of repair data, which contains structured information like damage and part codes as well as unstructured data in the form of a repair order text. This text is written by the mechanic and contains the customer's complaint as well as the mechanic's repair actions.1

1 Parts of this work have been published previously in Hänig et al. (2010).
20.2 Text Pre-processing
The first step of the NLP workflow deals with text pre-processing to increase the textual quality. The language is detected based on letter n-grams, using the NGramJ library, which is based on Cavnar and Trenkle (1994). After language recognition and tokenization using regular expressions, several cleaning steps are performed. These include context-sensitive replacement of abbreviations as well as context-sensitive spell-checking using neighbourhood word co-occurrences. The cleaning steps are described in more detail in Schierle and Schulz (2007). The application scenarios described in the following sections are all based on concepts (or: unnamed entities) which are detected in the text using a domain-specific taxonomy containing a restricted vocabulary of technical terms. The taxonomy is organized as a multilingual and poly-hierarchical knowledge base, and was derived from available company sources and extended manually. The hierarchy is organized in terms of semantics instead of technical arrangements. To achieve multilingualism we represent every word as a language-independent concept, and every concept may have more than one parent (e.g. a radio fuse may be situated under radio system as well as under fuses). Every concept can be defined in several languages and with several synonyms per language. For example, fuses is just the English word for a concept that is called Sicherungen in German and fusibles in French. The handling of ambiguities is ensured by the optional specification of part-of-speech tags per word, and a context per synonym for each language. Therefore pump can be identified as a component if the word is used as a noun, or as an action when used as a verb. The POS tags were obtained by a Hidden-Markov-Model based POS tagger, which was trained on approx. 26,000 manually annotated words from our domain and yields an accuracy of 92.2% (see Schierle and Trabold 2010). In order to extract relations from plain text, we add information about structure using syntactic trees. This is a crucial step for relation extraction and allows the discovery of relations of any arity (e.g. component symptom or component
symptom condition). Syntactic parsers trained on general-purpose treebanks do not yield accurate results due to the absence of annotated corpora containing domain-specific structures and terminology. Thus, we use an unsupervised parser (unsuParse, see Hänig et al. 2008; Hänig 2010) which solely needs raw text with annotated syntactic word classes. The unsuParse algorithm learns the structure of a language based on neighbourhood co-occurrences and preferred positions of tokens. To circumvent data sparseness it uses word classes instead of word forms. These syntactic classes can be annotated in an unsupervised way (unsuPOS, see Biemann 2006) to stay language and domain independent. Additionally, we assign semantic tags like component, symptom and location, which are derived from the categories of our knowledge base described above. We use a set of predefined heuristic rules to extract relations from the resulting parse trees. Every rule contains information about the two participating categories of concepts and a maximum distance. This distance is calculated as the sum of the word distance and the node distance within the parse tree. The detailed algorithm is described in Hänig and Schierle (2009). It yields very accurate results for relation extraction on repair orders (precision of 86–89% and recall between 67% and 84% depending on the type of relation).
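As a rough illustration of the concept tagging and the distance-based relation rules described above, a sketch could look like the following. The toy taxonomy, tag names, rule set and node-distance helper are invented simplifications; they are not the actual knowledge base, the unsuParse output or the algorithm of Hänig and Schierle (2009).

```python
from itertools import combinations

# Toy taxonomy: surface form -> (concept id, semantic category, required POS or None).
TAXONOMY = {
    "pump":     ("PUMP", "component", "NN"),
    "radiator": ("RADIATOR", "component", None),
    "leaking":  ("LEAK", "symptom", None),
}

# A relation rule: two participating categories and a maximum combined distance
# (word distance plus node distance within the parse tree).
RULES = [{"categories": {"component", "symptom"}, "max_distance": 4}]

def tag_concepts(tokens, pos_tags):
    """Annotate tokens with (position, concept id, category) using the toy taxonomy."""
    tagged = []
    for i, (token, pos) in enumerate(zip(tokens, pos_tags)):
        entry = TAXONOMY.get(token.lower())
        if entry and (entry[2] is None or entry[2] == pos):
            tagged.append((i, entry[0], entry[1]))
    return tagged

def extract_relations(tagged, node_distance):
    """Pair concepts whose categories match a rule and whose combined distance is small enough."""
    relations = []
    for (i, c1, cat1), (j, c2, cat2) in combinations(tagged, 2):
        for rule in RULES:
            if {cat1, cat2} == rule["categories"]:
                if abs(i - j) + node_distance(i, j) <= rule["max_distance"]:
                    relations.append((c1, c2))
    return relations

tokens = ["radiator", "is", "leaking"]
pos = ["NN", "VBZ", "VBG"]
tagged = tag_concepts(tokens, pos)
# flat toy "tree": constant node distance of 1 between any two tokens
print(extract_relations(tagged, node_distance=lambda i, j: 1))
# -> [('RADIATOR', 'LEAK')]
```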
20.3 Early Warning
For an international manufacturer of premium brand vehicles it is crucial to be aware of any kind of quality problem as early as possible. Even a few days can make a difference with respect to customer satisfaction and media attention. Besides the impact on brand perception and marketing, there are also legal issues like liability to consider. Therefore every manufacturer runs Early Warning processes as part of everyday business intelligence. Input to these algorithms is usually structured data like part codes or damage codes. In addition to these data sources, some companies also possess large amounts of unstructured texts like call center reports or repair order texts, but their potential for early warning is still unclear. This section describes how early warning algorithms can be applied to textual input, and evaluates the results on historical data from our company. Input to the algorithm is a set of approximately 2.5 million repair cases R = (r1, ..., rn), all taken for one specific model family, and restricted to cases in the US. We applied this restriction as repair order texts are only written down continuously in the US. For every repair one unstructured text is recorded, normally as noisy text of at most a few tens of words of domain language. After the application of the information extraction steps outlined in Sect. 20.2, we end up with the following data per repair:

1. The code for the defective component, as given by the company's damage codes
2. The code for the observed symptom, as given by the company's damage codes
3. A set of concept-ids of components as extracted from the text
4. A set of relations between components and symptoms as extracted from the text

Being now only confronted with structured data, the Early Warning algorithm can be applied to the company's damage codes and to the text respectively. The comparison is done on a component level and on a relation (component with symptom) level, because those two approaches differ in the complexity of the applied algorithms. The algorithm which we use is similar to the real process in our company, but simplified in some specific points which are of no importance for the comparison of structured and unstructured data. We will furthermore only create warnings for cars with less than 6 months of usage. For the further work, we define the following:

1. A specific damage code from all codes D is denoted by dl. For simplicity, components and relations extracted from text are transformed into codes in order to apply the same algorithm to all data.
2. A specific test month is denoted by mi, a specific production month by pj.
3. C defines the set of cars, while C(pj) defines the set of cars from the given production month. We use C6+ for the set of cars used more than 6 months, and C6 for the other cars respectively. Be aware that the time of usage is not implied by the production month, as the car might have been sold later.
4. Repairs are denoted by R; R6 is used for cars with less than 6 months of usage. R6(mi, pj) is the set of repairs on cars from the given production month in the given test month which had less than 6 months of usage.

As a first step, all codes DS which show a seasonal behaviour are identified. These are excluded from the early warning detection, as an analysis of these codes is far more complicated. The damage rate X of a given month mi and damage dj is defined by the ratio of cars having a repair of dj in mi to the number of all cars C in usage in this month:

X_m(m_i, d_j) = \frac{|R_6(m_i, d_j)|}{|C_6(m_i)|} \quad (20.1)
The dataset with all repairs is divided into two subsets, one for training and one for evaluation. On the training set (which covers two complete years of data) the damage rates X(mi, dj) are calculated for every damage code and every calendar month (without respect to the production month). These values are input to a multivariate linear regression, assuming that seasonal damages (like problems with the heater or air-conditioning) follow a trigonometric function over the test months:

f(m_i) = a + b x_1 + c x_2, \quad x_1 = \sin(X(m_i)), \quad x_2 = \cos(X(m_i)) \quad (20.2)
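Purely as an illustration of this seasonality test, the sketch below fits such a regression with ordinary least squares and reports the coefficient of determination. The exact regressors used by the production system are not fully specified here; the sketch assumes sinusoidal features over the calendar month, and all function and variable names are invented.

```python
import numpy as np

def seasonal_r2(monthly_damage_rates):
    """Fit f(m) = a + b*sin(2*pi*m/12) + c*cos(2*pi*m/12) to monthly damage rates
    and return R^2 as a measure of seasonal behaviour (assumed feature choice)."""
    y = np.asarray(monthly_damage_rates, dtype=float)
    m = np.arange(len(y))
    X = np.column_stack([np.ones(len(y)),
                         np.sin(2 * np.pi * m / 12.0),
                         np.cos(2 * np.pi * m / 12.0)])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ coeffs
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

# 24 months of a synthetic damage rate peaking once a year
rates = [0.01 + 0.005 * np.sin(2 * np.pi * (m - 3) / 12.0) for m in range(24)]
print(f"R^2 = {seasonal_r2(rates):.2f}")  # close to 1 -> flagged as seasonal
```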
The coefficient of determination R² is used to determine the quality of the regression. Every code with R² > s is considered to be seasonally influenced. The first part of the evaluation will be the comparison of seasonal codes from structured and unstructured data. For the early warning process itself, a different damage rate is calculated. With respect to the seasonalities, we calculated X based on the repairs in a given calendar month. Aiming at the identification of production anomalies, the subject of analysis is a given production month. The test month only defines the month of the analysis (as part of the daily quality analysis process), but the repairs are counted up to this month. Additionally, only cars are considered which completed their first 6 months of usage in or before the test month ml. The damage rate X'(pi, dj, ml) of a given production month pi and damage dj in a given test month ml is defined as follows:

X'(p_i, d_j, m_l) = \frac{\sum_{k \in \{i, \ldots, l\}} |R_6(p_i, d_j, m_k)|}{|C_{6+}(p_i, d_j, m_l)|} \quad (20.3)
This value is calculated once for every code on the training data (but for no specific production date), and denotes the error probability p = X'. By assuming an underlying binomial distribution of erroneous cars, we can infer the mean μ and the standard deviation σ for a given population C6+. For the warning process, the damage rate X' is calculated for the cross product of damage codes, test months and production months, and compared with the upper control limit (UCL):

UCL = \mu + 3\sigma \quad (20.4)
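A hedged sketch of this check under the binomial assumption is given below; the rate-based formulation of μ and σ and all names are illustrative assumptions, not the production system.

```python
from math import sqrt

def upper_control_limit(error_probability, population_size):
    """UCL = mu + 3*sigma for a binomial damage count, expressed as a rate."""
    mu = error_probability
    sigma = sqrt(error_probability * (1.0 - error_probability) / population_size)
    return mu + 3.0 * sigma

def check_warning(repairs_observed, cars_in_population, baseline_rate):
    """Generate a warning when the observed damage rate exceeds the UCL."""
    observed_rate = repairs_observed / cars_in_population
    return observed_rate > upper_control_limit(baseline_rate, cars_in_population)

# e.g. 0.8% baseline rate, 60 repairs among 5,000 cars of one production month
print(check_warning(60, 5000, 0.008))  # -> True, a warning is generated
```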
If the damage rate for the given damage code and production month is higher than the UCL, the system will generate a warning. The comparison between the early warning processes is done by comparing the generated warnings. For the evaluation one has to be aware of one important fact: despite all the company's knowledge about historical quality issues, it is nearly impossible to state which source of data is right if there are additional or missing warnings. We cannot rely on domain experts or the company's documentation for this evaluation, as they are biased by the early warning systems used at that time. Furthermore, one structured damage code normally maps to several text codes (because the text might name several components where the code only names one), leading to many more warnings generated from text than from structured data. Therefore a direct mapping of warnings is not possible either. To deal with these issues, we evaluated in the following way:

1. Warnings (and seasonalities) generated from text were clustered using co-occurrence significances of the related concepts (calculated using the t-score measure) and the Markov Clustering algorithm (see van Dongen 2000). This is done to keep the manual effort for evaluation down, but all clusters were manually reviewed and corrected to exclude the clustering process as a possible erroneous influence. This leads to the text clusters CT.
Table 20.1 Agreement of calculated seasonalities and warnings

                      Seasonalities   Warnings
Component based (%)        69            41
Relation based (%)         45            37
Fig. 20.1 Seasonal behaviour of the structured code ‘AC compressor insufficient effect power’
2. The warnings (and seasonalities) from structured data were manually clustered. But as they are defined and used distinctly, only few codes were clustered together, leading to the structured data clusters CS.
3. By including as much information as possible, the clusters from both early warning (and seasonality) calculations are manually compared and mapped if possible. Two warnings are considered equal if they are temporally (within 6 months) and semantically (describing identical problems, depending on each other or having the same root cause) close.

After this process, we can calculate the agreement A between the two approaches as follows:

A(C_T, C_S) = \frac{|C_T \cap C_S|}{|C_T \cup C_S|} \quad (20.5)
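In code, this agreement is simply the overlap of the two sets of mapped cluster labels; a trivial sketch with invented labels:

```python
def agreement(text_clusters, structured_clusters):
    """A(C_T, C_S) = |C_T intersect C_S| / |C_T union C_S| over mapped warning clusters."""
    text_clusters, structured_clusters = set(text_clusters), set(structured_clusters)
    union = text_clusters | structured_clusters
    return len(text_clusters & structured_clusters) / len(union) if union else 0.0

print(agreement({"ac_no_cooling", "strut_noise"}, {"ac_no_cooling", "battery_drain"}))
# -> 0.333...
```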
We argue that a high agreement of structured and unstructured information can be considered as proof that textual data is an at least comparable source of data with respect to early warning. The results of the agreement calculation can be seen in Table 20.1. Examples of analogous seasonalities can be seen in Figs. 20.1 and 20.2, examples of warnings in Figs. 20.3 and 20.4.
Fig. 20.2 Seasonal behaviour of the relation ‘AC unintended temperature’ extracted from text
Fig. 20.3 Warning generated from structured data for the shock absorbers
The results show that the agreement between the generated warnings and seasonalities is too high to be neglected. Nearly half of the calculated items could be mapped. It should be mentioned that many of the other items were rather similar in nature, or even identical but related to quite different test months.
Fig. 20.4 Warning generated from text for struts
Regarding the items which could be mapped, we became aware of three interesting observations:

1. Damage codes were designed to uniquely identify a specific part of the car. They are highly technical and very specific in nature. The concepts and relations extracted from text are more customer-centric and general. While technical codes are better suited to find the erroneous part of the car, the text is better suited to identify the general misbehaviour of the car, as noted by the customer. For example, we identified several different structured codes of the air conditioning as seasonally influenced, but mainly only two different problems derived from the text: an air conditioning which doesn't blow cold, and an air conditioning which smells bad. Of course there might be lots of different technical parts of the air conditioning that can fail, but the customer will always only notice two distinct problems. Therefore we conclude that an early warning based on text can be important to become aware of customer problems instead of car problems.

2. The information from the text is perfectly suited to reinforce and verify the structured information. Warnings which were found by both approaches can be seen as confirmed. On the other hand, the text might help to find erroneous repairs or encoding problems. Our evaluation identified several examples where the structured code did not fit the text.

3. For all the warnings in agreement, we took a closer look at when the warnings were generated. It turned out that the structured data warned earlier in five cases for components (two cases for relations), while the text was faster in seven cases (one for relations). On average over these cases, the structured data was 2 months faster (1.5 for relations), while the text was 2.1 months faster (2 for relations). Therefore we conclude that text warnings are important for quality analysis, as they might occur earlier.
In summary, we conclude that textual early warning can be done with sensible results. Although the results might not be completely identical to the ones from structured data, they give interesting insights and reveal problems from the customer's view. Furthermore, the traditional results can be confirmed and enriched with additional information.
20.4 Repeat Repair Detection
A repeat repair is a second (or further) repair attempt for a customer's complaint. Thus, a high fixed-first-visit rate indicates excellent work and satisfied customers in the dealership. Of course, having a problem not solved during the first dealership visit leads to negative responses. But knowledge about successful repair attempts (the ones not being followed by a repeat repair) can be used to gather comprehensive information about the dealership process. Statistics about the success rate of repairs identify:

• the repair approach to a given problem achieving the highest success rate,
• the spare parts which are used in succeeding repair attempts,
• the mechanic yielding the best results.

Hence, accurate detection of repeat repairs based on textual descriptions of the problems can benefit the complete scheduling process, including spare part ordering and assignment of the accountable mechanic. Service operators can provide customers with the mechanic holding the corresponding skills for the customer's problem and can order the spare parts which will most likely be used, so that when the customer comes in, all needed resources will be available. Summing up, there are basically two challenges. Firstly, statistics regarding the success of repair approaches have to be accurate to be able to provide reliable predictions. Secondly, the customer's complaint, available as textual data, has to be analyzed and classified to access statistics on the addressed problem. While methods using structured and/or unstructured data can be used to calculate repair attempt statistics, the second challenge can only be approached by text mining algorithms, as structured data will only be available after repair actions are completed. In this section, we want to introduce a new methodology using textual data provided by the customer to detect repeat repairs. Additionally, we will compare it to the performance of currently applied approaches using structured data.
20.4.1 Methods Using Structured Information

For every repair action taken, a so-called repair order is created. It contains free text fields such as the customer's complaint, the cause of the problem and the corrective actions. While those three fields were typed in by the technician, structured information is added by a service advisor after completion of the repair. Two types of codes are of special interest here: labour operation codes encode the applied repair actions, and part codes are used to encode the employed spare parts. To detect repeat repairs using structured data, codes for labour operations and parts are manually clustered to recognize different labour operations applied to similar symptoms. Repair orders which are classified into the same category are considered to be repeat repairs.
20.4.2 Analyzing Textual Data

Textual information is unstructured data and thus we need to create a new methodology to find similar problem descriptions. Basically, there are three clues for detecting a repeat repair:
1. Direct reference: Some repair orders contain a direct reference to former repair orders. This reference can be given as a date or, as in most cases, as a repair order number. For example: Cutomer states vehicles pulsates when stopping see previous ro# 121753. Such references are easy to extract and of high accuracy, but they are also very rare and exist only for about 2.5% of all repeat repairs.
2. Expression of repetition: Some customers mention the repetition in the complaint. The most common (and still friendly) utterance that can be observed is: a/c is blowing warm again. These expressions are, similar to direct references, very rare and barely existent.
3. Similar textual problem description: Similar problem descriptions refer to similar problems, and thus similar problem descriptions of a customer, within a certain range of time and mileage, lead to the assumption that the first dealership visit did not fix the problem.
Text based repeat repair detection classifies a repair order as a repeat repair if at least one of the above mentioned clues is present. While references to former attempts and utterances of repetition are basically pattern matching problems, the third one is more complex. We use feature vectors v_text for repair order representation, as used in information retrieval. Words are weighted using their relative frequencies while stop words are
Table 20.2 Precision and recall values for repeat repair detection

Method                        Precision  Recall  F-score
Labour operations and parts   0.74       0.46    0.57
v_text                        0.79       0.81    0.80
v_taxonomy                    0.82       0.94    0.88
neglected. To calculate similarities sim_text between those vector representations, we apply the cosine measure. As language provides various ways to express the same subject or problem, this basic approach cannot detect all repeat repairs. Thus, we use our internal knowledge base which is able to deal with synonymous expressions. We build vectors v_taxonomy containing the extracted concepts of our taxonomy to additionally match similar expressions like does not work and inoperative. The weights depend on the relative frequency of the corresponding concepts and are multiplied by 0.5 for each level above the matched expression in the hierarchy of our taxonomy. The vectors v_text and v_taxonomy for the repair order center cap is missing are given in Eqs. 20.6 and 20.7.

v_text(center cap is missing) = (center: 0.33, cap: 0.33, missing: 0.33)    (20.6)

v_tax(center cap is missing) = (center cap: 0.31, tires: 0.15, chassis and suspension: 0.08, missing: 0.31, loss of something: 0.15)    (20.7)
Some repair orders contain more than one problem. It is possible that a textual description containing similar components and/or symptoms leads to high similarity scores although the concepts are not related in the same way as in the other repair order. To avoid such false classifications, both repair orders must have at least one relation in common.
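As an illustration of the text-vector matching step, the following minimal Python sketch (an illustration added here, not the chapter's implementation; the stop word list and function names are hypothetical) computes relative-frequency weights and the cosine similarity sim_text between two complaint texts:

from collections import Counter
import math

STOP_WORDS = {"is", "a", "the", "customer", "states"}   # hypothetical stop word list

def text_vector(text):
    # Relative term frequencies over non-stop-words.
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def cosine_similarity(u, v):
    # Cosine measure between two sparse weight vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

sim_text = cosine_similarity(text_vector("center cap is missing"),
                             text_vector("center cap fell off"))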
20.4.3 Evaluation

To evaluate and compare the performance of these three different approaches, we randomly chose 608 repair order pairs (the repair orders of such a pair belong to the same vehicle) and manually annotated the second one as repeat repair or normal repair order. This evaluation sample contains 146 repeat repairs. The calculated scores for precision, recall and F-score are given in Table 20.2. The results show the superior coverage and higher accuracy of textual data for this task. Even the basic approach using the text without concept recognition
outperforms structured data easily. The following two examples illustrate the advantages of text based classification.
20.4.3.1 Example 1: Labour Operations and Parts Match Exactly
Labour operations of both repair orders match exactly, and so do the used spare parts. But – obviously to humans reading the text – it is not a repeat repair. The textual data reveals that a center cap fell off in both cases, but different ones were affected. Codes for labour operations are not as fine grained as would be necessary for some tasks. In this case, structured data leads to a false alert.
Customer states driver front wheel center cap is missing
Labour operations: 22600501
Parts: CH52013653-AA
Customer states passenger side rear hub cap fell off
Labour operations: 22600501
Parts: CH52013653-AA
20.4.3.2 Example 2: No Structured Data Matches
The second example shows an obvious repeat repair. The customer states the same complaint again. Although the technician applies different corrective actions to the same problem, the codified information is not able to find the linkage between these two repair orders.
Customer states a/c blows out hot air at times check and advise
Labour operations: 24010102 (Leak check)
Parts: CH5015778-AA CH82300329
a/c is blowing warm again
Labour operations: 08194995 (Reprogram control module)
Parts:
20.4.4 Conclusions

The comparison of methods based on structured and unstructured data for repeat repair detection yields a clear result: unstructured data contains more precise information and outperforms the current approach using labour operation and part codes. Additionally, methods analyzing textual data can be applied before structured data is even available and can be used to improve dealership processes.
20.5 Conclusions and Further Work
Summarizing the results of this work, we conclude that unstructured data is (if available) a useful extension to structured data and needs to be analyzed as well. With respect to early warning, textual data yields results comparable to damage codes and additionally provides interesting insights into the customer's point of view. Regarding the task of repeat repair detection, unstructured information is even superior to structured data due to fine-grained information like conditions and locations which is available in textual data. In our future work, we will examine the influence of textual data on further tasks like root cause analysis and identification of erroneous encodings. Another crucial improvement will be the adaptation to other languages and domains, besides optimizations of the relation extraction algorithms.
Chapter 21
Evaluation the Objectivity Measurement of Frequent Patterns

Phi-Khu Nguyen and Thanh-Trung Nguyen
21.1 Space of Boolean Chains
Let B = {0,1}; B^m is the space of m-tuple Boolean chains, whose elements are s = (s_1, s_2, ..., s_m), s_i ∈ B, i = 1, ..., m.
21.1.1 Definitions

A Boolean chain a = (a_1, a_2, ..., a_m) is said to cover another Boolean chain b = (b_1, b_2, ..., b_m) – or b is covered by a – if for each i ∈ {1, ..., m} the condition b_i = 1 implies that a_i = 1. For instance, (1,1,1,0) covers (0,1,1,0). Let S be a set of n Boolean chains. If there are k chains in S covering a chain u = (u_1, u_2, ..., u_m), then u is called a form with a frequency of k in S and [u; k] is called a pattern of S. For instance, if S = {(1,1,1,0), (0,1,1,1), (0,1,1,0), (0,0,1,0), (0,1,0,1)} then u = (0,1,1,0) is a form with a frequency 2 in S and [(0,1,1,0); 2] is a pattern of S. A pattern [u; k] of S is called a maximal pattern if and only if the frequency k is the maximal number of Boolean chains in S covering u. In the above instance, [(0,1,1,0); 3] is a maximal pattern in S. The set P of all maximal patterns in S whose forms are not covered by any form of another maximal pattern is called a representative set, and each element of P is called a representative pattern of S.
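For concreteness, a minimal Python sketch of these definitions (an illustration added here, not the chapter's code; function names are hypothetical) checks the covering relation and counts the frequency of a form in S:

def covers(a, b):
    # a covers b if every position where b has a 1, a has a 1 as well.
    return all(ai >= bi for ai, bi in zip(a, b))

def frequency(S, u):
    # Number of chains in S covering the form u.
    return sum(1 for s in S if covers(s, u))

S = [(1,1,1,0), (0,1,1,1), (0,1,1,0), (0,0,1,0), (0,1,0,1)]
print(frequency(S, (0,1,1,0)))   # 3, the frequency of the maximal pattern [(0,1,1,0); 3]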
Fig. 21.1 A 5 × 4 matrix for [(0,1,1,0); 4]. The figure lists the five chains of S row by row, with the maximal rectangle of height 4 marked in boldface in the original:
1 1 1 0
0 1 1 1
1 1 1 0
0 1 1 1
0 0 1 1
Fig. 21.2 Matrices for [(1,1,1,0); 2], [(0,0,1,0); 5], [(0,1,1,1); 2] and [(0,0,1,1); 3]. Each panel shows the same 5 × 4 matrix as in Fig. 21.1, with the corresponding maximal rectangle marked in boldface in the original.
Based on this definition, the following proposition is trivial:
Proposition 1: Let S be a set of m-tuple Boolean chains and P be the representative set of S; then no two elements of P coincide.
By listing all n elements of the set S of m-tuple Boolean chains in a Boolean n × m matrix, each element [u; k] of the representative set forms a maximal rectangle with maximal height k in S. For instance, if S = {(1,1,1,0), (0,1,1,1), (1,1,1,0), (0,1,1,1), (0,0,1,1)}, then the representative set of S consists of five representative patterns: P = {[(1,1,1,0); 2], [(0,1,1,0); 4], [(0,0,1,0); 5], [(0,1,1,1); 2], [(0,0,1,1); 3]}. All five elements of S are listed in the form of a Boolean 5 × 4 matrix, as follows: Figure 21.1 shows a maximal rectangle with boldface 1s and a maximal height of 4 corresponding to the pattern [(0,1,1,0); 4]. Other maximal rectangles formed by elements of P are shown in Fig. 21.2. A noteworthy case of the above instance: [(1,1,0,0); 2] is a maximal pattern, but it is not a representative pattern of S because its form is covered by (1,1,1,0) of the pattern [(1,1,1,0); 2].
21.1.2 Binary Relations

Let a = (a_1, a_2, ..., a_m) and b = (b_1, b_2, ..., b_m) be two m-tuple Boolean chains; then a = b if and only if a_i = b_i for every i = 1, ..., m. Otherwise, it is denoted by a ≠ b. Given patterns [u; p] and [v; q] in S, [u; p] is contained in [v; q] – denoted by [u; p] ⊑ [v; q] – if and only if u = v and p ≤ q. To negate, the operator ! is used, e.g. [u; p] !⊑ [v; q].
The minimum chain of a and b – denoted by a ∩ b – is a chain z = (z_1, z_2, ..., z_m) determined by z = a ∩ b where z_k = min(a_k, b_k) for k = 1, ..., m. The minimum pattern of [u; p] and [v; q] is a pattern [w; r], denoted by [u; p] ∩ [v; q], defined as follows: [u; p] ∩ [v; q] = [w; r] where w = u ∩ v and r = p + q.
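A minimal Python sketch of these two operators (again an illustration added here; the (form, frequency) tuple representation of a pattern is an assumption) is:

def minimum_pattern(p1, p2):
    # [u; p] ∩ [v; q] = [w; p + q] with w_k = min(u_k, v_k).
    (u, p), (v, q) = p1, p2
    return tuple(min(a, b) for a, b in zip(u, v)), p + q

def contained(p1, p2):
    # [u; p] ⊑ [v; q] iff u = v and p <= q.
    return p1[0] == p2[0] and p1[1] <= p2[1]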
21.2 An Improved Algorithm
21.2.1 Theorem 1: For Adding a New Element

Let S be a set of m-tuple Boolean chains and P the representative set of S. For [u; p], [v; q] ∈ P and z ∉ S, let [u; p] ∩ [z; 1] = [t; p + 1] and [v; q] ∩ [z; 1] = [d; q + 1]. Only one of the following cases can be satisfied:
i. [u; p] ⊑ [t; p + 1] and [u; p] ⊑ [d; q + 1], t = d
ii. [u; p] ⊑ [t; p + 1] and [u; p] !⊑ [d; q + 1], t ≠ d
iii. [u; p] !⊑ [t; p + 1] and [u; p] !⊑ [d; q + 1].
Proof: From Proposition 1, obviously u ≠ v. The theorem is proved if the following claim is true: of the statements (a) u = t, u = d, and t = d; (b) u = t, u ≠ d, and t ≠ d; (c) u ≠ t and u ≠ d, only one is correct. By induction on the number m of entries of a chain, in the first step we show that the claim is correct if u and v differ at only one entry, say the k-th. Without loss of generality, assume that u_k = 0 and v_k = 1. One of the following cases must be true:
– Case 1: z_k = 0. Then min(u_k, z_k) = min(v_k, z_k) = 0, hence t = u ∩ z = (u_1, u_2, ..., 0, ..., u_m) ∩ (z_1, z_2, ..., 0, ..., z_m) = (x_1, x_2, ..., 0, ..., x_m), x_i = min(u_i, z_i) for i = 1, ..., m, i ≠ k; and d = v ∩ z = (v_1, v_2, ..., 1, ..., v_m) ∩ (z_1, z_2, ..., 0, ..., z_m) = (y_1, y_2, ..., 0, ..., y_m), y_i = min(v_i, z_i) for i = 1, ..., m, i ≠ k. From the assumption u_i = v_i when i ≠ k, we obtain x_i = y_i, so t = d. Hence, if u = t then u = d and (a) is correct. On the other hand, if u ≠ t then u ≠ d, therefore (c) is correct.
– Case 2: z_k = 1. We have min(u_k, z_k) = 0 and min(v_k, z_k) = 1, and t = u ∩ z = (u_1, u_2, ..., 0, ..., u_m) ∩ (z_1, z_2, ..., 1, ..., z_m) = (x_1, x_2, ..., 0, ..., x_m), x_i = min(u_i, z_i) for i = 1, ..., m, i ≠ k; d = v ∩ z = (v_1, v_2, ..., 1, ..., v_m) ∩ (z_1, z_2, ..., 1, ..., z_m) = (y_1, y_2, ..., 1, ..., y_m), y_i = min(v_i, z_i) for i = 1, ..., m, i ≠ k. So t ≠ d. If u = t then u ≠ d, thus statement (b) is correct.
First r entries    First r+1 entries    Combined
(a)                (a)                  (a)
(a)                (b)                  (b)
(a)                (c)                  (c)
(b)                (a)                  (b)
(b)                (b)                  (b)
(b)                (c)                  (c)
(c)                (a)                  (c)
(c)                (b)                  (c)
(c)                (c)                  (c)

Fig. 21.3 Cases in comparison: true statements when u ≠ v and their first r entries are different (first column), when their first r+1 entries are different (second column), and when combining the two possibilities (third column)
In summary, the above claim is true for any u and v of S that differ at only one entry. In the second step of the induction, it is assumed that the claim is true if u and v differ at r entries, i.e. only one of the three statements (a), (b) or (c) is true. Without loss of generality, we assume that the first r entries of u and v are different and that they also differ at the (r + 1)-th entry. Applying the same method as in the first step (where r = 1) to this instance, we obtain the combinations listed in Fig. 21.3. Therefore, if u and v differ at r + 1 entries, only one of the statements (a), (b), (c) is correct. The above claim is true, and Theorem 1 is proved.
21.2.2 Algorithm: For Finding a New Representative Set

Let S be a set that consists of n m-tuple Boolean chains, and P the representative set of S. If an m-tuple Boolean chain z is added to S, the following algorithm is used to determine the new representative set of S ∪ {z}:

ALGORITHM NewRepresentative(P, z)
// Finding the new representative set for S when one chain is added to S.
// Input: P is a representative set of S, z: a chain added to S.
// Output: The new representative set P of S ∪ {z}.
1. M = ∅ // M: set of new elements of P
2. flag1 = 0
3. flag2 = 0
4. for each x ∈ P do
5.   q = x ∩ [z; 1]
6.   if q ≠ 0 // q is not a chain with all elements 0
7.     if x ⊑ q then P = P \ {x}
8.     if [z; 1] ⊑ q then flag1 = 1
9.     for each y ∈ M do
10.      if y ⊑ q then
11.        M = M \ {y}
12.        break for
13.      endif
14.      if q ⊑ y then
15.        flag2 = 1
16.        break for
17.      endif
18.    endfor
19.  else
20.    flag2 = 1
21.  endif
22.  if flag2 = 0 then M = M ∪ {q}
23.  flag2 = 0
24. endfor
25. if flag1 = 0 then P = P ∪ {[z; 1]}
26. P = P ∪ M
27. return P
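A runnable Python sketch of NewRepresentative (a hypothetical implementation added here, reusing the minimum_pattern and contained helpers sketched in Sect. 21.1.2, with patterns stored as (form, frequency) tuples) could look as follows:

def new_representative(P, z):
    # One update step: incorporate the new chain z into the representative set P.
    z_pat = (tuple(z), 1)
    P = list(P)
    M = []                       # new candidate patterns (set M in the pseudocode)
    flag1 = False                # True once [z; 1] is contained in some new pattern
    for x in list(P):            # iterate over the current representative patterns
        q = minimum_pattern(x, z_pat)
        if any(q[0]):            # q is not the all-zero chain
            if contained(x, q):
                P.remove(x)      # x is superseded by q (line 7)
            if contained(z_pat, q):
                flag1 = True     # line 8
            skip_q = False       # plays the role of flag2
            for y in list(M):
                if contained(y, q):
                    M.remove(y)  # an earlier candidate is superseded by q
                    break
                if contained(q, y):
                    skip_q = True
                    break
            if not skip_q:
                M.append(q)      # line 22
    if not flag1:
        P.append(z_pat)          # line 25
    P.extend(M)                  # line 26
    return P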
21.2.3 Theorem 2: For Finding Representative Sets

Let S be a set of n m-tuple Boolean chains. The representative set of S is determined by applying the NewRepresentative algorithm to each of the n elements of S in turn.
Proof: This theorem is proved by induction on the number n of elements of S. First, when applying the above algorithm to a set S of only one element, this element is added to P, and P with that single element is the representative set of S. Thus, Theorem 2 is proved in the case n = 1. Next, assume that S consists of n elements, that the above algorithm has been applied to S, and that a representative set P0 of p patterns has been obtained. Each element of P0 forms a maximal rectangle in S. When adding a new m-tuple Boolean chain z to S, it is necessary to prove that the algorithm finds a new representative set P of S ∪ {z}. Indeed, with z, some new rectangle forms will be created along with the existing rectangle forms. But some of these new rectangle forms are covered by other rectangle forms – the so-called "redundant" rectangles – and need to be removed to obtain P. The fifth statement in the NewRepresentative algorithm shows that the operator ∩ is applied to z and the p elements of P0 to produce p new elements belonging to P. This means z "scans" all elements of the set P0 to find new rectangle forms when
adding z into S. Consequently, three groups of 2p + 1 elements in total are created from the sets P0, P, and z. To remove redundant rectangles, we have to check whether each element of P0 is contained in an element of P, and whether elements of P contain one another, where [z; 1] is treated as an element of P. Let x be an element of P0 and consider the form x ∩ z; there are two instances: if the form of z covers that of x, then x is a new form; or if the form of x covers that of z, then z is a new form. Either way, the frequency of the new form is always one unit greater than the frequency of the original. According to Theorem 1, with x ∈ P0, if some pattern w contains x then w must be a new element belonging to P, and that new element is q = x ∩ [z; 1]. To check whether x is contained in an element of P, we check whether x is contained in q or not. If x is contained in q, it must be removed from the representative set (line 7). In summary, the algorithm first checks whether elements belonging to P0 are contained in elements belonging to P. Then, the algorithm checks whether elements of P contain one another (from line 9 to line 18), and whether [z; 1] is contained in an element of P (line 8). Finally, the above NewRepresentative algorithm can be used to find the new representative set when adding new elements to S.
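Following Theorem 2, a representative set can be built incrementally by applying new_representative to the chains of S in turn (again a hypothetical sketch building on the code above):

def representative_set(S):
    # Apply NewRepresentative to each chain of S, starting from an empty P.
    P = []
    for z in S:
        P = new_representative(P, z)
    return P

S = [(1,1,1,0), (0,1,1,1), (1,1,1,0), (0,1,1,1), (0,0,1,1)]
print(representative_set(S))
# yields, up to ordering:
# [((1,1,1,0), 2), ((0,1,1,0), 4), ((0,0,1,0), 5), ((0,1,1,1), 2), ((0,0,1,1), 3)]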
21.3 The Objectivity Measurement
21.3.1 Association Rule Discovery

Advanced technologies have enabled us to collect large amounts of data on a continuous or periodic basis in many fields. On the one hand, these data present the potential for us to discover useful information and knowledge that we could not find before. On the other hand, we are limited in our ability to manually process large amounts of data to discover useful information and knowledge. This limitation demands automatic tools for data mining to mine useful information and knowledge from large amounts of data. Data mining has become an active area of research and development. Association rules are a data mining methodology that is usually used to discover frequently co-occurring data items, for example, items that are commonly purchased together by customers at grocery stores. The association rule discovery problem was introduced in 1993 by Agrawal (Agrawal and Srikant 1994). Since then, this problem has received much attention. Today the exploitation of such rules is still one of the most popular ways of exploiting patterns in order to conduct knowledge discovery and data mining. More precisely, association rule discovery is the process of discovering sets of attribute values that appear frequently in data objects. From the frequent patterns,
association rules can be created in order to reflect the ability of the attribute values to appear simultaneously in the set of objects. To bring out the meaning, an association rule X → Y reflects that the occurrence of the set X conduces to the appearance of the set Y. In other words, an association rule indicates an affinity between X (antecedent) and Y (consequent). An association rule is accompanied by frequency-based statistics that describe that relationship. The two statistics that were used initially to describe these relationships were support and confidence (Agrawal and Srikant 1994). The apocryphal example is a chain of convenience stores that performs an analysis and discovers that disposable diapers and beer are frequently bought together (Ye 2003). This unexpected knowledge is potentially valuable as it provides insight into the purchasing behaviour of both disposable diaper and beer purchasing customers for the convenience store chain. Thus, association rules help identify trends in sales, customer psychology, etc., in order to make strategic decisions about item layout, business, marketing, and so on. In short, association rule discovery can be divided into two parts:
• Finding all frequent patterns that satisfy min-support.
• Finding all association rules that satisfy minimum confidence.
Most research in the area of association rule discovery has focused on the subproblem of efficient frequent pattern discovery (for example, Han et al. 2000). When seeking all associations that satisfy constraints on support and confidence, once frequent patterns have been identified, generating the association rules is trivial (Nguyen and Nguyen 2010).
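As an illustration (a sketch added here with hypothetical helper names, not part of the original chapter), support and confidence of a rule X → Y over a set of transactions can be computed as:

def support(transactions, itemset):
    # Fraction of transactions containing every item of the itemset.
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, X, Y):
    # confidence(X -> Y) = support(X ∪ Y) / support(X).
    return support(transactions, set(X) | set(Y)) / support(transactions, X)

transactions = [{"i1", "i2", "i3"}, {"i2", "i3", "i4"}, {"i2", "i3", "i4"},
                {"i1", "i2", "i3"}, {"i3", "i4"}]   # matches the invoices o1-o5 of Sect. 21.3.4
print(support(transactions, {"i2", "i3"}))          # 0.8
print(confidence(transactions, {"i2"}, {"i3"}))     # 1.0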
21.3.2 The Methods of Finding Frequent Patterns

• Candidate Generation Methods: for instance, the Apriori method proposed by Agrawal (Agrawal and Srikant 1994; Ye 2003), and algorithms relying on Apriori such as AprioriTID, Apriori-Hybrid, DIC, DHP, PHP, ... in Han and Kamber (2006), Adamo (2006), Ye (2003).
• Without Candidate Generation Methods: for example, Zaki's method relies on the IT-tree and the intersection of Tidsets in order to compute support (Han and Kamber 2006); J. Han's method relies on the FP-tree in order to exploit frequent patterns (Han et al. 2003); or methods such as Lcm, DCI, ... presented in Han et al. (2000).
• Paralleled Methods: instances in Agrawal and Shafer (1996), Zaki et al. (1996), Kosala and Blockeel (2000).
Table 21.1 Invoice details

Invoice code   Goods code
o1             i1
o1             i2
o1             i3
o2             i2
o2             i3
o2             i4
o3             i2
o3             i3
o3             i4
o4             i1
o4             i2
o4             i3
o5             i3
o5             i4
21.3.3 Issues That Need to Be Solved

• Working with a varying database is the biggest challenge. In particular, it should not be necessary to scan the whole database again whenever a new element is added.
• A number of algorithms are effective, but their mathematical basis and way of implementation are complex.
• The limit of computer memory. Hence, combining a way to store the data mining context most effectively, with the least memory cost, with a way to store the frequent patterns is also not a small challenge.
• The ability to divide the data into several parts for parallel processing is also of concern.
Using the NewRepresentative algorithm addresses the above problems (Nguyen and Nguyen 2010).
21.3.4 Data Mining Context

Let O be a limited non-empty set of invoices, and I be a limited non-empty set of goods. Let R be a binary relation between O and I such that for o ∈ O and i ∈ I, (o, i) ∈ R if and only if the invoice o includes the goods i. Then R is a subset of the product set O × I, and the triple (O, I, R) describes a data mining context. For example, we have a context, illustrated in Table 21.1, consisting of five invoices with codes o_j, j = 1, ..., 5 and four kinds of goods i_k, k = 1, ..., 4. The corresponding binary relation R is described in Table 21.2, which can be represented by a 5 × 4 Boolean matrix.
Table 21.2 Boolean matrix of data mining context

     i1  i2  i3  i4
o1   1   1   1   0
o2   0   1   1   1
o3   0   1   1   1
o4   1   1   1   0
o5   0   0   1   1

Table 21.3 Boolean matrix of data mining context with customers

     Customer  i1  i2  i3  i4
o1   C1        1   1   1   0
o2   C2        0   1   1   1
o3   C3        0   1   1   1
o4   C1        1   1   1   0
o5   C4        0   0   1   1
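A small Python sketch (added here for illustration; variable names are hypothetical) turns the invoice details of Table 21.1 into the Boolean matrix of Table 21.2:

invoice_details = [("o1","i1"), ("o1","i2"), ("o1","i3"),
                   ("o2","i2"), ("o2","i3"), ("o2","i4"),
                   ("o3","i2"), ("o3","i3"), ("o3","i4"),
                   ("o4","i1"), ("o4","i2"), ("o4","i3"),
                   ("o5","i3"), ("o5","i4")]
invoices = ["o1", "o2", "o3", "o4", "o5"]
goods = ["i1", "i2", "i3", "i4"]

# Boolean matrix: one row per invoice, one column per goods item, 1 if (o, i) is in R.
R = set(invoice_details)
matrix = [[1 if (o, i) in R else 0 for i in goods] for o in invoices]
# matrix == [[1,1,1,0], [0,1,1,1], [0,1,1,1], [1,1,1,0], [0,0,1,1]]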
21.3.5 The Objectivity Measurement of Frequent Patterns

Let us survey Table 21.3. By observing with the naked eye, we can notice immediately that the frequent pattern with form (1110) appears in two invoices (o1, o4) and is decided by only one customer, C1. Meanwhile, the frequent pattern with form (0010) appears in five invoices (o1, o2, o3, o4, o5) and is decided by all four customers (C1, C2, C3, C4). Obviously, with minsupp = 40%, both frequent patterns are satisfied. But can we use these two frequent patterns to generate association rules and then apply these rules to prospective customers? To answer this question, we must consider the objectivity measurement of each frequent pattern. In more detail, we must determine the ratio of the number of customers involved in the process of creating a frequent pattern to the total number of customers. The higher the ratio, the better the objectivity measurement of the frequent pattern, i.e. the frequent pattern is created by most customers. We may then hope that it will also be correct for the majority of prospective customers. Back to the above example, the frequent pattern (1110) has an objectivity measurement of 1/4 = 25%, whereas the frequent pattern (0010) has an objectivity measurement of 4/4 = 100%. With these parameters, business managers will have more information when making decisions. When applying the algorithm NewRepresentative, we can solve this problem. The tenor of the algorithm is to find the new rectangle forms created by applying the operator ∩ to the line of data of the current step and the rectangle forms found in prior steps. However, in the first step, grouping the data mining context in order to obtain the first evident rectangle forms and their corresponding heights is needed. In each step, if a newly created rectangle form covers an existing rectangle form, the old rectangle form must be removed. Besides, a newly created rectangle form also needs to be removed if the new forms mutually cover each other.
In summary, after each step, the set P whose elements mutually differ is created. This means that the various maximal rectangle forms in the database, from the first line of data up to the line of data corresponding to the current step, are obtained. To calculate the objectivity measurement, in lines 22 and 25 of the algorithm NewRepresentative, before adding q to M and [z; 1] to P, we attach a list of customers to q and [z; 1]. The customer of [z; 1] is the customer purchasing the invoice forming [z; 1]. The customer list of q is generated from the customer lists of x and [z; 1]. Adding the customer lists includes examining whether a customer is duplicated, in which case the number of occurrences of that customer is increased. Specifically, consider the following example, based on the Boolean matrix in Table 21.3 with a minsupp of 40%. Let P be the set of maximal rectangle forms of that Boolean matrix. It can be determined by the following steps:

Step 1: Consider line 1: [(1110); 1] (l1)
Since P is now empty, we put (l1) into P, and we have:
P = {[(1110); 1; (C1-1)] (1)}

Step 2: Consider line 2: [(0111); 1] (l2)
Let (l2) perform the ∩ operation with the elements existing in P in order to get the new elements:
(l2) ∩ (1): [(0111); 1] ∩ [(1110); 1] = [(0110); 2] (n1)
Considering exclusion:
Consider whether the old elements in P are contained in the new elements: we retain (1).
Consider whether the new elements contain each other (note: (l2) is also a new element): we retain (l2) and (n1).
After considering exclusion, the customer lists of the remaining elements are supplemented:
(1): [(1110); 1; (C1-1)]
(l2): [(0111); 1; (C2-1)]
(n1): [(0110); 2; (C1-1, C2-1)] // (n1) is generated by (l2) and (1)
Putting the elements into P, we have:
P = {[(1110); 1; (C1-1)] (1)
[(0110); 2; (C1-1, C2-1)] (2)
[(0111); 1; (C2-1)] (3)}

Step 3: Consider line 3: [(0111); 1] (l3)
Let (l3) perform the ∩ operation with the elements existing in P in order to get the new elements:
(l3) ∩ (1): [(0111); 1] ∩ [(1110); 1] = [(0110); 2] (n1)
(l3) ∩ (2): [(0111); 1] ∩ [(0110); 2] = [(0110); 3] (n2)
(l3) ∩ (3): [(0111); 1] ∩ [(0111); 1] = [(0111); 2] (n3)
Considering exclusion:
Consider whether the old elements in P are contained in the new elements:
We remove: (2), because it is contained in (n1), and (3), because it is contained in (n3).
We retain: (1).
Consider whether the new elements contain each other (note: (l3) is also a new element):
We remove: (n1), because it is contained in (n2), and (l3), because it is contained in (n3).
We retain: (n2) and (n3).
After considering exclusion, the customer lists of the remaining elements are supplemented:
(1): [(1110); 1; (C1-1)]
(n2): [(0110); 3; (C1-1, C2-1, C3-1)] // (n2) is generated by (l3) and (2)
(n3): [(0111); 2; (C2-1, C3-1)] // (n3) is generated by (l3) and (3)
Putting the elements into P, we have:
P = {[(1110); 1; (C1-1)] (1)
[(0110); 3; (C1-1, C2-1, C3-1)] (2)
[(0111); 2; (C2-1, C3-1)] (3)}

Step 4: Consider line 4: [(1110); 1] (l4)
Let (l4) perform the ∩ operation with the elements existing in P in order to get the new elements:
(l4) ∩ (1): [(1110); 1] ∩ [(1110); 1] = [(1110); 2] (n1)
(l4) ∩ (2): [(1110); 1] ∩ [(0110); 3] = [(0110); 4] (n2)
(l4) ∩ (3): [(1110); 1] ∩ [(0111); 2] = [(0110); 3] (n3)
Considering exclusion:
Consider whether the old elements in P are contained in the new elements:
We remove: (1), because it is contained in (n1), and (2), because it is contained in (n2).
We retain: (3).
Consider whether the new elements contain each other (note: (l4) is also a new element):
We remove: (n3), because it is contained in (n2), and (l4), because it is contained in (n1).
We retain: (n1) and (n2).
After considering exclusion, the customer lists of the remaining elements are supplemented:
(3): [(0111); 2; (C2-1, C3-1)]
(n1): [(1110); 2; (C1-2)] // (n1) is generated by (l4) and (1)
(n2): [(0110); 4; (C1-2, C2-1, C3-1)] // (n2) is generated by (l4) and (2)
Putting the elements into P, we have:
P = {[(0111); 2; (C2-1, C3-1)] (1)
[(1110); 2; (C1-2)] (2)
[(0110); 4; (C1-2, C2-1, C3-1)] (3)}

Step 5: Consider line 5: [(0011); 1] (l5)
Let (l5) perform the ∩ operation with the elements existing in P in order to get the new elements:
(l5) ∩ (1): [(0011); 1] ∩ [(0111); 2] = [(0011); 3] (n1)
(l5) ∩ (2): [(0011); 1] ∩ [(1110); 2] = [(0010); 3] (n2)
(l5) ∩ (3): [(0011); 1] ∩ [(0110); 4] = [(0010); 5] (n3)
Considering exclusion:
Consider whether the old elements in P are contained in the new elements:
We retain: (1), (2), and (3).
Consider whether the new elements contain each other (note: (l5) is also a new element):
We remove: (l5), because it is contained in (n1), and (n2), because it is contained in (n3).
We retain: (n1) and (n3).
After considering exclusion, the customer lists of the remaining elements are supplemented:
(1): [(0111); 2; (C2-1, C3-1)]
(2): [(1110); 2; (C1-2)]
(3): [(0110); 4; (C1-2, C2-1, C3-1)]
(n1): [(0011); 3; (C2-1, C3-1, C4-1)] // (n1) is generated by (l5) and (1)
(n3): [(0010); 5; (C1-2, C2-1, C3-1, C4-1)] // (n3) is generated by (l5) and (3)
Putting the elements into P, we have:
P = {[(0111); 2; (C2-1, C3-1)] (1)
[(1110); 2; (C1-2)] (2)
[(0110); 4; (C1-2, C2-1, C3-1)] (3)
[(0011); 3; (C2-1, C3-1, C4-1)] (4)
[(0010); 5; (C1-2, C2-1, C3-1, C4-1)] (5)}

So, the frequent patterns satisfying minsupp = 40% (2/5) are: {i2, i3, i4} (2/5); {i1, i2, i3} (2/5); {i2, i3} (4/5); {i3, i4} (3/5); {i3} (5/5). However, when considering the objectivity measurement, or the measurement of the customers' impact, we clearly see that the frequent pattern {i1, i2, i3} (2/5) was created by only one customer, C1. Thus, when analyzing, we will have better information about the objectivity measurement of this frequent pattern, viz. 1/4 = 25%. In addition, the objectivity measurement of {i2, i3, i4} (2/5) is 2/4 = 50%, of {i2, i3} (4/5) is 3/4 = 75%, of {i3, i4} (3/5) is 3/4 = 75%, and of {i3} (5/5) is 4/4 = 100%.
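The objectivity measurement itself can be reproduced with a few lines of Python (a sketch added here, reusing covers() from the earlier illustration in Sect. 21.1.1; the customer assignment follows Table 21.3):

customers = {"o1": "C1", "o2": "C2", "o3": "C3", "o4": "C1", "o5": "C4"}
rows = {"o1": (1,1,1,0), "o2": (0,1,1,1), "o3": (0,1,1,1),
        "o4": (1,1,1,0), "o5": (0,0,1,1)}

def objectivity(form):
    # Ratio of distinct customers whose invoices cover the form to all customers.
    covering = [o for o, row in rows.items() if covers(row, form)]
    return len({customers[o] for o in covering}) / len(set(customers.values()))

print(objectivity((1,1,1,0)))   # 0.25 -> 25% for {i1, i2, i3}
print(objectivity((0,0,1,0)))   # 1.0  -> 100% for {i3}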
21.4 Conclusions and Future Work
This research proposed an improved algorithm for mining frequent patterns. It satisfies a number of the following requirements:
• It handles adding data to the data mining context without the need to scan the database again.
• It is easy to implement and of low complexity (n·2^(2m), where n is the number of invoices and m is the number of goods; in reality, m does not vary and thus 2^(2m) can be considered a constant).
• The representative set P mainly serves as a deputy for the data mining context. So, to save computer memory, it is sometimes sufficient to store only the set P instead of the whole context.
• It overcomes the limitation of computer memory being insufficient to store an enormous data mining context, because the algorithm allows the context to be split into several parts that are processed one by one.
• A parallel strategy can be applied to the algorithm in a simple way.
• It provides the objectivity measurement.
A number of problems need further study:
• Expanding the algorithm to the circumstance of altering data.
• Improving the speed of the algorithm.
• Investigating the application of the algorithm to real-world tasks.
References

Adamo JM (2006) Data mining for association rules and sequential patterns. Springer, New York
Agrawal R, Shafer JC (1996) Parallel mining of association rules. IEEE Trans Knowledge Data Eng 8:962–969
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th VLDB conference, Santiago, Chile
Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann, San Francisco
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of ACM SIGMOD conference on management of data, Dallas, pp 1–12
Kosala R, Blockeel H (2000) Web mining research: a survey. SIGKDD Explorations 2(1):1–15
Nguyen PK, Nguyen TT (2010) The objectivity measurement of frequent patterns. In: Proceedings of the world congress on engineering and computer science, WCECS 2010, Lecture notes in engineering and computer science, San Francisco, 20–22 Oct 2010, pp 439–444
Ye N (2003) The handbook of data mining. Lawrence Erlbaum, Mahwah
Zaki MJ, Ogihara M, Parthasarathy S, Li W (1996) Parallel data mining for association rules on shared-memory multiprocessors. Technical Report TR618
Chapter 22
Change-Point Detection Based on SSA in Precipitation Time Series

Naoki Itoh and Jürgen Kurths
22.1 Introduction
Data mining aims to extract any information that is nontrivial but useful. With the significant developments in computer performance, we can manipulate large amounts of data and analyze the structured data dynamically. Change-point detection is one of these techniques. Change-point detection has been studied in many scientific fields to give a better interpretation of properties extracted from a complicated system. The Singular Spectrum Transformation (SST) (Idé and Inoue 2005, 2004; Idé 2006; Itoh and Kurths 2010) that we introduce in this study is based on Principal Component Analysis (PCA). The methodology itself has been described as Singular Spectrum Analysis (SSA) by Golyandina et al. (2001). The basic idea of change-point detection using SSA is explained in a paper by Moskvina and Zhigljavsky (2003). SST is still being developed and improved (e.g. Mohammad and Nishida 2009). An advantage of using SST is that no knowledge about ad-hoc tuning and modification for the time series is required. Therefore, we suppose that it is suitable for analyzing real-world data for which only a little a priori knowledge is available. In this paper, Kenyan towns (Nakuru, Naivasha, Narok, and Kisumu) in the equatorial region and Wrangel Island in the Arctic (Matsuura and Willmott 2004), both significantly impacted by global warming effects such as drought and ice melting, are analyzed by this method in order to investigate the similarity of the data within the same country and the difference between the two areas.
Section 22.2 describes the algorithm of SSA. Section 22.3 defines change-point detection using results from the SSA and shows some examples. The application to the climate time series is discussed in Sect. 22.4. Finally, the conclusion is given in Sect. 22.5.
22.2 Singular Spectrum Analysis
22.2.1 Background

SSA can be defined as a model-free technique which aims to decompose measured time series into some useful and interpretable components such as global trends, harmonic terms, and noise. The tasks of SSA can typically be classified as follows (Hassani 2007):
1. Finding trends of different resolution
2. Smoothing
3. Extraction of seasonality components
4. Simultaneous extraction of cycles with small and large periods
5. Extraction of periodicities with varying amplitudes
6. Simultaneous extraction of complex trends and periodicities
7. Finding structure in short time series
8. Change-point detection
From these items, the first six analyses can be performed by mathematically decomposing and reconstructing the whole time series. If, however, the tasks of the seventh and eighth items are to be addressed, it is necessary to define an additional parameter to partition the time series. In particular, the change-point detection used in this paper follows the idea of Idé et al. (Idé and Inoue 2005, 2004; Idé 2006; Itoh and Kurths 2010).
22.2.2 Algorithm

In the first stage, we make a trajectory matrix from a single time series and then decompose it by using Singular Value Decomposition (SVD). The trajectory matrix consists of the short vectors

X_j = (y_j, ..., y_{j+L-1})^T ∈ R^L,  j = 1, ..., K,    (22.1)

from the single time series Y = (y_1, ..., y_N), which forms a Hankel matrix (Phillips 1971), where the parameters K and L are related by K = N − L + 1. The L is
called a window length restricted by 2 ≤ L ≤ N/2 as a single parameter for the SSA. By SVD the trajectory matrix can be expressed as follows:

X = [X_1 : ... : X_K] = (x_{ij})_{i=1,...,L; j=1,...,K} = U S V^T,  with x_{ij} = y_{i+j-1},    (22.2)
where U and V are respectively an L × L and a K × K unitary matrix, which consist of orthonormal vectors regarded as basis vectors of X, S is an L × K diagonal matrix of singular values with nonnegative real numbers on the diagonal, and T denotes the conjugate transpose. If an order d is defined as the number of positive singular values, i.e. d := max{"index", such that "singular value" > 0} = rank X, it allows us to write

X = Σ_{i=1}^{d} √λ_i U_i V_i^T = Σ_{i=1}^{d} X_i,    (22.3)
where X_i is defined as a rank-one orthogonal elementary matrix. A set of these three notations, consisting of the singular value √λ_i, the empirical orthogonal functions (EOFs) U_i, and the principal components (PCs) V_i, is called the i-th eigentriple of the matrix X_i. Note that there exists an orthogonality relationship among these sub-matrices. The elements of each matrix can therefore be considered as characteristics of the data, which can be classified according to the magnitude of the singular value. By reconstructing the time series from each matrix, it is possible to identify the data properties. Since the trajectory matrix X has a Hankel form, the initial single time series can be reconstructed by diagonal averaging as follows:

y_n = (1/n) Σ_{m=1}^{n} x_{m,n-m+1},                  1 ≤ n < L,
y_n = (1/L) Σ_{m=1}^{L} x_{m,n-m+1},                  L ≤ n < K,
y_n = (1/(N-n+1)) Σ_{m=n-K+1}^{N-K+1} x_{m,n-m+1},    K ≤ n ≤ N.    (22.4)
If the method is applied to the elementary matrix obtained from the eigentriple, the reconstructed time series has also the orthogonal relationship with the other reconstructed time series. If the singular values corresponding to the time series are almost same their patterns may be similar. By this grouping, the time series can be decomposed into components such as trends, harmonics, and noise.
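As a hedged illustration (a sketch added here, not the chapter's implementation; function and variable names are assumptions), the decomposition and diagonal averaging described above can be written in Python with NumPy as:

import numpy as np

def ssa_components(y, L):
    """Decompose a 1-D series into elementary reconstructed components."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    K = N - L + 1
    # Trajectory (Hankel) matrix, Eq. (22.2): column j is (y_j, ..., y_{j+L-1}).
    X = np.column_stack([y[j:j + L] for j in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    components = []
    for i in range(len(s)):
        Xi = s[i] * np.outer(U[:, i], Vt[i])     # rank-one elementary matrix
        # Diagonal averaging, Eq. (22.4): average over the anti-diagonals.
        yi = np.array([np.mean(Xi[::-1].diagonal(k)) for k in range(-(L - 1), K)])
        components.append(yi)
    return components                            # summing them recovers y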
22.3 Change-Point Detection
The change-point detection problem is defined as a quantitative estimation of structural changes behind a time series. The topic has been discussed using several methods, such as parametric methods based on autoregressive models. This approach, however, does not always lead to good results for heterogeneous and nonstationary data, because it is not easy to provide sufficient prior knowledge for most real data consisting of complex natural factors. Therefore, the singular spectrum transformation (SST), derived from SSA, is adopted as the change-point detection method in this study. SST is a nonparametric change-point detection method in which feature extraction is achieved by using the SVD described in Sect. 22.2. Since this method does not need to assume a certain stochastic model, it is suitable for a wide range of time series.
22.3.1 SST Technique

The calculation of the change-point using SST is performed by finding a structural difference between the parts before and after a reference time in the initial time series. If a reference time t is chosen in the time series Y, then two sub time series, the past part Y^(p) and the future part Y^(f), can be described as follows:

Y^(p) = (y_1, ..., y_{t-1}),   Y^(f) = (y_{t+g}, ..., y_m),    (22.5)

where g is defined as the initial time point of the sub-series of the future part and m is restricted to t + g < m ≤ N. Then a trajectory matrix is built from each sub-series:

X^(p) = [X_{t-K} : ... : X_{t-1}],   X^(f) = [X_{t+g} : ... : X_{t+g+K-1}],    (22.6)
where each vector consists of L elements of Y. Then, from (22.2) and (22.3), both trajectory matrices can be written in terms of their eigentriples:

X^(p) = U^(p) S^(p) V^(p)T = Σ_{i=1}^{d} √(λ_i^(p)) U_i^(p) V_i^(p)T,
X^(f) = U^(f) S^(f) V^(f)T = Σ_{i=1}^{d} √(λ_i^(f)) U_i^(f) V_i^(f)T.    (22.7)
Since they are orthogonally decomposed into the elementary matrices as representatives of the components in the data, it would appear that just the left singular vectors U^(p) and U^(f), explaining these data structures, are sufficiently useful to compare the two sub time series. The elements of both matrices are defined respectively as follows:

U^(p) = (u_{ij}^(p))_{i,j=1,...,L},   U^(f) = (u_{ij}^(f))_{i,j=1,...,L}.    (22.8)
According to the description in the references Idé and Inoue (2005, 2004), Idé (2006), and Itoh and Kurths (2010), the first l vectors from the past part and the first vector from the future part are used to compute a change-point score which shows the structural difference, that is to say,

U_l^(p) = [U_1^(p), ..., U_l^(p)],   β = U_1^(f).    (22.9)
Since the left vectors are arranged in order of decreasing singular values, implying their contribution to the initial time series, the change-point score shows how different the most dominant property of the future time series is from the first l dominant properties of the past time series. It can then be defined as follows:

z = 1 − Σ_{i=1}^{l} K(i, β)²,    (22.10)

where K is the inner product of the vectors:

K(i, β) := β^T u_i^(p).    (22.11)
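A compact Python sketch of the SST score (our own reading of Eqs. 22.5–22.11 added here, not the chapter's code; the window length L, the number of past components l, the gap g, and the choice K = L are assumptions) is:

import numpy as np

def window_matrix(y, start, L, K):
    # Columns X_j = (y_j, ..., y_{j+L-1})^T for j = start, ..., start + K - 1.
    return np.column_stack([y[j:j + L] for j in range(start, start + K)])

def sst_scores(y, L, K=None, l=2, g=0):
    """Change-point score z(t) = 1 - sum_{i<=l} (beta^T u_i^(p))^2, cf. Eqs. (22.10)-(22.11)."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    K = L if K is None else K
    scores = np.full(N, np.nan)
    for t in range(K, N - L - K - g + 1):
        Xp = window_matrix(y, t - K, L, K)          # past part, cf. Eq. (22.6)
        Xf = window_matrix(y, t + g, L, K)          # future part
        Up, _, _ = np.linalg.svd(Xp, full_matrices=False)
        Uf, _, _ = np.linalg.svd(Xf, full_matrices=False)
        beta = Uf[:, 0]                             # dominant left vector of the future part
        scores[t] = 1.0 - np.sum((beta @ Up[:, :l]) ** 2)
    return scores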
22.3.2 Demonstration

As simple examples of SST, let us generate two kinds of time series: three linear functions with positive, zero, and negative slopes, and three kinds of sine functions with noise (Idé and Inoue 2005; Itoh and Kurths 2010). Figure 22.1 shows each original time series and its result by SST. Between the two initial time series there is seemingly no similarity, but by the change-point detection it is possible to find the common property that there exist change points at about t = 150 and at about t = 300.
Fig. 22.1 Results of change-point score by SST; left panels: linear data set and right panels: harmonic data set (each column shows the original time series and, below it, the CP-Score over Time)
Fig. 22.2 Measuring stations in Kenya (East Africa) and in Wrangel Island (Arctic land); left: their positional relation, right top: tundra on Wrangel Island, right bottom: Rift Valley in central Kenya
22.4 Application
In our study, the monthly precipitation at the four measuring stations in Kenya (Nakuru, Naivasha, Narok, and Kisumu) in the equatorial area (0.27° S, 36.0° E) and on Wrangel Island in the Arctic area (71.25° N, 179.75° E) is analyzed by SST. Figure 22.2 shows their positional relation and the atmospheres of the two areas. Although the data have different time lengths, as shown by the solid line frames in the left side of Fig. 22.3, the region between January 1950 and December 1985 (35 years) corresponds to 432 data points overlapping at all the stations.
Fig. 22.3 Left four panels: monthly precipitation. The solid line frames mark the overlapping time length of all precipitation data for 35 years. Right four panels: change-point scores in Nakuru, Naivasha, Narok, Kisumu, and Wrangel Island. The dashed line frames emphasize the time region around 1960 with peaks in the change-point score for Kenya
In order to compare their data structures during the overlapping time interval, we calculate their change-point scores. For the parameter L = 24 (i.e. 2 years) the change-point scores can be depicted as sharp curves (see the right side of Fig. 22.3). Although it is not easy to discuss similarity and dissimilarity from the raw precipitation time series, by comparing the results of the change-point detection in Fig. 22.3 we can see that the structures of the precipitation in each of the four Kenyan towns produce a high change-point score around 1960 (see the dashed line frames in the right side of Fig. 22.3). This means that a common property of the precipitation can be discovered. On the other hand, the result from Wrangel Island shows an obviously different pattern from the results for the Kenyan towns.
22.5 Conclusion
In this study, as a data mining application to climate, we were able to demonstrate the capability of the SST method, defined here as change-point detection based on SSA, to detect quantitative changes in the structure of heterogeneous data sets. As described in Sect. 22.4, the change-point scores displayed a characteristic expression according to location. For the measuring stations in Kenya, the commonality of the change-point around 1960 could be shown. On the other hand, Wrangel Island, remote from the equatorial area, has shown an obviously different pattern of change-points. Following the change-point detection proposed by Idé and Inoue (2005), the trend component extracted from the future sub time series was used in the calculation of the change-point score, so the result reflects only changes in the trend. Therefore, the next step will be to compute the change-point score for other components, such as the periodicity and the noise term, in order to obtain results on other time scales.

Acknowledgment The precipitation data from http://www.ncdc.noaa.gov/ (GHCN v2 database) used in this study were provided by Norbert Marwan (PIK Potsdam). Udo Schwarz (Center for Dynamics of Complex Systems, University of Potsdam) and Tsuyoshi Idé (IBM Tokyo Research Laboratory) gave some quite valuable comments and suggestions. The authors would like to acknowledge their help.
References

Ghil M, Allen MR, Dettinger MD, Ide K, Kondrashov D, Mann ME, Robertson AW, Saunders A, Tian Y, Varadi F, Yiou P (2001) Advanced spectral methods for climatic time series. AGU Rev Geophys 40:1–41
Golyandina N, Nekrutkin V, Zhigljavsky A (2001) Analysis of time series structure, SSA and related techniques. Chapman & Hall/CRC, Boca Raton
Hassani H (2007) Singular spectrum analysis: methodology and comparison. J Data Sci 5:239–257
Idé T (2006) Speeding up change-point detection using matrix compression. In: 2006 workshop on information-based induction science, 31 Oct–2 Nov 2006
Idé T, Inoue K (2004) Knowledge discovery from time-series data using nonlinear transformations. In: Proceedings of the fourth data mining workshop (The Japan Society for Software Science and Technology, Tokyo, 2004), No. 29, pp 1–8
Idé T, Inoue K (2005) Knowledge discovery from heterogeneous dynamic systems using change-point correlations. In: 2005 SIAM international conference on data mining (SDM 05), April 2005
Itoh N, Kurths J (2010) Change-point detection of climate time series by nonparametric method. In: Proceedings of the world congress on engineering and computer science WCECS 2010, San Francisco, 20–22 Oct 2010, pp 445–448
Matsuura K, Willmott CJ (2004) Arctic land-surface precipitation: 1930–2000 gridded monthly time series. Center for Climatic Research, Ver. 1.10, March 2004
Mohammad Y, Nishida T (2009) Robust singular spectrum transform. In: Twenty-second international conference on industrial, engineering & other applications of applied intelligent systems (IEA/AIE 2009), Taiwan, June 2009
Moskvina V, Zhigljavsky A (2003) An algorithm based on singular spectrum analysis for change-point detection. Commun Stat-Simul Comput 32(2):319–352
Phillips JL (1971) The triangular decomposition of Hankel matrices. Math Comput 25(115):599–602
Chapter 23
Non-linear Image Recovery from a Single Frame Super Resolution Using Pearson Type VII Density*

Sakinah Ali Pitchay
23.1 Introduction
Super-resolution (SR) aims to recover a high resolution (HR) image from one or more low resolution (LR) images. The limitations of the capturing source often result in loss of resolution and the introduction of additive noise. Thus, the captured image will be distorted and the resolution will be insufficient to sample the image adequately. In this scenario, when only a single low resolution frame exists, the observed frame is often deficient or noisy, which makes the recovery process an ill-posed problem where the solution does not exist or is not unique. In a simple description of image recovery in the classical super-resolution application, the low resolution image suffers from down-sampling, blur and some additive noise. The down-sampled version has fewer pixels than the high resolution image, which makes the problem harder to solve because the single frame version is essentially under-determined too. Thus, extra knowledge is vital to acquire an adequate solution and is well known as an image prior. Employing the probabilistic model based framework, this extra information may be specified as a prior distribution on the salient statistics that images are known to have. The two main criteria are apparently contrary to each other: local smoothness and the existence of edges. Hence the requirement of a good image prior is demanding. Former priors have been proposed in the literature, yet with no substantiation. The Gaussian-MRF represents a common choice for its computational tractability.
* Revised and extended version of the work WCECS2010 (Pitchay 2010).
The Huber-MRF is prominent since it is more robust but still convex, and the works of He and Kondi (2003, 2004) and Pickup et al. (2007) are considered to be the state-of-the-art approach. In a previous paper (Kabán and Pitchay 2010), we proposed a robust density, the univariate version of the Pearson type VII distribution, formulated as a Markov Random Field in a super resolution approach. Previously, the comparisons with the existing image priors concentrated on compressive matrix transformations in Pitchay (2010). Out of curiosity, we formulated and examined the multivariate Pearson type VII density and compare it with the state-of-the-art approach using the classical super resolution technique. This density was formerly used for robust density estimation in Sun et al. (2010) as an alternative to t-mixtures and in stock market modelling (Nagahara 1996). In this paper, we revise the work in Pitchay (2010) and present image recovery from a blurred and down-sampled low resolution image in the under-determined problem. The remainder of the paper is organised as follows. In Sect. 23.2, we describe the problem formally, including how to estimate the high resolution image. Section 23.3 presents the image priors that we investigate. Section 23.4 presents our proposed solution and Sect. 23.5 details the experimental results. Finally, Sect. 23.6 concludes the paper.
23.2 Framework of Image Super Resolution
23.2.1 Observation Model

The high resolution image of N = r × c pixel intensities will be vectorised and denoted as z. This image suffers a complicated transformation into a low-resolution frame which includes blur and down-sampling. We adopt a linear model to express this transformation which, although not completely accurate, has worked well in many previous studies on super-resolution (Hardie and Barnard 1997; He and Kondi 2003; Pickup et al. 2007). Denoting the low resolution frame by y in vectorised form, and the linear transform that takes z into y by W, which is a stack of transformation matrices combined into a single matrix W, we write the forward model as follows:

y = Wz + η    (23.1)
W is a product of a blurring and a down-sampling matrix of size [M × N], usually an ill-conditioned matrix, that models a linear blur operation and the down-sampling by a row and column operator. η is a vector that represents additive noise, assumed to be Gaussian with zero mean and variance σ². The blur operation is a linear blur given by a 2-dimensional convolution matrix derived from a 3 × 3 averaging filter. The down-sampling operator discards some elements of a matrix while others remain
unchanged; it works by removing some of the rows and columns. To make the problem more challenging, additive noise contaminates the blurred and down-sampled image.
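To make the observation model concrete, the following Python sketch simulates Eq. 23.1 with a 3 × 3 averaging blur, down-sampling and additive Gaussian noise. It is only an illustrative sketch: the function name simulate_lr and the use of scipy.ndimage are our own assumptions, not the implementation used in the experiments.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def simulate_lr(z_img, factor=2, sigma=0.01, rng=None):
    """Blur with a 3x3 averaging filter, down-sample, add Gaussian noise.

    z_img  : 2-D high-resolution image (pixel intensities, e.g. in [-0.5, 0.5]).
    factor : down-sampling factor along rows and columns.
    sigma  : standard deviation of the additive Gaussian noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    blurred = uniform_filter(z_img, size=3)              # linear blur (3x3 average)
    lr = blurred[::factor, ::factor]                     # keep every `factor`-th row/column
    return lr + sigma * rng.standard_normal(lr.shape)    # y = Wz + eta
```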
23.2.2 The Joint Model The overall model is the joint model of the observations y and the unknowns z, that is, Pr(y, z). Using these, we have the joint probability

Pr(y, z) = Pr(y|z) Pr(z)      (23.2)

where the first term is the observation model and the second term is the image prior model. Hence we have for the first term in (23.2):

Pr(y|z) ∝ exp{ −(1/(2σ²)) (y − Wz)^T (y − Wz) }      (23.3)
This is also called the model likelihood, because it expresses how likely it is that a given z produced the observed y through the transformation W. The second term of (23.2) will be instantiated with either one of the image priors discussed in Sect. 23.3. To achieve our goal, we need to ‘invert’ the causality described by our model, to infer the latent variables z from the observed variables y.
23.2.3 Inverting the Model to Estimate z We can invert the causality encoded in a probabilistic model by the use of Bayes’ rule:

Pr(z|y) = Pr(y|z) Pr(z) / Pr(y)      (23.4)
This is called the posterior probability of z given the observed data y. Equation 23.4 says that, the probability that z is the hidden image that gave rise to what we observed, i.e. y, is proportional to the likelihood that this z fits the data y and the probability that this bunch of N intensity values, i.e. the vector z, actually looks like a valid image. Note that the latter is desperately needed in underdetermined systems, since there are infinitely many vectors z that fit the data.
23.2.4 MAP Inference Through Optimisation To obtain the most probable estimate of z that conforms to our model and data, we need to maximise (23.4) as a function of z. Observe that the denominator Pr(y) does not depend on z. Hence, the maximum of the fraction (23.4) occurs for exactly the same z as the maximum of the numerator. That is, the most probable estimate is given by:

ẑ = arg max_z Pr(y|z) Pr(z) / Pr(y)      (23.5)
  = arg max_z Pr(y|z) Pr(z)      (23.6)

Further, this maximisation is also equivalent to maximising the logarithm of the right hand side, since the logarithm is a monotonically increasing function. We can also turn the maximisation into a minimisation by flipping the signs, as in the following equivalent rewriting:

ẑ = arg min_z { −log[Pr(y|z)] − log[Pr(z)] }      (23.7)
In words, the most probable high-resolution image is the one for which the negative log of the joint probability model takes its minimum value. Thus, our problem is now solvable by performing this minimisation. The expression to be minimised, i.e. the negative log of the joint probability model, may be interpreted as an error objective, and shall be denoted as:

Obj(z) = −log[Pr(y|z)] − log[Pr(z)]      (23.8)
Observe that there is a direct correspondence between having devised a probabilistic model and having devised an error objective for finding the most probable value of the variable of interest, i.e. z. Solutions that are assigned a high probability of occurrence by the model are exactly the ones that achieve a low error under the error objective. The most probable estimate is the ẑ that has the highest probability in the model, or equivalently, the one that achieves the lowest error. Since our model has two factors (the likelihood or observation model, and the image prior), our error objective also has two terms: the misfit to the observed data, and the penalty for violating the smoothness and/or other characteristics encoded in the prior. By plugging the functional forms of the observation model and of the various possible priors into (23.8), we now give the specific form of this objective function
below, so the interpretation of the individual error terms is more evident. We will make use of the following notation, taking the log of (23.3):

l(z) := −log Pr(y|z) + const.      (23.9)
      = (1/(2σ²)) (y − Wz)^T (y − Wz)      (23.10)

23.3 Prior Image Model: Markov Random Fields
The main characteristic of any natural image is local smoothness: the intensities of neighbouring pixels tend to be very similar. An MRF is a joint distribution over all the pixels of the image that captures spatial dependencies of pixel intensities. A first-order MRF assumes that, for any pixel, its intensity depends on the intensities of its closest cardinal neighbours but does not depend on any other pixel of the image. Here we adopt a first-order MRF that conditions each pixel intensity on its four cardinal neighbours in the following way. For any pixel z_i we define:

Pr(z_i | z_−i) = Pr(z_i | z_4neighb(i))      (23.11)
              = Pr( z_i − (1/4) Σ_{j∈4neighb(i)} z_j )      (23.12)

where the notation z_−i means all the pixels excluding the i-th, and the set of four cardinal neighbours of z_i is denoted 4neighb(i). This is a univariate probability distribution. Consequently, for the whole image of N pixels, the MRF represents the joint probability over all the pixels of the image – a multivariate probability distribution:

Pr(z) ∝ ∏_{i=1}^{N} Pr(z_i | z_4neighb(i))      (23.13)
      = ∏_{i=1}^{N} Pr( z_i − (1/4) Σ_{j∈4neighb(i)} z_j )      (23.14)
The notation ‘∝’ means proportional to, i.e. there is a division by a constant that makes the probability density integrate to one. This constant may depend on various parameters of the actual instantiation of the building-block probability densities, but it does not depend on z. Since in this work we only need to estimate z, we
can ignore the expression of the normalising constant throughout. This form of MRF has been previously employed with success in e.g. Hardie and Barnard (1997) and He and Kondi (2003). Alternatives include the so-called total variation model, employed in e.g. Pickup et al. (2007), which is based on image gradients and is also quite simple. In He and Kondi (2004), an experimental comparison of these two alternatives suggests that they have comparable performance, the former being slightly superior. The simplicity of (23.14) is also intuitively appealing. One can think of the difference between a pixel intensity and the average intensity of its neighbours, i.e. z_i − (1/4) Σ_{j∈4neighb(i)} z_j, as a feature. Considering that we want to encode the general smoothness property of images, it is easy to see that this feature is very useful: whenever this difference is small in absolute value we have a smooth neighbourhood, and whenever it is large in absolute value we have a discontinuity. Hence, to express smoothness, we just need to instantiate the probability distribution over this feature, i.e. the univariate densities Pr(z_i − (1/4) Σ_{j∈4neighb(i)} z_j) in the product (23.14), with symmetric densities around zero which give high probability to small values. The Gaussian is a good example. At the same time, to allow for a few discontinuities, we need to use heavy-tailed densities, such as the Huber or the Pearson type VII density. To simplify notation, it is convenient to create the symmetric N × N matrix D that encodes the above neighbourhood structure, with entries:

d_ij = 1       if i = j
     = −1/4    if i and j are neighbours
     = 0       otherwise
Then we may write the i-th feature in vector form, with the aid of the i-th row of this matrix (denoted D_i), as follows:

z_i − (1/4) Σ_{j∈4neighb(i)} z_j = Σ_{j=1}^{N} d_ij z_j      (23.15)
                                 = D_i z      (23.16)
Again, this is the i-th neighbourhood feature of the image, and there are i = 1, …, N such features on an N-pixel image. We now turn to instantiating the functional form of the probability densities that describe the shape of the likely values of these features. Figure 23.1 shows a few examples of observed histograms of these features, computed from several natural images; the probability densities that we employ in our image priors should ideally have similar shapes.
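The neighbourhood features D_i z can be computed without forming the N × N matrix D explicitly, by convolving the image with a small kernel that encodes one row of D. The sketch below assumes a 2-D image input and a 'nearest' border treatment, which the text does not specify.

```python
import numpy as np
from scipy.ndimage import convolve

# Kernel encoding one row of D: centre weight 1, four cardinal neighbours -1/4.
D_KERNEL = np.array([[0.0, -0.25, 0.0],
                     [-0.25, 1.0, -0.25],
                     [0.0, -0.25, 0.0]])

def neighbourhood_features(z_img):
    """Return the map of features D_i z = z_i - (1/4) * (sum of 4 neighbours)."""
    # Border handling via 'nearest' replication is an assumption; the chapter
    # does not state how boundary pixels are treated.
    return convolve(z_img, D_KERNEL, mode='nearest')
```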
Fig. 23.1 Examples of histograms of the distribution of neighbourhood features D_i z, i = 1, …, N, from natural images (cameraman, ladybug, panda bear, chamomile flower)
23.3.1 Gaussian-MRF The Gaussian-MRF is the most widely used image prior density. Here λ is the variance parameter, and the prior has the following form:

Pr(z) ∝ ∏_{i=1}^{N} exp{ −(D_i z)² / (2λ) }      (23.17)
      = exp{ −(1/(2λ)) Σ_{i=1}^{N} (D_i z)² }      (23.18)
23.3.2 Huber-MRF The Huber density is defined with the aid of the Huber function. It takes a threshold parameter δ, specifying the value at which it diverts from being quadratic to being linear. A generic variable u in the definition of this function will be instantiated later as a neighbourhood feature D_i z within the image prior.

H(u|δ) = u²            if |u| ≤ δ
       = 2δ|u| − δ²    otherwise      (23.19)

The Huber-MRF prior is then defined in (23.21), where λ is similar to a variance parameter:

Pr(z) ∝ ∏_{i=1}^{N} exp{ −H(D_i z | δ) / (2λ) }      (23.20)
      = exp{ −(1/(2λ)) Σ_{i=1}^{N} H(D_i z | δ) }      (23.21)

23.4 Pearson Type VII-MRF
23.4.1 The Univariate Pearson Type VII-MRF The Pearson-MRF made of univariate building blocks, a zero-mean univariate Pearson prior, is defined as:

Pr(z) ∝ ∏_{i=1}^{N} { (D_i z)² + λ }^(−(1+ν)/2)      (23.22)

where ν and λ control the shape of the distribution.
23.4.2 The Multivariate Pearson Type VII-MRF A zero-mean multivariate Pearson-MRF density on the generic N-dimensional vector of features D_i z has the following form:

Pr(z) ∝ { Σ_{i=1}^{N} (D_i z)² + λ }^(−(ν+N)/2)      (23.23)
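To illustrate how these priors enter the error objective (23.8), the following sketch evaluates the negative log-prior terms, up to additive constants, for the Gaussian-MRF (23.18), the Huber-MRF (23.19)–(23.21) and the two Pearson type VII variants (23.22)–(23.23). It operates on the array of neighbourhood features D_i z; the function names are illustrative and this is not the authors' code.

```python
import numpy as np

def neg_log_prior_gaussian(Dz, lam):
    # -log Pr(z) for (23.18), up to a constant
    return np.sum(Dz ** 2) / (2.0 * lam)

def neg_log_prior_huber(Dz, lam, delta):
    # Huber function (23.19) applied feature-wise, then summed as in (23.21)
    a = np.abs(Dz)
    h = np.where(a <= delta, Dz ** 2, 2.0 * delta * a - delta ** 2)
    return np.sum(h) / (2.0 * lam)

def neg_log_prior_pearson_uni(Dz, lam, nu):
    # univariate Pearson type VII MRF, from (23.22)
    return 0.5 * (1.0 + nu) * np.sum(np.log(Dz ** 2 + lam))

def neg_log_prior_pearson_multi(Dz, lam, nu):
    # multivariate Pearson type VII MRF, from (23.23)
    N = Dz.size
    return 0.5 * (nu + N) * np.log(np.sum(Dz ** 2) + lam)
```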
23.4.3 Discussion on the Two Versions of Pearson-MRF The version devised in Sect. 23.4.1 may be regarded as placing independent Pearson priors on each neighbourhood feature. Of course, the neighbourhood features are not independent in reality; however, since each pixel only depends on four others, this may be a reasonable approximation. The version given in Sect. 23.4.2, in turn, does not allow such an independence interpretation. Conversely, it has the advantage that the spatial dependencies are not broken up, but are more reliably accounted for. On the downside, the heavy-tail behaviour is more advantageous to have at the pixel level, i.e. on the distribution of neighbourhood features: it is the distribution of neighbourhood features in which the edges of the image create outliers.
In turn, the multivariate Pearson-MRF is a density on images. Hence, its heavy-tail behaviour would be well suited to account for outlying or atypical images. Including both of these versions in our comparison will therefore reveal which of these pros and cons are more important for recovering quality high-resolution images.
23.5 Experiments
We present two sets of single-frame image super-resolution experiments illustrating the effect of the hyper-parameters of the Pearson prior, and compare against state-of-the-art image priors such as the Gaussian and the Huber. The LR image is blurred by a uniform blur matrix of size 3 × 3, down-sampled by a factor of 4 and contaminated by Gaussian noise with standard deviation 0.001, 0.01, 0.05 and 0.1. All images are of size [100 × 100] and the pixel intensities are scaled to the interval [−0.5, 0.5]. In previous work (Pitchay 2010), the initial guess was a Gaussian-MRF estimate with σ²/λ set to 1, used as a starting point for the recovery algorithm. In this paper, we present the manual selection and address the issue of parameter selection in Pitchay (2010) by estimating ν and λ. For this automated estimation, we initialised with the product of the inverse transformation matrix and the low-resolution image. We employed a conjugate gradient type method,¹ which requires the gradient vector of the objectives.
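A minimal sketch of the MAP recovery is given below: it minimises the objective (23.8) with the data term (23.10) and the univariate Pearson type VII prior (23.22), supplying the analytic gradient to a conjugate-gradient optimiser. The authors used the minimize routine referenced in the footnote; the use of scipy.optimize here, and the function signature, are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def map_estimate(y, W, D, sigma2, lam, nu, z0):
    """MAP recovery by minimising (23.8) with the univariate Pearson VII prior.

    W and D are dense or sparse matrices acting on the vectorised image z.
    """
    def objective(z):
        r = W @ z - y
        data = r @ r / (2.0 * sigma2)                              # from (23.10)
        Dz = D @ z
        prior = 0.5 * (1.0 + nu) * np.sum(np.log(Dz ** 2 + lam))   # from (23.22)
        return data + prior

    def gradient(z):
        Dz = D @ z
        g_data = W.T @ (W @ z - y) / sigma2
        g_prior = (1.0 + nu) * (D.T @ (Dz / (Dz ** 2 + lam)))
        return g_data + g_prior

    res = minimize(objective, z0, jac=gradient, method='CG')
    return res.x
```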
23.5.1 Hyper-Parameters Selection of Pearson-MRF The quality of the recovered high-resolution image depends on how well the hyper-parameters of the image prior are selected; a bad estimate can produce a bad result. Since we are assessing the effect of both parameters, the recovery algorithm is assumed to know the true noise variance σ². From observations using the constructed blur and down-sampling matrix W, we found a practical range for λ and ν. Figure 23.2 shows how the performance varies over several values of λ and ν for four levels of noise. Too small a λ (0.01) and ν reduces the effect of the prior and the solution approaches the Maximum Likelihood (ML) estimate, whilst too large a λ (10,000) blurs the edges. The overall quality of the recovered image depends strongly on the selection of λ. We can conclude that ν can be fixed in a practicable range
¹ http://www.kyb.tuebingen.mpg.de/bs/people/carl/code/minimize
Fig. 23.2 MSE when varying ν for several fixed values of λ on the cameraman image of size [100 × 100], for noise levels σ = 0.005, 0.01, 0.05 and 0.1
(i.e. 1–10) so that the iteration can terminate earlier, and the best λ is found between 0.1 and 100.
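The manual hyper-parameter study of Fig. 23.2 can be reproduced, under assumptions, by an exhaustive search over (λ, ν) scored by MSE against the known ground-truth image. The helper below is a hypothetical sketch: `recover` stands for any recovery routine, e.g. a wrapper around the MAP estimator sketched earlier.

```python
import numpy as np

def grid_search_hyperparams(recover, y, z_true,
                            lams=(0.1, 1.0, 10.0, 100.0), nus=(1.0, 5.0, 10.0)):
    """Exhaustive search over (lambda, nu), scoring by MSE against a known z_true.

    `recover` is any function (y, lam, nu) -> estimated image.
    """
    best = (None, None, np.inf)
    for lam in lams:
        for nu in nus:
            mse = float(np.mean((recover(y, lam, nu) - z_true) ** 2))
            if mse < best[2]:
                best = (lam, nu, mse)
    return best   # (best lambda, best nu, its MSE)
```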
23.5.2 Results To assess the proposed method, the Pearson-MRF estimation results are compared with the state-of-the-art image enhancement methods in Hardie and Barnard (1997), He and Kondi (2003), and Pickup et al. (2007), using the quantitative measure of mean square error (MSE). Our algorithm for parameter estimation in Pitchay (2011) reports performance over fivefold cross-validation. All the competing image priors used this automated estimation and the
Fig. 23.3 Comparative MSE performance (over fivefold cross-validation) for the under-determined system (classical W: [2601, 10000]) on the cameraman image over four levels of noise σ, comparing u-PearsonVII MRF, m-PearsonVII MRF, Huber MRF and Gaussian MRF. The low-resolution image was blurred, down-sampled and contaminated with additive noise. The best hyper-parameter values for every image prior were found using fivefold cross-validation, and the error bars are over ten independent trials. The Pearson prior achieves state-of-the-art performance and gives a competitive solution to the other priors across the four levels of noise
comparison is to find out how good the Pearson prior is when the parameters are no longer chosen by the best manual selection, as in the work presented in Pitchay (2010). The competing image priors are: the Gaussian-MRF, a multivariate Pearson type VII based MRF and the Huber-MRF. These results are presented in Fig. 23.3, and we can see that the univariate Pearson type VII based MRF achieves state-of-the-art performance and gives a competitive solution to the Huber-MRF across the four levels of noise. Finally, we also illustrate two sets of image recovery results in Fig. 23.4.
Fig. 23.4 Example image recovery of ‘cameraman’ (10,000 pixels) from a blurred and down-sampled observation of 2,601 pixels with additive noise of σ² = 1e−3, using the univariate Pearson type VII based MRF
23.6 Conclusion
In this paper we formulated two versions of Pearson-MRF image priors and conducted a comparative experimental study among these and state-of-the-art image priors, recovering from a single noisy low-resolution image. We demonstrate that our proposed prior, the univariate Pearson Type VII-MRF, performs comparably to the Huber prior for all levels of noise. The recovered image is consistent across runs, although the objective has several local optima, and we assess two different images. Our motivation for the Pearson-MRF prior has been the heavy-tail property of the Pearson type VII distribution, which indeed seems to be a good way of preserving edges while imposing smoothness. We tested this in under-determined systems, using the optimal hyper-parameter values on various natural images. Future work is aimed towards
recovering from multiple frames and working with multiple scenes, for both under-determined and over-determined systems. Acknowledgment The author would like to express her gratitude to the School of Computer Science, University of Birmingham, United Kingdom and Universiti Sains Islam Malaysia (USIM) for the support and facilities provided. Warm thanks to the Ministry of Higher Education of Malaysia for sponsoring the author's studies.
References
Hardie RC, Barnard KJ (1997) Joint MAP registration and high-resolution image estimation using a sequence of undersampled images. IEEE Trans Image Process 6(12):621–633
He H, Kondi LP (2003) MAP based resolution enhancement of video sequences using a Huber-Markov random field image prior model. In: IEEE conference of image processing, Barcelona, Spain, pp 933–936
He H, Kondi LP (2004) Choice of threshold of the Huber-Markov prior in MAP based video resolution enhancement. In: IEEE electrical and computer engineering Canadian conference, Ontario, Canada, 2004
Kabán A, Pitchay SA (2010) Single-frame image super-resolution using a Pearson type VII MRF. In: Proceedings of the IEEE international workshop on machine learning for signal processing (MLSP 2010), Kittilä, Finland
Nagahara Y (1996) Non-Gaussian distribution for stock returns and related stochastic differential equation. Asia-Pac Financ Markets 3(2):121–149
Pickup LC, Capel DP, Roberts SJ, Zisserman A (2007) Bayesian methods for image super-resolution. Comput J 52(1):101–113
Pitchay SA (2010) Non-linear restoration from a single frame super resolution using Pearson type VII density. In: Lecture notes in engineering and computer science: proceedings of the world congress on engineering and computer science WCECS 2010, 20–22 Oct 2010, San Francisco, pp 426–431
Pitchay SA (2011) Single frame image recovery for super resolution with parameter estimation of Pearson type VII density. IAENG Int J Comput Sci 38(1):57–65. http://www.iaeng.org/IJCS/issues_v38/issue_1/IJCS38_1_07.pdf (Invited paper for the publication in the special issue)
Sun J, Kabán A, Garibaldi J (2010) Robust mixture modeling using the Pearson type VII distribution. In: Proceedings of the international joint conference on neural networks, 2010. http://www.cs.bham.ac.uk/~axk/PearsonTypeVIIMixture.pdf
Chapter 24 Research on the Building Method of Domain Lexicon Combining Association Rules and Improved TF*IDF
Shouning Qu and Simon Xu
24.1 Introduction
Text is one of the most important media for recording and spreading information, and most Internet information appears in text form as well. Text mining studies how to find useful information in large amounts of text data; it is a process of extracting valuable knowledge, spread across documents, that is effective, novel, useful and understandable, and of using this knowledge to organize information (Feldman and Sanger 2009; Kodratoff 1999). Word segmentation and extraction are the foundation for building a text-oriented domain lexicon and for text mining. A good domain lexicon can greatly improve the efficiency and accuracy of text mining. Building a domain lexicon addresses the problems that the efficiency of a common lexicon declines as the lexicon grows and that technical terms cannot be extracted, and thus improves the efficiency and accuracy of topic word extraction. The building of a domain lexicon is mainly based on topic word extraction: the domain topic lexicon is made up of topic words extracted from a large collection of domain documents. The TF*IDF algorithm (Salton and Buckley 1988) is a widely used topic word extraction method based on word frequency statistics. Its advantages are that it is simple, efficient, easy to implement and has a high recall ratio. However, TF*IDF is easily affected by text length, feature length and feature position; it does not consider how the weights of feature items are affected by feature items that are distributed unevenly and incompletely within or between classes (Xiong et al. 2008); and it
S. Qu (*) Information Network Center, University of Jinan, Jinan 250022, China
e-mail: [email protected]
recognizes compound words and unknown words poorly (Huang et al. 2010; Du and Xiong 2010). Therefore, we present a building method for a domain lexicon combining association rules and an improved TF*IDF to solve the above problems.
24.2 The Building of Domain Lexicon
The building of a domain lexicon includes the following steps.
1. Select background document collections from some large-scale document collections of several given domains (Juanzi et al. 2007) and fill them into a data sheet; select foreground document collections from part of the document collections of a certain domain and fill them into another data sheet.
2. Segment the foreground and background document collections using the common lexicon and the forward maximum matching algorithm (Hu 2008; Liu 2009).
3. Extract feature items and express them as vectors.
4. Recognize compound words based on segmented adjacent feature items and association rules, and add the recognized compound words to the common lexicon.
5. Count the foreground and background word frequencies of the feature items appearing in the foreground documents.
6. Calculate the weight of the feature items appearing in the foreground document collections based on the previous step.
7. Set a threshold value and select representative feature items for the foreground domain lexicon.
The built domain lexicon may be improved by changing the training parameters. The process is shown in Fig. 24.1.
24.2.1 Word Segmentation Chinese word segmentation (Liu 2010; Sun 2004; Liu et al. 2010) is the basis of Chinese information processing and a necessary step for building a domain lexicon. Different from alphabetic writing, Chinese processing must include a word segmentation procedure because of the intrinsic characteristics of the language. Because building a domain lexicon requires training on a large number of document collections, we should choose a word segmentation method that is easy to implement, simple, efficient and easy to maintain. Being simple, fast and efficient, the dictionary-based forward maximum matching algorithm is convenient for segmenting large-scale document collections, maintaining the lexicon and building the domain lexicon quickly. Its weaknesses, such as being too mechanical and too simple, can be overcome through compound word and unknown word recognition technology (Su et al. 2004).
Fig. 24.1 Build domain lexicon
Based on the above analysis, we build the domain lexicon using the forward maximum matching algorithm for Chinese word segmentation. The basic idea of forward maximum matching is as follows. For a given Chinese text to be segmented, we first calculate its length. Within this length, we select n characters from left to right, where n is determined by the longest word in the lexicon, and match this candidate against the words in the lexicon. If a matching word is found, we segment it from the text; otherwise, we remove one character from the end of the candidate and match again. If only one character remains and no word matches, the character is not in the table; then, beginning with the second character, we choose n characters and match again. Whenever a candidate is matched, we continue the search from the character immediately after it. The word segmentation results are saved to a database or kept in memory for text mining purposes. A minimal sketch of this procedure is given below.
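The following sketch illustrates the forward maximum matching procedure just described. The function name and the representation of the lexicon as a set of strings are illustrative assumptions, not the authors' implementation.

```python
def forward_maximum_match(text, lexicon, max_word_len):
    """Greedy left-to-right segmentation against a word lexicon (a set of strings)."""
    tokens, i = [], 0
    while i < len(text):
        n = min(max_word_len, len(text) - i)
        # Try the longest candidate first, shrinking by one character at a time.
        while n > 1 and text[i:i + n] not in lexicon:
            n -= 1
        tokens.append(text[i:i + n])   # unmatched single characters fall through as-is
        i += n
    return tokens

# Example with a hypothetical lexicon:
# forward_maximum_match("中国陆军航空兵", {"中国", "陆军", "航空兵"}, 3)
# -> ["中国", "陆军", "航空兵"]
```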
24.2.2 Extraction of Feature Items and Their Expression as Vectors Feature items of foreground documents usually appear several times. In order to count the frequencies of foreground and background words and to calculate the weight
of each feature item, we extract the feature items appearing in the foreground documents in advance. In this paper, we extract feature items by virtue of the widely used Vector Space Model (VSM) (Salton et al. 1975). In the VSM (Sebastiani 2002), document collections are seen as a vector space which consists of a set of orthogonal feature items. Each document is seen as a standardized vector V(d) = (t1, w1(d); …; ti, wi(d); …; tn, wn(d)), in which ti is a feature item and wi(d) is the weight of ti for document d. The feature items of the vector space correspond to the segmented feature items and may appear repeatedly. Accordingly, the weights of the feature items are calculated from the selected feature items.
24.2.3 Identification of Compound Words Although the dictionary-based matching algorithm is simple, fast and efficient, the segmentation process is mechanical and cannot match long compound words that do not exist in the dictionary. This causes some useful compound words to be segmented into a few simple words, even though these compound words are significant for understanding the text, and so it distorts the meaning. For example, “NATO” can be segmented into “North”, “Western”, “Covenant” and “organization”, resulting in an alienation of meaning. This weakness restricts the extraction of feature items and the building of the domain lexicon. In this paper, we use a compound word recognition method based on association rules (Agrawal and Srikant 1994), treating the foreground document collection as a transaction database. The segmented feature items are seen as a set of transaction items, and a text transaction can be expressed as (Document ID, t1, t2, t3, …, tn). Thus, feature item association analysis is converted to association mining of transaction items in a transaction database, and similarity analysis is converted to association rule mining of transaction items in the transaction database. Compared to conventional association rule mining, keyword-based text association analysis involves two major steps (Holt and Chung 2002): (1) mine frequently appearing keywords, namely the frequent item sets; (2) according to the frequent item sets, generate association rules between keywords. According to the association rule mining algorithm, if ti and tj are adjacent with frequent co-occurrence, we can conclude that ti and tj form a compound word association rule. Similarly, if several adjacent feature items co-occur frequently, and their confidence and support exceed certain thresholds, more complex association rules can be mined and longer compound words can be identified. Furthermore, the association rules are filled into a form named the association rule table in the format (no, front, rear, S, C), which facilitates the identification of compound words, the extraction of more useful compound words and the building of a better domain lexicon. (no, front, rear, S, C) means (the serial number of the generated association rule, the front of the rule, the rear of the rule, its support, its confidence). A sketch of mining such adjacent-pair rules is given below.
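The sketch below mines adjacent-pair association rules of the form (no, front, rear, S, C) from segmented documents treated as transactions. The exact definitions of support and confidence used by the authors are not given in the text, so the ones below (support over documents, confidence relative to the front item) are assumptions.

```python
from collections import Counter

def mine_adjacent_pair_rules(segmented_docs, min_support, min_confidence):
    """Mine compound-word candidates from adjacent feature items.

    segmented_docs : list of token lists (one per document / transaction).
    Returns rules as (no, front, rear, support, confidence) tuples.
    """
    item_count, pair_count, n_trans = Counter(), Counter(), 0
    for tokens in segmented_docs:
        n_trans += 1
        item_count.update(set(tokens))
        pairs = {(a, b) for a, b in zip(tokens, tokens[1:])}   # adjacent co-occurrence
        pair_count.update(pairs)

    rules, no = [], 0
    for (front, rear), cnt in pair_count.items():
        support = cnt / n_trans
        confidence = cnt / item_count[front]
        if support >= min_support and confidence >= min_confidence:
            no += 1
            rules.append((no, front, rear, support, confidence))
    return rules
```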
Fig. 24.2 Basic steps of word frequency statistics and its applications
24.2.4 Statistics of the Frequency of Feature Items Word frequency statistics (Dai 2008) is a lexical analysis method and the basis of text word segmentation and feature item weight calculation. It plays an important role in natural language processing areas such as information retrieval, text proofreading, text classification and clustering. It describes the vocabulary distribution through the word frequencies counted over a document collection and their analysis. The major steps are shown in Fig. 24.2. This article mainly involves two kinds of word frequency statistics, namely foreground and background word frequency statistics: foreground word frequency statistics counts the number of times feature items appear in foreground documents, and background word frequency statistics counts the number of times feature items appear in background documents.
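Foreground and background word frequency statistics reduce to simple counting over the segmented document collections, as in the following sketch (illustrative only; the table-based storage described in Sect. 24.4 is omitted).

```python
from collections import Counter

def word_frequencies(segmented_docs):
    """Count, for each feature item, the number of documents it appears in
    (document frequency) and its total number of occurrences."""
    doc_freq, total_freq = Counter(), Counter()
    for tokens in segmented_docs:
        total_freq.update(tokens)
        doc_freq.update(set(tokens))
    return doc_freq, total_freq

# Foreground and background statistics are obtained by calling this separately
# on the foreground and background document collections.
```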
24.3 Weight Calculation of Feature Items Based on Improved TF*IDF
Each feature item has a weight value which can be used to analyze the domain lexicon. The weight value reflects the importance of the feature item for the text, and determines whether the feature item should be added to the domain lexicon. This section presents an improved TF*IDF algorithm and calculates weights with it.
The familiar weight calculation methods for feature items include the TF algorithm (Dai 2008), the IDF algorithm (Auen 1991), the TF*IDF algorithm, and so on. TF*IDF combines the advantages of TF and IDF and makes up for the shortcomings of each: it can not only reflect the content of a class, but also distinguish it from other classes. The more often a feature item appears in a document or in a class of documents, i.e. the higher its foreground word frequency, the stronger its capability to reflect content; the more widely a feature item appears across documents, i.e. the higher its background word frequency, the lower its capability to reflect content. In the VSM, the traditional TF*IDF weight calculation formula is:

w_i(d_j) = tf_ij · idf_i = tf_ij · log₂(N/N_i + 0.01)      (24.1)
In the formula, tf_ij is the frequency with which the i-th feature item appears in document d_j, N is the number of documents in the collection, and N_i is the number of documents in which the i-th feature item appears. TF*IDF can effectively weaken high-frequency stop-words appearing in most documents: empty words such as “的”, “呢”, “了”, and so on, are easily filtered out, which removes the need for a separate stop-word removal step. It can not only reflect the content of a class, but also distinguish it from other classes. However, the traditional TF*IDF weight calculation is vulnerable to the length of the document, the length of the feature items and the location of the feature items. We put forward an improved TF*IDF to address these problems. In 1988, Salton pointed out that, for feature items with the same frequency, appearing in a short document is of more importance than appearing in a long document. So, we standardize the length of documents, converting the foreground and background documents into the same length of 100 characters. Background and foreground word frequencies are converted using Eq. 24.2:

stf_ij = (tf_ij × 100) / d_j.length      (24.2)
In the formula, stf_ij is the standardized foreground frequency of the i-th feature item in document j, tf_ij is the frequency of the i-th feature item appearing in document j, and d_j.length is the length of document j. The feature item length can be seen as a weighting factor. Statistical results of automatic word segmentation show that single-character words are usually the most numerous in a document but carry less information, whereas multi-character words are usually the least numerous but carry more information and are more important. Usually, a longer feature term expresses a specific “concept”; for example, “U.S. Open” refers specifically to “sports”. So a higher weight should be given to multi-character words. In general, feature items appearing in the title, abstract or the first few lines are more important than those in the body and carry more topic information, so these feature items should also be given higher weights.
In the VSM, the formula (Eq. 24.3) is commonly used for the traditional TF*IDF weight calculation of text feature items:

w_i(d_j) = tf_ij · log₂(N/N_i + 0.01)      (24.3)

In the formula, tf_ij is the frequency with which the i-th feature item appears in document d_j, N is the number of documents in the collection, and N_i is the number of documents in which the i-th feature item appears. Considering that the weight calculation is affected by the text length, the feature item length and the location, we standardize the foreground word frequency and apply appropriate weighting. Finally, the improved weight calculation formula (Eq. 24.4) is obtained by normalization:

w_i(d_j) = [ stf_ij · (wp_ij + wl_ij) · log₂(N/N_i + 0.01) ] / sqrt( Σ_{k=1}^{n} [stf_kj · (wp_kj + wl_kj)]² · [log₂(N/N_k + 0.01)]² )      (24.4)
In the formula, stf_ij (stf_kj) is the standardized foreground frequency of a feature item of document j, and wl_ij (wl_kj) and wp_ij (wp_kj) represent the length weight and the location weight of that feature item in document j, respectively. The values of wl_ij and wp_ij can be taken from experience or determined by repeated training on document collections. Based on this document standardization and weighting approach, in this paper we solve the problem that the feature item weight is affected by the feature item length and position, and we improve the TF*IDF algorithm as well. The improved TF*IDF algorithm can calculate the feature item weight more accurately and effectively, which facilitates the building of high-quality domain lexicons.
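A sketch of the improved weight calculation is given below: it standardises the term frequencies with Eq. 24.2, applies the length and position weights, and normalises as in Eq. 24.4. The data structures (dictionaries keyed by term) and the default weights are our own assumptions, not the authors' code.

```python
import math

def improved_tfidf_weights(doc_terms, doc_length, df, n_docs, wl, wp):
    """Improved TF*IDF weights (Eq. 24.4) for one document.

    doc_terms  : dict term -> raw frequency tf_ij in this document
    doc_length : length of the document in characters
    df         : dict term -> number of documents containing the term (N_i)
    n_docs     : total number of documents N
    wl, wp     : dicts term -> length weight and position weight (assumed given)
    """
    raw = {}
    for t, tf in doc_terms.items():
        stf = tf * 100.0 / doc_length                       # Eq. 24.2
        idf = math.log2(n_docs / df[t] + 0.01)
        raw[t] = stf * (wp.get(t, 1.0) + wl.get(t, 1.0)) * idf
    norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
    return {t: v / norm for t, v in raw.items()}            # normalisation of Eq. 24.4
```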
24.3.1 Selecting Feature Items by Threshold and Building the Domain Lexicon Based on the weights of the feature items, we analyze and sort the feature items for further validation and evaluation. We select representative feature items by setting a certain threshold and add them to the domain lexicon. The main purpose of building a domain lexicon is to address the declining efficiency of common lexicons as their size increases. Therefore, the built domain lexicon should comprise words representing its domain in order to support text mining applications, such as text classification and topic extraction, more rapidly and effectively. Given a certain amount of vocabulary, the selection threshold may be set as high as possible to ensure the correct
ratio of extracted domain topics. After repeated attempts, we consider that setting the weight threshold to 0.16 is appropriate; other thresholds may be set according to the actual situation. At the same time, we can set selection rules according to the actual situation, such as eliminating some single-character words of high weight. When the selected feature items can roughly represent the topic of the training document, they may be added to the domain lexicon.
24.4 The Data Design of Domain Lexicon
24.4.1 Data Dictionary In this paper, a Microsoft Office Access 2003 database is used for the domain lexicon, involving the following five tables. The common information table (words), namely the common lexicon, is used to store the feature items selected from the foreground and background document collections; its structure is shown in Table 24.1. The background document collections information table (backtext) is used to store or access the background document collections; its structure is shown in Table 24.2. The foreground document collections information table (foretext) is used to store or access the foreground document collections; its structure is shown in Table 24.3. The weight information table (weight) is used to store the weight information of the analyzed and sorted feature items; its structure is shown in Table 24.4.
Table 24.1 Structure of words
Listing      Data type     Field size     Primary key   Function description
Words_ID     Automatic ID  Long integer   Yes           Lexicon item No.
Words_word   Text          50             No            Storage lexicon item

Table 24.2 Structure of backtext
Listing             Data type     Field size     Primary key   Function description
Backtext_ID         Automatic ID  Long integer   Yes           Lexicon item No.
Backtext_sentence   Text          300            No            Storage lexicon item

Table 24.3 Structure of foretext
Listing             Data type     Field size     Primary key   Function description
Foretext_ID         Automatic ID  Long integer   Yes           Lexicon item No.
Foretext_sentence   Text          300            No            Storage lexicon item
Table 24.4 Structure of weight
Listing         Data type   Field size         Primary key   Function description
Weight_ID       Number      Long integer       Yes           Lexicon item No.
Weight_word     Text        255                No            Storage lexicon item
Weight_weight   Number      Double precision   No            Stored weight value

Table 24.5 Structure of domain_words
Listing             Data type   Field size         Primary key   Function description
Domain_words_ID     Number      Long integer       Yes           Lexicon item No.
Domain_words_word   Text        255                No            Storage lexicon item
Average_weight      Number      Double precision   No            Storage average weight of feature items
Usefrequency        Number      Long integer       No            Record use frequency of feature items
Addfrequency        Number      Long integer       No            Record add times of feature items
The domain lexicon information table (domain_words) is used to store the feature items that result from document segmentation, word frequency statistics, weight analysis, weight sorting and selection (Table 24.5). In the table, average_weight, usefrequency and addfrequency represent the average weight, the usage frequency and the number of times a feature item has been added to the domain lexicon, respectively. They are recorded to optimize the training and usage processes: if their values are low, we can remove the corresponding feature items from the domain lexicon.
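For illustration, the domain_words table of Table 24.5 can be expressed with a generic SQL schema; the sketch below uses Python's sqlite3 purely to keep the example self-contained, whereas the paper uses Microsoft Access 2003.

```python
import sqlite3

# Illustrative schema sketch only; column types approximate the Access field types.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE domain_words (
        domain_words_id   INTEGER PRIMARY KEY,  -- lexicon item No.
        domain_words_word TEXT,                 -- stored lexicon item
        average_weight    REAL,                 -- average weight of the feature item
        usefrequency      INTEGER,              -- use frequency of the feature item
        addfrequency      INTEGER               -- add times of the feature item
    )
""")
```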
24.4.2 Data Definition Specification In order to improve data access efficiency, we use a uniform naming specification which can effectively eliminate redundant data in the database and meet the requirements of various user applications.
1. Table naming specification: (1) choose meaningful words in English or Pinyin with mixed-case letters, such as the word information table, words; (2) if several words or Hanyu Pinyin syllables need to be expressed, use “_” as the separator, such as the domain lexicon table domain_words.
2. Field naming specification: the basic meaning is followed by the field information, and all field names are spelled in lower-case letters or lower-case Pinyin, such as the id field of the words table: words_id.
24.5 Experimental Results and Analysis
Based on the planned functions of the domain lexicon and the techniques of text mining, we build the domain lexicon in a Windows environment, combining object-oriented design and NetBeans visualization approaches in our system. Based on the classified document collections from the Chinese Department of Fudan University, we select a document collection of domain A as the background document collection and a document collection of a different domain B as the foreground document collection. The domain lexicon is built by extracting the domain topic words of the foreground documents. Let us take the building of a military domain lexicon as an example. We select 100 educational documents as the background documents and extract domain topic words from a military document named “中国陆军航空兵”, obtaining four military words, namely “直升机”, “陆航”, “部队” and “集团军”. This reflects the document topic to some degree. Because “陆军航空兵” usually appears under the name “陆航”, “陆航” is extracted; thus, we can also add “陆航” to the domain lexicon manually. In further study, we may improve the domain lexicon by setting up a thesaurus table. The building of the domain lexicon is implemented by extracting topic words from domain documents. Its quality may be evaluated through the accuracy shown in Eq. 24.5:

domain_words_precision = Topic_NO. / Word_NO.      (24.5)
In the formula, domain_words_precision represents the precision of the domain lexicon, Topic_NO. represents the number of representative lemmas in the domain lexicon, and Word_NO. represents the total number of lemmas in the domain lexicon. By training on 76 military documents from the Chinese Department of Fudan University, we obtain 321 lemmas, of which 211 are not repeated and 13–16 have no obvious representative significance. The lemma accuracy rate of the military domain lexicon reaches 92.41–93.83%. The trained military domain lexicon is shown in Fig. 24.3. As can be seen from Fig. 24.3, although “held” and “reports” have larger weights and more additions, it would be far-fetched to take them as military vocabulary. This is mainly because the documents are interdisciplinary and the selected document classification is incomplete. Thus, although the extracted feature items are the documents' topic words, not all of these topic words belong to the documents' domain. In subsequent use, based on the average weight of domain lemmas, we may optimize the domain lexicon using the add frequency and the use frequency. As long as the background documents reach a certain number, a better extraction accuracy of feature items can be obtained. The extracted lemma count and
Fig. 24.3 Trained military lexicon
Table 24.6 Feature items and their accuracy extracted from different background documents
Number of background documents   100             300             500
Extracted feature items          321             320             323
Non-duplicate entries            211             208             206
Non-representative terms         13–16           13–16           13–16
Accuracy                         0.924 ~ 0.938   0.923 ~ 0.937   0.922 ~ 0.936
domain lemma accuracies are shown in Table 24.6. These data are obtained by processing the 76 military documents given 100, 300 and 500 background documents. As shown in Table 24.6, the accuracies for 100, 300 and 500 background documents differ little, and the accuracies for 300 and 500 background documents are slightly lower than that for 100. When 300 or 500 background documents are selected, the program runs very slowly, only very few feature items appear or disappear, and the weight values change only subtly. So, 100 representative background documents are sufficient to ensure the accuracy and efficiency of the domain lexicon; more is not necessarily better. To verify whether the word segmentation and topic extraction efficiency can be improved by using the domain lexicon, we perform experiments using the domain lexicon as the experimental lexicon. We arbitrarily select ten military documents from the document collections and 100 military documents from other Internet
Fig. 24.4 Comparison of topic words extraction between common lexicon and domain lexicon based on the training document named “中国陆军航空兵”. (a) Topic words extraction based on common lexicon. (b) Topic words extraction based on military lexicon
Fig. 24.5 Topic words extraction result comparison of common lexicon and military lexicon on an open test document. (a) Topic words extraction based on common lexicon. (b) Topic words extraction based on military lexicon
documents for testing. Some test results for a training sample and for the open test samples are shown in Figs. 24.4 and 24.5. Figure 24.4 compares topic word extraction between the common lexicon and the domain lexicon on the training document named “中国陆军航空兵”. Figure 24.4a is the result of topic word extraction based on the common lexicon; the extracted topic words reflect the document topic in general, and the running time is 43 s on the local computer. Figure 24.4b is the result of topic word extraction based on the domain lexicon; the running time is 14 s on the local computer, and the efficiency and weight values are greatly improved, making the topic words easier to distinguish. Figure 24.5 compares topic word extraction between the common lexicon and the domain lexicon on the open test document named “中国舰队完美行动为航舰队问世铺垫”. Figure 24.5a is the result of topic word extraction based on the common lexicon; the extracted topic words reflect the document topic in general, and the running time is 41 s on the local computer. Figure 24.5b is the result of topic word extraction based on the domain lexicon; the running time is 16 s on the local computer. The running efficiency of the domain lexicon is greatly improved for the open test document, and at the same time more meaningful topic words can be extracted.
Table 24.7 Topic word extraction efficiency contrast between common lexicon and domain lexicon
Type of lexicon used to extract topic words   Absolute running time on the local host
Based on the common lexicon                   39 ~ 45 s
Based on the domain lexicon                   13 ~ 17 s

Table 24.8 Comparison of the accuracy of the domain lexicon
Structure type of domain lexicon                                  Accuracy of the domain lexicon (more than 1,000 items)
Domain lexicon based on the traditional TF*IDF algorithm          83.72% ~ 83.74%
Domain lexicon based on the pseudo feedback model                 86% ~ 88%
Domain lexicon based on association rules and improved TF*IDF     92% ~ 94%
The extraction efficiency comparison between the common lexicon and the domain lexicon is shown in Table 24.7. These experimental results are obtained on a local computer by testing a large number of documents repeatedly. As can be seen from Table 24.7, under the same conditions, the topic word extraction time with the domain lexicon is greatly shortened compared with the common lexicon, while the topic word extraction efficiency is greatly improved. Table 24.8 compares the domain lexicon accuracy of several familiar building methods. According to Table 24.8, the accuracy of the domain lexicon built from association rules and improved TF*IDF is higher than that of the traditional TF*IDF and the pseudo feedback model (Huang et al. 2008). From a large number of open text tests, we also find some problems. The word coverage of the domain lexicon needs continuous improvement, and the representativeness of individual words is not strong. For some documents from distant domains or of ambiguous classification, the extracted topic words may be incomplete. In response to these problems, we need to select large document collections with obvious characteristics and train them sequentially.
24.6 Conclusion and Future Work
By training on classified Chinese document collections, we build domain lexicons (Qu and Xu 2010). Using domain lexicons to extract topic words from given documents can greatly improve the efficiency and accuracy of word segmentation.
We can build a domain lexicon by selecting some given document collections, finding the domain topic words relative to another document collection, and recording the change of average weight and the frequencies of adding and using. Above all, the proposed extraction algorithm fully considers the length of the text, the feature item length, the feature item position, compound word recognition and so on; it improves the traditional TF*IDF algorithm and identifies compound words using association rules. Therefore, our algorithm can obtain good extraction results and can be applied to other aspects of text mining. Acknowledgment This work was supported in part by the National Nature Science Fund under Grant 60573065 and the State “863” Plan funded Project under Grant 2002AA4Z3240.
References
Agrawal R, Srikant R (1994) Fast algorithm for mining association rules. In: Proceedings of the 1994 international conference on very large data bases (VLDB'94), Santiago, Chile, pp 487–499
Auen J (1991) Natural language understanding. Benjamin/Cummings Publishing Company
Dai W (2008) Research on text classification and clustering based on genetic algorithms. Science Press, Beijing
Du J, Xiong H (2010) Algorithm to recognize unknown Chinese words based on BBS corpus. Comp Eng Design 31(3):630–631
Feldman R, Sanger J (2009) The text mining handbook. Posts & Telecom Press, Beijing
Holt JK, Chung SM (2002) Mining association rules using inverted hashing and pruning. Inf Process Lett 83:211–220
Hu X (2008) Application of maximum matching method in Chinese segmentation technology. J Anshan Normal Univ 10(2):42–45
Huang Y, Gong C, Xu H, Cheng X (2008) A domain dictionary generation algorithm based on pseudo feedback model. Journal of Chinese Information Processing 22(1):111–115
Huang W, Gao B, Liu Y, Yang K (2010) Word combination based Chinese word segmentation methodology. Sci Technol Eng 10(1):85–89
Juanzi L, Qi'na F, Kuo Z (2007) Keyword extraction based on tf/idf for Chinese news document. Wuhan Univ J Natural Sci 12(5):917–921
Kodratoff Y (1999) Knowledge discovery in texts: a definition, and applications. In: Proceedings of ISMIS'99, Warsaw
Liu C (2009) Research on Chinese segmentation method based on optimization maximum matching. Yanshan University
Liu H (2010) Research on Chinese word segmentation techniques. Comp Dev Appl 23(3):1–3
Liu Y, Wang Z, Wang C (2010) Model of Chinese words segmentation and part-of-word tagging. Comput Eng 36(4):17–19
Salton G, Buckley B (1988) Term-weighting approaches in automatic text retrieval. Inform Process Manag 24(5):513–523
Salton G, Wang A, Yang CS (1975) A vector space model for automatic indexing. Comm ACM 18(11):613–620
Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34(1):1–47
Qu S, Xu S (2010) Research on the building method of domain lexicon combining association rules and improved TF*IDF. In: Lecture notes in engineering and computer science: proceedings of the world congress on engineering and computer science 2010 (WCECS 2010), vol 1, San Francisco, 20–22 Oct 2010, pp 473–479
Su F, Wang D, Dai G (2004) A rule-statistic model based on tag and an algorithm to recognize unknown words. Comp Eng Appl 15:43–45, 91
Sun B (2004) Modern Chinese text word segmentation technology. Peking Institute of Computational Linguistics. http://icl.pku.edu.cn/bswen/nlp/report1-sementation.html
Xiong Z, Li G, Xiaoli Chen C, Chen W (2008) Improvement and application to weighting terms based on text classification. Comp Eng Appl 44(5):187–189
Chapter 25 Combining Multiscale and Multi Directional Analysis for Edge Detection Using a Statistical Thresholding
K. Padma Vasavi, E.V. Krishna Rao, M. Madhavi Latha, and N. Udaya Kumar
25.1 Introduction
An edge may be described as a meaningful discontinuity in the intensity level of an image (William 2002). Edge detection is the process of correctly labeling a pixel as an edge pixel. It is a very important low-level image processing operation that supports various visual processing tasks such as three-dimensional reconstruction, stereo motion analysis, image segmentation and compression. Edge detection is used in a wide range of applications: in sonar imaging it is used to identify moving enemy targets under water, and it is also used for pattern recognition tasks such as fingerprint recognition, face recognition and character recognition. Many edge detection techniques, such as the Sobel, Prewitt, Roberts and Frei–Chen local gradient masks, are available in the image processing literature. Their major drawbacks are high sensitivity to noise and an inability to discriminate edges from textures. The Canny edge detector not only detects edges but also tries to connect neighboring edge points into a contour line; although it is considered fast, reliable, robust and generic, its accuracy is not satisfactory (Aydin et al. 1996). However, human beings do not analyze a scene at a single scale or resolution: they start from a coarser scale and over time begin to see the finer details. Similarly, for a computer vision algorithm it is desirable that edges exist at both coarse and fine scales and that the localization of these edges is decided at the finer scale. That is why S. Mallat, in his work with S. Zhong, introduced the concept of multiscale contours, in which every edge point of an image is characterized by a whole chain in the scale–space plane. Turgut Aydin et al. considered the problem of directional and multiscale edge detection using M-band wavelets (Do et al. 2003). Wavelets are good at capturing zero-dimensional singularities, but two-
K.P. Vasavi (*) Dept of ECE, Shri Vishnu Engg College for Women, Bhimavaram, India
e-mail: [email protected]
dimensional piecewise-smooth signals that resemble images possess one-dimensional singularities, representing smooth regions separated by edges. Wavelets are good at isolating the discontinuity across an edge but cannot capture the smoothness along the edge, so more powerful bases are needed in higher dimensions to solve this issue. There have been a number of approaches providing finer decompositions; some notable transforms include 2-D Gabor wavelets, steerable pyramids, the directional filter bank, brushlets, curvelets and contourlets. There are several motivations for employing directional filter banks: the directional representation implemented by the Directional Filter Bank (DFB) is useful for applications exploiting computational aspects of visual perception (Nguyen et al. 2007). While the DFB can produce directional information, it lacks the multiscale property, in contrast to the wavelet transform. One way of achieving a multiscale decomposition is to combine a Laplacian Pyramid with the DFB (Chan et al. 2003). In comparison to other compact representation techniques such as wavelet and sub-band coding, the Laplacian Pyramid (LP) has the advantage that it provides greater freedom in designing the decimation and interpolation filters (Rath et al. 2006). Thresholding plays an important role in the edge detection of images; this is justified because errors at this point are propagated throughout the detection system. Therefore, obtaining a robust thresholding measure is a major problem in image segmentation. Thresholding is of two types, namely global and local thresholding. In global thresholding, one threshold value, computed from global information, is selected for the entire image. However, global thresholding results in a poorer segmentation of the image when the illumination of the background is uneven. Therefore, there is a need for a local threshold value that changes dynamically over the image, and this technique is called local thresholding (Burt et al. 1983). In this paper a novel edge detection scheme is proposed which uses a statistical thresholding based on the sub-band decomposition of images with a Laplacian Pyramid at an appropriate scale and a Directional Filter Bank to provide directional information. The rest of the paper is organized as follows: Sect. 25.2 deals with the Laplacian Pyramid decomposition. Section 25.3 discusses the Directional Filter Banks. Section 25.4 describes the statistical thresholding scheme. In Sect. 25.5 the details of the methodology are furnished. In Sect. 25.6 the experimental procedure and the results obtained are presented. In Sect. 25.7 the conclusion of the paper is provided.
25.2
Laplacian Pyramid
A pyramid is a multiscale representation that is constructed with a recursive method that leads naturally to self-similarity (Do and Vetterli 2001). In the pyramidal scheme of multi resolution decomposition, every analysis operator that brings a signal xj from level j to the next coarser level j + 1 reduce information. This information can be stored in a detail signal at level j, which is the difference between xj and the
25
Combining Multiscale and Multi Directional Analysis for Edge Detection. . .
327
Fig. 25.1 Laplacian Pyramid Decomposition
approximation ^x obtained by applying the synthesis operator to xj+1. Laplacian Pyramid (LP) is a non-orthogonal pass- band pyramid structure which was proposed by Burt and Adelson (1983). The advantage of LP decomposition over the other multiscale decomposition techniques is that the image is expanded to 4/3 of the original size and that the same filter can be used for all pyramids. In each filter step, with the kernel before it is subtracted from the previous low pass image. The sequence of low pass images is called the “Gaussian Pyramid”, while the sequence of band pass images is called the Laplacian Pyramid. The Laplacian pyramiddecomposition is shown in Fig. 25.1. Here, 1-D signals are considered for the sake of convenience in the notation. Initially, the input signal x is passed through the decimation filter h to obtain the low pass filtered signal. Then it is down sampled to obtain the coarse signal c1. This coarse signal is up sampled and the filtered using an interpolation filter g. This produces a prediction signal p1. The first level of the detailed signal is given by the prediction error d1. The process is repeated on the coarse signal till the final resolution is reached. Considering an input signal of N samples, the coarse and detail signals can be derived as c ¼ Hx
(25.1)
d ¼ x GC ¼ ðIN GH) x
(25.2)
where, IN denotes identity matrix of order N and H and G denote the decimation and interpolation matrices which have the following structure 3 2 6 h(2) 6 6 6 4
h(1) h(2)
h(0) h(1)
7 7 h(0)7 7 5
(25.3)
328
K.P. Vasavi et al.
Fig. 25.2 Standard reconstruction structure for LP
2 6 g(0) 6 6 6 6 6 4
3 g(1)
g(2) g(0)
g(1)
g(2) g(0)
7 7 7 7 g(1) 7 7 5
(25.4)
Here, the superscript T denotes the matrix transpose operation. If the coarse signal has resolution K then the matrices H and G are of dimension KXN and NXK respectively. Given an LP representation, the standard reconstruction method builds the original signal simply by iteratively interpolating the coarse signal and adding the detail signals at each level up to the final resolution. The standard reconstruction method is shown in Fig. 25.2. Considering an LP with only one level of decomposition the original signal can be reconstructed as X ¼ Gc þ d
25.3
(25.5)
Directional Filter Banks
The motivation for choosing a Directional Filter Bank (DFB) in this paper has come from the research in visual perception which shows that cells having directional selectivity are found in the retinas and visual cortices of the entire major vertebrate classes. So, an edge detection scheme which uses DFB for directional selectivity is used in this paper. The DFB was first proposed by Bamburger and Smith. The Directional Filter Bank partitions the frequency plane in to a set of 2n wedge shaped pass band regions using n-levels of iterated tree structured filter banks as shown in Fig. 25.3 (Padma Vasavi et al. 2008). The multi directional analysis is achieved by using a tree structure implementation. The blocks in the binary decomposition tree are made up of two extensions of
25
Combining Multiscale and Multi Directional Analysis for Edge Detection. . .
329
Fig. 25.3 Direction decomposition using DFB
Fig. 25.4 Modulated QFB
QFB e jπn2
Fig. 25.5 Skewed QFB
e- jπn2
R
QFB
R
the Quincunx filter bank (QFB): the modulated QFB is shown in Fig. 25.4 and the " # skewed QFB as shown in Fig. 25.5. 1 1 , the Using a quincunx lattice, i.e., the down sampling matrix equals to 11 low pass and the band pass filters are given by H0 ðG0; G1 Þ¼
H1 ðZ0 ; Z1 Þ ¼
1 ðZ0 Þ2N þ b Z0 Z1 1 þ Z0 2
1 ðZ0 Þ2N1 þ Z1 H0 ðZ0 ; Z1 Þ 0 bðZ0 Z1 Þ þ b Z0 Z1
(25.6)
(25.7)
where, b (z0, z1) is a 1-D low pass filter. As the low pass information spreads in to multiple directional sub bands sparse image representation cannot be obtained using DFB. Therefore, a Laplacian pyramid is combined with the DFB so as not to leave any potential characteristic features in an image.
330
25.4
K.P. Vasavi et al.
Statistical Thresholding
In this section, initially the partial derivatives of the image along the horizontal and vertical and vertical directions are determined. Then, a non-maximal suppression algorithm interpolates the magnitude of gradients at hypothetical pixels that lie along the direction perpendicular to the edge direction at pixel (x, y) in a 3 x 3 neighborhood around it. If the magnitude of the gradient at (x, y) is not a maximum value among the interpolated magnitudes then it is declared as a non-edge point and is suppressed. In order to authenticate a pixel to be an edge pixel, the variations in the neighborhood of the pixel also need to be incorporated. Also, the choice of a threshold varies from image to image as the variations in the gray levels of the neighborhood of the pixel vary from image to image. Therefore, the gradient magnitudes should be standardized using statistic principles to identify a pixel to be an edge pixel. The variation in the gradient magnitude is determined using its variance – covariance matrix given by X
s ðx, yÞ ¼ 11 s21
s12 s22
(25.8)
Where, Xm Xn x i2 1 f x ðx,yÞ i¼1 j¼1 h 2ppm h2
(25.9)
Xm Xn y j2 1 s22 ðx; yÞ ¼ f y ðx,yÞ i¼1 j¼1 h 2ppm h2
(25.10)
s11 ðx; yÞ ¼
Xm Xn x i2 y j2 1 f x ðx,yÞ f y ðx,yÞ (25.11) s12 ðx; yÞ ¼ i¼1 j¼1 h h 2pmn h2 Where fx(x, y) and fy(x, y) are the partial derivatives of the image f(x, y) along the horizontal and vertical directions respectively. Then the standardization parameter S(x, y) for each pixel is defined as X 1 T Sðx; yÞ ¼ f x f y ðx, yÞ f x f y
(25.12)
Where, S (x, y) denotes the standardized gradient magnitude at pixel (x, y). The value of S (x, y) is determined for each pixel and if its value is found to be sufficiently large then the pixel (x, y) is considered to be an edge pixel.
25
Combining Multiscale and Multi Directional Analysis for Edge Detection. . .
25.5
331
Proposed Edge Detection Algorithm
Most edge detection algorithms specify a spatial scale at which the edges are detected. Typically, edge detectors utilize local operators and the effective area of these local operators defines this spatial scale. At the small scales corresponding to finer image details, edge detectors find intensity jumps in small neighborhoods. At the small scale, some of these edge responses originate from noise or clutter within the image and these edges are clearly not desirable. More interesting edges are the ones that also exist at larger scales corresponding to coarser image details. When the scale is increased, most noise clutter is eliminated in the detected edges but as a side effect the edges at larger scales are not as well localized as the edges at smaller scales. To achieve good localization and good detection of edges, a multi scale approach is needed. Therefore, multi scale decomposition is achieved in this paper, by applying a Laplacian pyramid. Then the directional information is extracted by means of applying the directional filter banks. Also, whether a pixel is an edge pixel or not depends on the gray values of the pixel and the surrounding pixels. After estimating the gradient vector, one should not use the magnitudes of the derivatives alone for determining the eligibility of a pixel to be an edge pixel, though it has been done in that way with many edge detectors. The variations in the neighborhood of a pixel need to be incorporated in this analysis. Threshold values from image to image may vary since the variations in the gray values in the neighborhood of pixels may vary from image to image. In order to automatically vary the threshold standardization of gradient magnitudes is to be done relative to the surrounding pixels gradient magnitude, and then it is to be tested whether the obtained value is large or not. A natural way of doing such standardization in any process is to use appropriate statistical principles. A way of accomplishing the above objective is dealt in our methodology. Our method of standardizing the gradient strength at each pixel locally before thresholding results in the removal of the ambiguity, and thereby produces reliable, robust and smooth edges. For each directional sub band, the statistical thresholding scheme mentioned in Sect. 25.4 is adapted to extract the edges along each direction. Then all the edge maps at each directional sub band are integrated to get the final edge map (Padma Vasavi et al. 2010).
25.5.1 Algorithm Edge_Detection Inputs: An image I of size 512x512 N ¼ Number of levels of multiscale decomposition M ¼ Number of directional decompositions Output: An edge map of size 512x512 Step 1: Preprocessing: Remove the noise in the image by using a Bi- variate filter and adjust the illumination by histogram equalization
332
K.P. Vasavi et al.
Step 2: Multiscale Decomposition: Decompose the given image in to low pass and band pass sub images using Laplacian Pyramids to get multi scale decomposition Step 3: Directional Decomposition: Apply the Directional Filter Bank to get the directional sub band images Step 4: Statistical Thresholding: For each sub band i. Obtain the partial derivatives along the horizontal and vertical directions. ii. Estimate the variance and co-variance matrices using Eqs. 25.8 –25.11 iii. Determine the standardization parameter using Eq. 25.12 iv. Obtain the edge map using the statistical thresholding Step 5: Integration of sub band edge images: Integrate the edge maps of all the sub images to get the final edge map. Step 6: Performance Evaluation: Estimate the performance of edge detector using Eqs 25.13–25.15
25.6
Experiments and Results
In this section, experiments have been performed both with general images and real time medical images, more specifically the dental x-ray images. To verify the performance of the proposed edge detection method, comparisons are made with the Canny edge detector and edge detection using Gabor transforms. The Canny edge detector is chosen because it is more admired as a most favorable edge detector characterized by its good localization of edges and minimal response. Gabor wavelet based edge detection is chosen for comparison because of late most of the real time applications like finger print recognition, medical image segmentation are employing this method. In the experiments using our proposed method, all the images are considered to be of size 512 x 512. Each image is decomposed in to 4 scales and 16 directional sub images. The proposed algorithm is tested by taking a ‘Meyer’ Laplacian Pyramidal wavelet and the ‘meyerh2’, ‘5/3’, ‘9/7’ directional filter banks. All the experiments are conducted on an Intel core i3-370 M processor with 2.4 GHz clock and 3 GB RAM.
25.6.1 Edge Detection Figure 25.6 shows the results of “Spokes” and “Bike” image which was processed by the proposed edge detection method with ‘Meyer’ LP and the ‘meyerh2’, ‘5/3’, ‘9/7’ directional filter banks. The performance of all the DFBs’ is almost the same but, the ‘9/7’ DFB exhibits slightly better performance when compared to the other two methods. So, for the remaining experiments, edge maps of proposed method with ‘9/7’ DFB are compared against the two other edge detection techniques: the
25
Combining Multiscale and Multi Directional Analysis for Edge Detection. . .
333
Fig. 25.6 (a) Spokes image (b) Edge map with 9/7 (c) 5/3 (d) Meyerh2 DFBs (e) Bike image (f) Edge map with 9/7 (g) 5/3 (h) meyerh2 DFBs
Fig. 25.7 (a) Eight image (b) Edge map of proposed method with Meyer LP and 9/7 DFB (c) Canny method (d) Gabor wavelet method
canny edge detection technique and the edge detection based on Gabor wavelets. The Canny edge detector is chosen because it is more admired as a most favorable edge detector characterized by its good localization of edges and minimal response. The edge technique based on the Gabor wavelets is selected for comparison because it is the most widely used method in the real time applications like finger print recognition, medical image segmentation, robotic vision. Now, Fig. 25.7 is considered for evaluating the performance of the proposed method. It is the image of the numeric ‘8’, which has the curved directional features to be detected. It is clearly observed that our proposed method has produced a clean and smooth edge map. Both the Canny edge detector and the Gabor wavelet based edge detector techniques have shown up spurious edges, especially at the curvilinear regions owing to their disability in detecting the directional information. Next the “eye” image in Fig. 25.8 is considered. This image is strongly exhibiting the superiority of the proposed method over the othertwo techniques. A lot of discontinuities are observed in the Canny’s edge maps and the Gabor edge
334
K.P. Vasavi et al.
Fig. 25.8 (a) Eye image (b) proposed method (c) Canny method (d) Gabor wavelet method
Fig. 25.9 (a) Boy image subjected to 1 dB psnr noise level (b) Proposed method (c) Canny’s method (d) Gabor wavelet method
Fig. 25.10 (a) Flower image with 1 dB psnr noise level (b) Proposed method (c) Canny method (d) Gabor wavelet method
maps especially near the eye lashes, closing portion of the eye, the eye lid and the eye ball, whilst these features are clearly identified by the proposed technique.
25.6.2 Noise Sensitivity To compare the performance of the edge detectors with respect to noise, the images subjected to a Gaussian noise of 1 dB noise level are taken to consideration. Figure 25.9 shows the results of performance of various edge detectors for the “Boy” image laid open to same amount of noise level. The results depict the fact that the performance of the proposed method is much better when compared to the other two methods as a large amount of non-edges are detected to be true edges by the remaining two methods. The proposed method has proven its robustness under noisy conditions also. Another illustration for noise sensitivity of the proposed algorithm is shown in Fig. 25.10 for the “flower” image
25
Combining Multiscale and Multi Directional Analysis for Edge Detection. . .
335
Fig. 25.11 (a) Dental X-ray image1 (b) proposed method (c) Canny method and (d) Gabor method respectively (e) Dental X-ray image2 (f) Proposed method (g) Canny method and (h) Gabor method respectively
25.6.3 Edge Detection of Dental X-Ray Images The process of edge detection is useful in processing the dental x-ray images, specifically in determining the root morphology and the length to which the root has extended, for the purpose of providing root canal treatment. Generally, a manual analysis of an expert is made to determine these two features. However, the human analysis has two disadvantages: intra subject variability and inter subject variability in analyzing the features. Therefore, we tried to implement our proposed algorithm to determine these two features and were quite successful. The results of edge detection of dental x-ray images are furnished in Fig. 25.11. The root pulp is the central part of the dent image. Though the outline of the tooth is identified clearly by the Canny and the Gabor wavelet based edge detection methods, they could not distinguish the pulp outline. So, these methods cannot be used for analysis in root canal treatment. Whilst our proposed edge detection method has clearly outlined the pulp and also its morphology. By using this method it is easy to measure the length of the pulp, which makes the root canal treatment much simpler.
25.6.4 Quantitative Analysis The performance of an edge detector can be made in two different methods: the subjective and the objective methods. Subjective methods of performance
336
K.P. Vasavi et al.
Table 25.1 Quantitative evaluation of proposed method MSE S. No 1 2 3 4 5 6
Image Spokes Bike Eight Eye Boy Flower
LP Meyer
DFB 9/7
Proposed method 0.148 0.165 0.134 0.129 0.126 0.339
Canny method 0.211 0.216 0.225 0.195 0.143 0.445
PSNR(dB) Gabor method 0.207 0.195 0.201 0.188 0.143 0.429
Proposed method 28.20 27.97 28.41 28.51 28.56 26.41
Canny method 27.44 27.39 27.12 27.64 28.27 25.65
Gabor method 27.47 27.61 27.49 27.98 28.27 25.28
evaluation of edge detectors use human judgment to estimate the performance of edge detectors. Alternatively, objective methods use signal to noise ratio and mean square error between the edge detectors image and the original one to determine the performance of edge detectors. Even though the objective methods are extensively used, they need not necessarily be associated with our discernment of image quality. For illustration, an image with a low error as determined by an objective measure may actually look much worse than an image with a high error metric (Mohamed et al. 2006). Commonly used objective measures are the root-meansquare error, erms, the root-mean square signal-to-noise ratio, SNRrms, and the peak signal-to-noise ratio, SNRpeak as in Eqs. 25.13–25.15 vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi u 1 X N1 X u 1 M (E(x,y) ðX; yÞÞ2 erms ¼t MN x¼0 y¼0
SNR rms
vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi u M1 P N1 P u u ð E ðx; y ÞÞ 2 u x¼0 y¼0 ¼u u M1 P t P N1 ½ E ( x, y) O ð x, y ) 2 x¼0
(25.13)
(25.14)
y¼0
ðL 1Þ 2
SNR peak ¼ 10 log10 1
MN
M1 P
N1 P
x¼0
y¼0
(25.15)
½E ð x, y Þ O ð x, y ) 2
Where O(x, y) is the original image, E(x, y) is the reconstructed image and L is the number of gray level equal to 256 Table 25.1.
25.7
Conclusion
A novel edge detection scheme which uses Laplacian Pyramids for multi scale decomposition, Directional Filter Banks for obtaining the directional information is used in this paper. Furthermore, a statistical thresholding is used to validate a pixel
25
Combining Multiscale and Multi Directional Analysis for Edge Detection. . .
337
to be an edge. The results are tested for various general and medical images. Also, a comparison is made with the two popular edge detection techniques and the results obtained by the proposed method are found to be appreciable both in terms of qualitative and quantitative analysis.
References Aydin T, Yemez Y, Anarim E, Sankur B (Sept 1996) Multi directional and multi scale edge detection via m-band wavelet transform. IEEE Trans Image Process 5(9):1370–1377 Burt PJ, Adelson EH (Apr 1983) The Laplacian pyramid as a compact image code. IEEE Trans Commun 31(3):532–540 Chan WY, Law NF, Siu WC (2003) Multi scale feature analysis using directional filter banks. IC ICS-PCM, Singapore Do MN (Jan 2003) The finite ridgelet transform for image representation. IEEE Trans Image Process 12(1):16–28 Do MN, Vetterli M (Oct 2001) Pyramidal directional filter banks and curvelts. In: Proceedings of the IEEE, International Conference on Image Process (ICIP 2001), Patras, Greece Nguyen TT, Oriantara S (Mar 2007) A class of multi resolution directional filter banks. IEEE Trans Signal Process 55(3):949–961 Padma Vasavi K, Udaya Kumar K, Krishna Rao EV, Madhavi Latha M (2010) A novel statistical thresholding in edge detection using Laplacian pyramid and directional filter banks. Lecture notes in engineering and computer science: proceedings of the world congress on engineering and computer science 2010 (WCECS 2010), San Francisco, USA, 20–22 Oct 2010, pp 589–593 Padma Vasavi K, Udaya Kumar K, Krishna Rao EV, Madhavi Latha M (2008) An adaptive statistical thresholding using GLCM in wavelet domain. IRACE, Hyderabad Rath G, Guillemot C (Sept 2006) Representing Laplacian pyramids with varying amount of redundancy. In: Proceedings of the 14th European signal processing conference (EUSIPCO 2006), Florence, Italy Roushdy M (Dec 2006) Comparative study of edge detection algorithms applying on the grayscale noisy image using morphological filter. GVIP J 6(4):17–23 William KP (2002) Digital image processing, 3rd edn. Wiley Interscience, New York
Chapter 26
Soft Vector Quantization with Inverse Power-Function Distributions for Machine Learning Applications Mohamed Attia, Abdulaziz Almazyad, Mohamed El-Mahallawy, Mohamed Al-Badrashiny, and Waleed Nazih
26.1
Introduction
Given a codebook of centroids1; i.e. set of centers of classes/clusters ci 2
(26.1)
. . .where dðq1 ; q2 Þ is any legitimate distance measure between q1 ; q2 2
8j;1jL
2 dðq; cj Þ
(26.2)
1 All the material presented in this paper is independent of the algorithm used for inferring that codebook; e.g. k-means, LBG . . . etc.
M. Attia (*) HLT consultant for The Engineering Company for the Development of Computer System, RDI, 12613 Giza, Egypt Consultant for Luxor Technologies Inc, L6L6V2 Oakville, Ontario, Canada Visiting assistant professor in the College of Computing and IT, Arab Academy for Science & Technology – AAST, Heliopolis Campus, Cairo, Cairo e-mail:
[email protected] S.‐I. Ao et al. (eds.), Intelligent Automation and Systems Engineering, Lecture Notes in Electrical Engineering 103, DOI 10.1007/978-1-4614-0373-9_26, # Springer Science+Business Media, LLC 2011
339
340
M. Attia et al.
The total quantization noise energy over a population of points2 in this space of size s versus that codebook of centroids (Duda and Hart 2000; Gray 1984; Jain et al. 2008) is hence given by: E2VQ ¼
s X i¼1
min
8j;1jL
dðqi ; cj Þ
2 (26.3)
VQ in the form of Eq. 26.1 is a hard-deciding operation following the winnertakes-all policy which may not be quite fair especially with rogue points which are almost equidistant from more than one centroid in the codebook (Attia et al. 2010; Jain et al. 2008; K€ ovesi et al. 2001; Seo et al. 1589). With machine-learning/ classification systems that include a hard-deciding VQ module, the quantized observations (or observation sequences) corresponding to some inputs during the training phase may be significantly different from those corresponding to the same inputs in the runtime that may have only experienced just a slight variance in the observation space! Regardless to the subsequent deployed machine-learning methodology, that difference will inevitably cause some deterioration in the performance of such systems. In order to boost the robustness of the run-time performance in the presence of all kinds of variances; e.g. noise, in the inputs to these systems, soft VQ is proposed so that there is a non-zero chance of the belonging of any given point to each centroid in the codebook. Intuitively, the closer is the point to some centroid than the other ones in the codebook; the higher is the probability of the attribution of this point to that centroid. Soft VQ in this sense will shake up the over-fitting of the training by introducing smoother and more expressive distributions of quantized observations in the statistical learning models, which will in turn be more capable to cope with run-time variances than those resulting from hard-deciding VQ. Formally, soft VQ may in general be formulated as: f dðq; c Þ i SoftVQ f ðdi Þ !iÞ ¼ L Pðq ¼P L P f dðq; cj Þ f dj j¼1
(26.4)
j¼1
The function f ðdi Þ must obey the following conditions: 1. f ðdi Þ 08di 0 2. f ðdi Þ is continuous 8di 0
2
Typically, any adaptive methodology for inferring the codebook works in the offline phase on a sample population that is assumed to have the same statistical properties of the phenomenon being modeled.
26
Soft Vector Quantization with Inverse Power-Function. . .
341
3. f ðdi Þ is a monotonically decreasing function 8di 0 SoftVQ
SoftVQ
!iÞ ¼ 1 ^ Pðq !j 6¼ iÞ ¼ 0 4. di ¼ 0 ) Pðq It is crucial to note that the quantization noise energy due to the soft VQ of each given point q is given by:
e2SoftVQ ¼
L X SoftVQ dj2 Pðq !jÞ ¼
L P j¼1
j¼1
dj2 f ðdj Þ
L P f dj
(26.5)
j¼1
2 In Eq. 26.5: each dj2 dmin ¼ dðq; ci0 Þ2 8j; 1 j L is weighted by probabilities 0, and together with Eqs. 26.2 and 26.3, we conclude: e2VQ e2SoftVQ ) 1 r
e2SoftVQ e2VQ
)1
E2SoftVQ E2VQ
r
(26.6)
This means that the price we pay for a soft VQ is a higher harmful quantization noise energy that may hinder any machine-learning method, which in turn indicates the necessity to compromise that price with the gains of avoiding over-fitting for a more robust performance to inputs’ variance. The inverse power-function distribution for soft VQ is defined in the next section of this chapter, and then Sect. 26.3 is devoted to a detailed analytic investigation of the quantization noise energy resulting from this distribution relative to that of the typical hard VQ. In Sect. 26.4, the experimental setup with the best performing – according to the published literature (Attia et al. 2009; El-Mahallawy et al. 2008) – OCR system for type-written cursive scripts – that happened to deploy discrete HMMs – is described, and the experimental results are analyzed to see how good they match our claims on the benefits of our proposed soft VQ scheme for machine learning systems with one or more VQ modules.
26.2
Inverse Power-Function Distribution
In addition to satisfying the four conditions mentioned above, it is much desirable for the design of the function f(x) to have the following features: 1. Simplicity. 2. Having tuning-parameters that control the probability attenuation speed with increasing distance. 3. Minimum ravg over all the possible emergences of q’s.
342
M. Attia et al.
While its realization of the third desirable feature is subject to a detailed analysis over Sect. 26.3, the inverse power-function realizes all the necessary conditions and the first two desirable features above. It is defined as: f dj ¼ djm ; m>0
26.3
(26.7)
Noise Energy of Our Soft VQ vs. Hard VQ
Substituting the formula of Eq. 26.7 in Eq. 26.5 gives: L P
e2SoftVQ ¼
j¼1 L P j¼1
dj2m
(26.8)
ðdjm Þ
. . .then, substituting 26.2 and 26.8 in 26.6, we get: L P
ðdj2m Þ
j¼1 L
P r
e2SoftVQ e2VQ
¼
ðdjm Þ
j¼1
2 dmin
m2 L P dmin ¼
j¼1
dj
L P
j¼1
dmin dj
m
(26.9)
Let us define: aj
dmin 1; aj 2 ½0; 1 dj
(26.10)
. . .then Eq. 26.9 can be re-written more conveniently as: L P
1þ r¼
j¼1 j6¼i0
1þ
ajm2
L P j¼1 j6¼i0
;1 j L am j
(26.11)
26
Soft Vector Quantization with Inverse Power-Function. . .
343
For 0<m<2; it is obvious that the numerator grows indefinitely faster than the denominator for arbitrarily infinitesimal values of some ak ; k 2 O f1; 2; :::; Lg so that: P m2 1 þ xN þ lim ak 8k2O ak !d!0 0<m<2 ¼ lim r P ak !08k2O 0<m<2 1 þ xD þ lim am k ak !d!0
8k2O
2m
¼
1 þ xN þ o lim ð1dÞ d!0
¼
1 þ xD þ o 0
0<m<2
1 þ xN þ 1 ¼1 1 þ xD þ 0
(26.12)
. . .where o ¼ SizeOf ðOÞ and 0<xN ; xD
1þ
L L P j¼1 j6¼i0
;1 j L
(26.13)
a2j
. . .and one can easily infer that: max rjm¼2 ¼
lim
aj !08j6¼i0 ;1jL
rjm¼2 ¼ L
(26.14)
As the size of the codebook used with non-trivial problems is typically a large number, the worst case of Eq. 26.14 still indicates a huge ratio of soft VQ noise energy vs. that of hard VQ which can still ruin machine learning especially as this worst case occurs at the dominant situation of points being so close to one centroid only and far from the others! For m>2; considering Eq. 26.10, the following three special cases of Eq. 26.1 can easily be noticed: 1þ lim r ¼
m!1
L P
ð lim ajm2 Þ j¼1 m!1 j6¼i0
1þ
L P
ð lim am j Þ
j¼1 m!1 j6¼i0
¼
1þ‘þ0 ¼1 1þ‘þ0
(26.15)
344
M. Attia et al.
. . .where ‘ is the number of aj ’s that are exactly equal to one. This shows that the quantization noise energy of our proposed soft VQ with the power m growing larger is approaching the one of the hard-deciding VQ; however its distributions are also turning less smooth and more similar to those of the hard-deciding VQ. 1þ lim
8aj !0; j6¼i0
r¼
L P
ð lim ajm2 Þ
j¼1 aj !0 j6¼i0 L P
1þ
¼
ð lim am j Þ
1 þ ðL 1Þ 0 ¼1 1 þ ðL 1Þ 0
(26.16)
j¼1 aj !0 j6¼i0
. . .which occurs only when q ¼ ci0 . 1þ lim r ¼
8aj !1
L P
ð lim ajm2 Þ
j¼1 aj !1 j6¼i0
1þ
L P
¼
ð lim
j¼1 aj !1 j6¼i0
am j Þ
1 þ ðL 1Þ 1 ¼1 1 þ ðL 1Þ 1
(26.17)
. . .which occurs when dðq; ci Þ is exactly the same 8i; 1 i L. Only for these special cases r ¼ 1 otherwise r>1. It is crucial to calculate the maximum value of r; i.e. the worst case, which - according to Eq. 26.6 – is an upper bound of the ratio between the total quantization noise energy of the proposed soft VQ to that of the conventional hard VQ. To obtain rmax , the ðL 1Þdimensional sub-space within aj6¼i0 2 ½0; 18j; 1 j L has to be searched for those ^ aj6¼i0 where that maximum is realized. This can be done analytically by solving the following set of ðL 1Þ equations: @r ¼ 0; 1 k L @ak 8k6¼i0
(26.18)
For the sake of convenience, let us re-write Eq. 26.11 as: r¼
X X Ak þ akm2 ajm2 ; Bk 1 þ am j m ; Ak 1 þ Bk þ ak 8j6¼i ; j6¼k 8j6¼i ;j6¼k 0
0
(26.19)
26
Soft Vector Quantization with Inverse Power-Function. . .
345
Then: @r ðm 2Þ ^ akm3 m ^akm1 ¼0) ¼ @ak 8k6¼i0 Ak þ ^ akm2 8k6¼i0 Bk þ ^am k 8k6¼i0
(26.20)
. . .that reduces into: Ak þ ^ akm2 m2 2 ^ Þ a ¼ r ¼ ð max k m Bk þ ^ ak 8k6¼i0 m 8k6¼i0
(26.21)
In order for Eq. 26.21 to hold true, all ^ ak6¼i0 must be equal so that: ^ ak j8k6¼i0 ¼ ^ a
(26.22)
. . .that reduces Eq. 26.19 into: A ¼ 1 þ ðL 2Þ ^ am2 ; B ¼ 1 þ ðL 2Þ ^am ) rmax ¼
Aþ^ am2 1 þ ðL 1Þ ^ am2 m2 ¼ Þ ^a2 m m ¼ ð Bþ^ a 1 þ ðL 1Þ ^ a m
ð26:23Þ
Re-arranging the terms of 26.23, we get the polynomial equation: ^am þ
m 1 m2 1 ^ a2 ¼ 0; m>2; L 2; ^a 2 ½0; 1 2 L1 2 L1
(26.24)
For any m>2 that is an even number, Eq. 26.24 can be shown to have one and only one real solution in the interval ^ a 2 ½0; 1 through the following three-step proof: ^ ¼ ^a2 ; c ¼ 1 , and re-write Eq. 26.24 as: 1. Put b L1 ^ ¼b ^m=2 þ gðbÞ
m ^m2c¼ 0 cb 2 2
m2 1 <0 2 L1 ^ ¼ 1Þ ¼ 2 L 2 þ m m þ 2 ¼ L >0 gðb 2 ðL 1Þ L1
^ ¼ 0Þ ¼ 2. gðb
^ has roots 2 ½0; 1 ∴gðbÞ ^ dgðbÞ m ^m2 1 m 3. ∵ ¼ b >0 þ ^ 2 2 ðL 1Þ dðbÞ ^ a monotonically increasing function. ∴gðbÞis ^ has only one root 2 ½0; 1. 4. From steps 2 and 3, gðbÞ A closed-form solution of Eq. 26.24 is algebraically extractable only for ðm2 Þ 2 f2; 3; 4; 5g (Jacobson et al. 2009).
346
M. Attia et al.
When m ¼ 4, for example, Eq. 26.24 turns into a quadratic equation whose solution results into (Attia et al. 2010): ^a
2
m¼4
pffiffiffi 1 L1 a2 m¼4 ¼ pffiffiffi ; rmax jm¼4 ¼ ; lim ^ L!1 L1 L 1 L1 1 pffiffiffi ¼ pffiffiffi ; lim rmax jm¼4 ¼ L 2 2 L 1 L!1
(26.25)
When m ¼ 6, we get a cubic equation whose solution (Attia et al. 2010; Jacobson et al. 2009) results into: ^ a2 m¼6
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi r qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 3 3 1 1 1 þ 1 þ L1 1 1 þ L1 m2 ffiffiffiffiffiffiffiffiffiffiffi p ¼ ¼ ; r ^ a2 m¼6 j max m¼6 3 m L1
a2 m¼6 lim ^ L!1
rffiffiffi rffiffiffi 2 3 L 3 2 ; lim rmax jm¼6 ¼ ¼ L L!1 3 2
(26.26)
For ðm2 Þ>5: one can only derive an expression for ^a2 and rmax with any even degree m at L ! 1; i.e. with a large codebook. From Eq. 26.25 and Eq. 26.26 one can speculate the generalization that: a¼ lim ^
L!1
2L m2
1 m
; lim rmax L!1
2 m2 2L m ¼ m m2
(26.27)
Substituting that guess in the terms of Eq. 26.24 gives: , ! T3 2 L 1 2 ðL 1Þ 1 ¼ lim lim ¼ 1 L!1 T1 L!1 m2 m2 T2 m ¼ lim L!1 T1 m2 T2 m ¼ lim L!1 T3 m2
,
lim
L!1
,
2L m2
lim
L!1
2L m2
m2
m2
¼0
¼0
. . .which confirms the validity of Eq. 26.27 as an approximation at L>>1. Below, Table 26.1 summarizes the approximate expressions of ^a and rmax with large codebooks, and Fig. 26.1 illustrates the accurate curve of Eq. 26.1 with L ¼ 1024>>1 at different even values of the power m. It is clear that rmax gets lower with increasing values of m>2 in accordance with Eq. 26.27.
26
Soft Vector Quantization with Inverse Power-Function. . .
347
Table 26.1 Relative noise energy of soft VQ with large codebooks L¼1,024 M 2 4 6 8 m 1
lim ^a
lim rmax
L!1
L!1
0 1 L4
L 1 pffiffiffi L 2 rffiffiffi 2 3 L 3 2 rffiffiffi 3 4 L 4 3
16 L 2 18 L 3 1 2L m m2 1
m2 2 L 2=m m m2 1
L¼2,048
^ a
rmax
^a
rmax
0 0.177
1,024 16
0 0.149
2,048 22.63
0.354
5.333
0.315
6.720
0.482
3.224
0.442
3.834
1
1
1
1
Fig. 26.1 Quantization noise energy of the proposed soft VQ relative to that of hard VQ at different values of the power m
26.4
Experimental Setup and Results
This section presents the experimentation we conducted to attest the benefits we claimed – in the introduction above – of our proposed soft VQ for machine-learning systems that include a VQ module. For this challenge, we have chosen the task of Optical Character Recognition (OCR) for Arabic script (Attia et al. 2004) which is an instance of a broader family of cursive scripts including Persian, Urdu, Kurdish, etc. This is a 4-decade old tough pattern recognition problem as the connected characters in the script need to be both identified and segmented simultaneously. It is evident in the literature that the most
348
M. Attia et al.
effective methodology for dealing with such a problem is the HMM-based one (Attia et al. 2009; Bazzi et al. 1999; El-Mahallawy et al. 2008; Gouda et al. 2004; Kanungo et al. 1999; Khorsheed 2007; Mohamed and Gader 1996; Rashwan et al. 2007). Among the many recent attempts made on this problem, we have selected CleverPage# to experiment with.3 CleverPage# is an ASR4-like HMM5-based Arabic Omni Type-Written OCR system that is reported to achieve the highest accuracy in this regard (Attia et al. 2009; El-Mahallawy et al. 2008). The most characterizing innovation in this OCR system that puts CleverPage# ahead of its rivals is its lossless recognition features which are autonomously normalized horizontal differentials encoded in 16-component vectors sequence. Each features vector is computed to differentially encode the pixels pattern in each single-pixel width slice (i.e. frame) included within the right-to-left horizontally sliding window. Given a sequence of such features, one can retrieve the shape of the digitally scanned type-written script with only a scaling factor along with an inevitable limited digital distortion. Each single-pixel frame of a given Arabic word is assumed to have a limited maximum number of vertical dark segments (4 segments are found to be enough). Each of these segments is coded into four components differentially representing the topological and agglomerative properties of the segment as follows: 1st component: The segment’s center of gravity with respect to that of the previous frame. 2nd component: The segment’s length relative to that of the previous frame. 3rd and 4th components: The orders of the most bottom, and the top segments in the preceding frame which are in 8-connection with the current segment. Special negative codes are given in case of non-connected segments. Empty segments are padded by zeros. The dimensionality of this features vector is 16 ¼ 4 segments 4 components per segment. While half of these components are sharply discrete in nature, the others are analog, which made it quite hard to find Gaussian mixtures that properly represent that kind of hybrid data. So, discrete HMMs rather than GMMs have been resorted to as the recognition vehicle of this system (Attia et al. 2009; El-Mahallawy et al. 2008; Rashwan et al. 2007). This made CleverPage# an excellent candidate for our experimentation especially as its producer6 was generous enough to allow us set up two versions of this discrete HMM-based Arabic OCR system with L ¼ 2,048: one with hard-deciding
3 See http://www.RDI-eg.com/technologies/OCR.htm for a details on CleverPage#, and see also the last section of the web page http://www.RDI-eg.com/technologies/papers.htm for downloading references (Attia et al. 2004), (El-Mahallawy et al. 2008), and (Rashwan et al. 2007) in the references list at the end of this chapter. 4 Automatic Speech Recognition 5 Hidden Markov Models 6 RDI; www.RDI-eg.com
26
Soft Vector Quantization with Inverse Power-Function. . .
349
Table 26.2 Relative noise energy of soft VQ with large codebooks Error rate of assimilation test Error rate of generalization test WERA Hard VQ 3.08%
Soft VQ 3.71%
CERA Hard VQ 0.77%
Degradation due to soft VQ –20.45%
Soft VQ 0.93%
WERG Hard VQ 16.32%
Soft VQ 13.98%
CERG Hard VQ 4.08%
Soft VQ 3.50%
Enhancement due to hard VQ +14.34%
VQ, and the other with our proposed soft VQ with m ¼ 8. This is an empirical selection of the value of m that compromises between both minimal quantization noise energy of soft VQ relative to that of hard VQ, along with soft enough distributions of our soft VQ scheme. The training data of both setups covers ten different popular Arabic fonts with six sizes per each font. It contains 25 LASER-printed pages per each size of each font with all the pages scanned at 600dpi B&W along with the correct digital text content of each page (El-Mahallawy et al. 2008). Then we challenged both versions with two sets of test data: assimilation test data and generalization test data. The assimilation test data consists of 5 test pages for each of the 60 (font, size) pairs represented in the training phase. Of course the pages themselves do not belong to the training data. These assimilation test pages are produced in the same conditions as the training ones. This provides the favorable runtime conditions of least runtime variance from the training conditions. The generalization test data, on the other hand, are sample pages picked randomly from some Arabic paper-books that are scanned also at 600 dpi.7 Obviously, there is no control on the fonts or sizes used in these pages. The tilting distortion and the printing noise are also quite apparent. Of course, this provides the less favorable runtime conditions of more considerable variance from the training conditions. Table 26.2 below compares the measured word error rates of both versions with each of the two test data sets. In this table, CER stands for Character Error Rate while WER stands for Word Error Rate. Under the assumption of a single character error per word - which is valid in our case - the relation WER h CER; CER<<1 holds true with h 4 for Arabic (Attia et al. 2009). While WER is the rate perceived by the OCR end user, many researchers and vendors prefer to use CER to optimistically state the performance of their systems. While the OCR version with hard VQ produced smaller error rates than the version with soft VQ in the assimilation test, the soft VQ version outperformed the hard VQ version in the generalization test.
7 This data set as well as the corresponding output of both versions are downloadable at the link: http://www.rdi-e.g.com/Soft_Hard_VQ_OCR_Generalization_Test_Data.RAR
350
M. Attia et al.
As the models built upon the training of the hard VQ version were more overfitted to the training data than those built with soft VQ, it was easier for the former to recognize the “similar” inputs from assimilation test data with a narrower error margin. On the other hand, the more “flexible” models built with soft VQ were more capable to absorb the much more variance in the inputs from the generalization test data than those built with hard VQ. This observed behavior nicely matches our claimed benefits of our proposed soft VQ for machine-learning as mitigating over-fitting and rendering their performance more robust with runtime variances like noise.
26.5
Conclusion and Future Work
This chapter discussed the virtues of soft vector quantization over the conventional hard-deciding one esp. for machine-learning applications. It then proceeded to propose a soft VQ scheme with inverse power-function distributions. The quantization noise energy of this soft VQ compared with that of hard VQ is then analytically investigated to derive a formula of an upper bound on the ratio between the quantization noise energy in both cases. This formula reveals the proper values of the power where it is safe to use with such distributions without ruining the stability of machine-learning. To attest the claimed benefits of our proposed soft VQ for machine-learning, we have experimented with a recent Omni Type-Written OCR for cursive scripts whose recognition error margin with Arabic printed text is reported to be the minimum. This OCR has an ASR-like HMM-based architecture with lossless recognition features vector combining both analog and sharply discrete components, which necessitated the usage of discrete HMMs hence the deployment of VQ. We setup two versions of this OCR system: one with the conventional hard VQ and the other with the proposed soft VQ. We then challenged each version with two sets of test data; assimilation test data and generalization test data. While the OCR version with hard VQ realized smaller error rates than the version with soft VQ in the assimilation test, the latter outperformed the former in the generalization test. These results match our claimed positive impact of our proposed soft VQ on machine-learning systems as mitigating over-fitting and rendering their performance more robust with runtime variances; e.g. noise.
References Attia M (2004) Arabic orthography vs. Arabic OCR; Rich heritage challenging a much needed technology, multilingual computing & technology magazine. www.Multilingual.com, USA Attia M, Rashwan M, El-Mahallawy M (2009) Autonomously normalized horizontal differentials as features for HMM-based omni font-written OCR systems for cursively scripted languages.
26
Soft Vector Quantization with Inverse Power-Function. . .
351
IEEE Inter Conf Signal Image Process Appl (ICSIPA09), Kuala Lumpur, Malaysia, Nov 2009. http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber¼5478619 Attia M, Almazyad A, El-Mahallawy M, Al-Badrashiny M, Nazih W (2010) Post-clustering soft vector quantization with inverse power-function distribution, and application on discrete HMM-based machine learning. Lecture notes in engineering and computer science: Proc. The World Cong Eng Comp Sci 2010 (WCECS 2010), 20–22 Oct 2010, San Francisco, pp 574–580 Bazzi I, Schwartz R, Makhoul J (1999) An omnifont open-vocabulary OCR system for English and Arabic. IEEE Trans Pattern Anal Machine Intel 21(6):495–504 Duda RO, Hart PE (2000) Pattern classification and scene analysis, 2nd edn. Wiley, New York El-Mahallawy MSM (Apr 2008) A large scale HMM-based omni front-written OCR system for cursive scripts, PhD thesis, Dept. of Computer Engineering, Faculty of Engineering, Cairo University Gouda AM (2004) Arabic handwritten connected character recognition, PhD thesis, Dept. of Computer Engineering, Faculty of Engineering, Cairo University Gray RM (1984) Vector quantization. IEEE Signal Process Mag 1:4–29 Jacobson N (2009) Basic algebra, vol 1, 2nd ed. Dover. ISBN 978-0-486-47189-1, 2009 Jain AK (2008) Data clustering: 50 years beyond K-means http://biometrics.cse.msu.edu/ Presentations/FuLectureDec5.pdf. Plenary talk at the IAPR’s 19th international conference on pattern recognition, Tampa, Florida. http://www.icpr2008.org/ Kanungo T, Marton G, Bulbul O (1999) OmniPage vs. Sakhr: Paired model evaluation of two Arabic OCR products. Proc SPIE Conf Document Recogition Retrieval VI, pp 109–121 Khorsheed MS (2007) Offline recognition of omnifont arabic text using the HMM toolKit (HTK). Elsevier’s Pattern Recognition Lett 28:1563–1571 K€ovesi B, Boucher J-M, Saoudi S (2001) Stochastic K-means algorithm for vector quantization. Pattern Recognition Lett-Elsevier 22(6–7):603–610 Mohamed M, Gader P (1996) Handwritten word recognition using segmentation-free hidden Markov modeling and segmentation-based dynamic programming techniques. IEEE Trans Pattern Analysis Machine Intel 18(5):548–554 Rashwan M, Fakhr M, Attia M, El-Mahallawy M (Dec 2007) Arabic OCR system analogous to HMM-based ASR systems; Implementation and evaluation. J Eng Appl Sci, Cairo University 54(6):653–672, www.Journal.eng.CU.edu Seo S, Obermayer K (2003) Soft learning vector quantization. ACM’s Neural Computation 15 (7):1589–1604, MIT Press
Chapter 27
Anfis-Based P300 Rhythm Detection Using Wavelet Feature Extraction on Blind Source Separated Eeg Signals Juan Manuel Ramirez-Cortes, Vicente Alarcon-Aquino, Gerardo Rosas-Cholula, Pilar Gomez-Gil, and Jorge Escamilla-Ambrosio
27.1
Introduction
This article presents a revised and extended version of a paper presented at the World Congress on Engineering and Computer Science 2011, International Conference on Signal Processing and Imaging Engineering (Ramı´rez-Cortes et al. 2010). In recent years, there has been a growing interest in the research community on signal processing techniques oriented to solve the multiple challenges involved in Brain Computer Interfaces (BCI) applications (Paul et al. 2008; Theodore et al. 2007; Bashashati et al. 2007). Brain Computer Interfaces (BCIs) are systems which allow people to control some devices using their brain signals. An important motivation to develop BCI systems, among some others, would be to allow an individual with motor disabilities to have control over specialized devices such as computers, speech synthesizers, assistive appliances or neural prostheses. A dramatic relevance arises when thinking about patients with severe motor disabilities such as locked-in syndrome, which can be caused by amyotrophic lateral sclerosis, high-level spinal cord injury or brain stem stroke. In its most severe form people are not able to move any limb. BCIs would increase an individual’s independence, leading to an improved quality of life and reduced social costs. Among the possible brain monitoring methods for BCI purposes, the EEG constitutes a suitable alternative because of its good time resolution, relative simplicity and noninvasiveness when compared to other methods such as functional magnetic resonance imaging, positron emission tomography (PET), magnetoencephalography or electrocorticogram systems. There are several signals which can be extracted from the EEG in order to develop BCI systems, including the slow cortical potential (Bashashati et al. 2007), m and b
J.M. Ramirez-Cortes (*) Department of Electronics and Department of Computational Science, National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro No. 1 Tonantzintla, Puebla 72840, Mexico e-mail:
[email protected] S.‐I. Ao et al. (eds.), Intelligent Automation and Systems Engineering, Lecture Notes in Electrical Engineering 103, DOI 10.1007/978-1-4614-0373-9_27, # Springer Science+Business Media, LLC 2011
353
354
J.M. Ramirez-Cortes et al.
rhythms (Royer et al. 2009; Delaram et al. 2009), motor imagery (Thomas et al. 2009), static-state visually evoked potentials (Zhu et al. 2010; Christian et al. 2009), or P300 evoked potentials (David 2005; Seno et al. 2010; Brice et al. 2006). P300 evoked potentials occur with latency around 300 ms in response to target stimuli that occur unexpectedly. In a P300 controlled experiment, subjects are usually instructed to respond in a specific way to some stimuli, which can be auditory, visual, or somatosensory. P300 signals come from the central-parietal region of the brain and can be found more or less throughout the EEG on a number of channels. The P300 is an important signature of cognitive processes such as attention and working memory and an important clue in the field of neurology to study mental disorders and other psychological disfunctionalities (Kun et al. 2009). In this work, an experiment on P-300 rhythm detection using wavelet-based feature extraction, and an ANFIS algorithm is presented. The experiment has been designed in such a way that the P300 signals are generated when the subject is exposed to some visual stimuli, consisting of a sequential group of slides with a landscape background. Images of a ship are inserted using a controlled non-uniform sequence, and the subject is asked to press a button when the ship unexpectedly appears. The EEG signals are preprocessed using an Independent Component Analysis (ICA) algorithm, and the P300 is located in a time-frequency plane using the Discrete Wavelet Transform (DWT) with a sub-band coding scheme. The rest of the paper is organized as follows: Sect. 27.2 presents the theory associated to the wavelet sub-band coding algorithm. Section 27.3 describes Independent Component Analysis (ICA) as part of the pre-processing stage. Section 27.4 reports the evoked potential experiment and the proposed method on P300 signal detection. Section 27.5 describes the ANFIS model and its application to the EEG signals. Section 27.6 presents obtained results, and Sect. 27.7 presents some concluding remarks, perspectives, and future direction of this research oriented to the implementation of a BCI system.
27.2
Discrete Wavelet Transform
The Discrete Wavelet Transform (DWT) is a transformation that can be used to analyze the temporal and spectral properties of non-stationary signals. The DWT is defined by the following equation (Priestley 2008): Wð j; kÞ ¼
XX j
f ðxÞ2j=2 cð2j x kÞ
(27.1)
k
The set of functions cj;k ðnÞis referred to as the family of wavelets derived from cðnÞ, which is a time function with finite energy and fast decay called the mother wavelet. The basis of the wavelet space corresponds then, to the orthonormal functions obtained from the mother wavelet after scale and translation operations.
27
Anfis-Based P300 Rhythm Detection Using Wavelet Feature Extraction. . .
355
Fig. 27.1 Two-level wavelet filter bank in the sub-band coding algorithm
The definition indicates the projection of the input signal into the wavelet space through the inner product, then, the function f(x) can be represented in the form: f ðxÞ ¼
X
dj ðkÞcj;k
(27.2)
j;k
where dj(k) are the wavelet coefficients at level j. The coefficients at different levels can be obtained through the projection of the signal into the wavelets family as expressed in Eqs. 27.3 and 27.4. D
E X D E dl f ; fj;kþl f ; cj;k ¼
(27.3)
l
D E E 1 X D cl f ; fj1;2kþl f ; fj;k ¼ pffiffiffi 2 l
(27.4)
The DWT analysis can be performed using a fast, pyramidal algorithm described in terms of multi-rate filter banks. The DWT can be viewed as a filter bank with octave spacing between filters. Each sub-band contains half the samples of the neighboring higher frequency sub-band. In the pyramidal algorithm the signal is analyzed at different frequency bands with different resolution by decomposing the signal into a coarse approximation and detail information. The coarse approximation is then further decomposed using the same wavelet decomposition step. This is achieved by successive high-pass and low-pass filtering of the time signal and a down-sampling by two (Pinsky et al. 2009), as defined by the following Eqs. 27.5 and 27.6: aj ðkÞ ¼
X
hðm 2kÞ ajþ1 ðmÞ
(27.5)
gðm 2kÞ ajþ1 ðmÞ
(27.6)
m
dj ðkÞ ¼
X m
Figure 27.1 shows a two-level filter bank. Signals aj(k), and dj(k) are known as approximation and detail coefficients, respectively.
356
J.M. Ramirez-Cortes et al.
This process may be executed iteratively forming a wavelet decomposition tree up to any desired resolution level. In this work the analysis was carried out up to the 11 decomposition level (16 s windows with sampling frequency of 128 sps) applied on the signals separated from the ICA process described in the next section.
27.3
Preprocessing of Eeg Signals Using Independent Component Analysis
Independent Component Analysis (ICA), an approach to the problem known as Blind Source Separation (BSS), is a widely used method for separation of mixed signals (Amar et al. 2008; Keralapura et al. 2011). The signals xi ðtÞare assumed to be the result of linear combinations of the independent sources, as expressed in Eq. 27.7. xi ðtÞ ¼ ai1 si ðtÞ þ ai2 s2 ðtÞ þ þ ain sn ðtÞ
(27.7)
x ¼ As
(27.8)
or in matrix form:
where A is a matrix containing mixing parameters and S the source signals. The goal of ICA is to calculate the original source signals from the mixture by estimating a de-mixing matrix U that gives: _
s ¼ Ux
(27.9)
This method is called blind because both the mixing matrix A and the matrix containing the sources S are unknown, i.e., little information is available. The demixing matrix U is found by optimizing a cost function. Several different cost functions can be used for performing ICA, e.g. kurtosis, negentropy, etc., therefore, different methods exist to estimate U. For that purpose the source signals are assumed to be non-gaussian and statistically independent. The requirement of non-gaussianity stems from the fact that ICA relies on higher order statistics to separate the variables, and higher order statistics of Gaussian signals are zero (John 2008). EEG consists of measurements of a set of N electric potential differences between pairs of scalp electrodes. Then the N-dimensional set of recorded signals can be viewed as one realization of a random vector process. ICA consists in looking for an overdetermined (N P) mixing matrix A (where P is smaller than or equal to N) and a P-dimensional source vector process whose components are the most statistically independent as possible. In the case of the P300 experiment
27
Anfis-Based P300 Rhythm Detection Using Wavelet Feature Extraction. . .
357
described in this paper, ICA is applied with two objectives; denoising the EEG signal in order to enhance the signal to noise ratio of the P-300, and separating the evoked potential from some artifacts, like myoelectric signals derived from eye-blinking, breathing, or head motion.
27.4
Experimental Setup and Proposed Methodology for P-300 Signal Detection
In this work the EPOC headset, recently released by the Emotiv Company, has been used (Emotiv Systems Inc.). This headset consists of 14 data-collecting electrodes and 2 reference electrodes, located and labeled according to the international 10–20 system (John 2008). Following the international standard, the available locations are: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8 and AF4. The EEG signals are transmitted wirelessly in the frequency of 2.4 GHz to a laptop computer. This experiment consists of presenting a non-persistent image to cause a P300 response from the user. The block diagram of the system to evoke and capture P300 signals, and a picture of the described setup are shown in Figs. 27.2 and 27.3, respectively. The subject is resting in a comfortably position during the testing. A simple graphical application shows in the screen a starship attacking a neighborhood in a fixed time sequence not known by the subject, as represented in Table 27.1. Recognition of the ship by the subject, when it suddenly appears in the screen, is expected to generate a P300 evoked potential in the brain central zone. The serial port is used for sending time markers to the Emotive testbench, in synchrony with the moments when the ship appears in the screen. The Testbench application provided by Emotiv System Co., is used to capture raw data from the 14 electrodes, as shown in Fig. 27.4.
Stimulus application
Serial Comm
Capture System
Subject using EEG headset
Bluetooth
Fig. 27.2 Block diagram of the experimental setup used during the P300 signals detection
358
J.M. Ramirez-Cortes et al.
Fig. 27.3 Headset and stimulus used for the experiment on P300 signal detection
Table 27.1 Event time examples Event Time difference 1 4000 2 3000 3 4000 4 3000 5 5500 6 3000 7 4000 8 4500
sequence Time (mS) 4000 7000 11000 14000 19500 22500 26500 31000
Fig. 27.4 Block diagram of the proposed system for ANFIS-based P-300 signal detection
27
Anfis-Based P300 Rhythm Detection Using Wavelet Feature Extraction. . .
359
Fig. 27.5 ANFIS structure
The operations proposed to detect the P300 rhythm are summarized in the block diagram of Fig. 27.5. First, a band-pass filter selects the required frequency components and cancels the DC value. Then, ICA blind source separation is applied with the purpose of denoising the EEG signal and separating the evoked potential from artifacts, like myoelectric signals derived from eye-blinking, breathing, or head motion, as well as cardiac signals. The P300 is further located in time and scale through a wavelet sub-band coding scheme. This information is further fed into an Adaptive Neurofuzzy Inference System (ANFIS), as described in the next section.
27.5
Adaptive Neurofuzzy Inference System
Adaptive Neuro Fuzzy Inference Systems (ANFIS) combine the learning capabilities of neural networks with the approximate reasoning of fuzzy inference algorithms. Embedding a fuzzy inference system in the structure of a neural network has the benefit of using known training methods to find the parameters of a fuzzy system. Specifically, ANFIS uses a hybrid learning algorithm to identify the membership function parameters of Takagi-Sugeno type fuzzy inference systems. In this work, the ANFIS model included in the MATLAB toolbox has been used for experimentation purposes. A combination of least-squares and backpropagation gradient descent methods is used for training the FIS membership function parameters to model a given set of input/output data through a multilayer neural network. ANFIS systems have been recently used for optimization, modeling, prediction, and signal detection, among others (Douglas et al. 2004; Chang and Chang 2006; Subasi 2007). In this paper, the ANFIS system is proposed to be used for the detection of the P-300 rhythm in an EEG signal, for BCI applications. Frequency bands with the most significant energy content, in the
Fig. 27.6 Control surfaces of input B1 and B2 related to the output
Fig. 27.7 Gaussian membership functions corresponding to the input B1
range of the P-300 signal, are selected from the wavelet decomposition as the input to the ANFIS. These bands are 8–4, 4–2, 2–1, and 1–0.5 Hz, and are treated as the linguistic variables B1, B2, B3 and B4, respectively. The ANFIS structure is depicted in Fig. 27.5. Figure 27.6 shows the control surfaces relating inputs B1 and B2 to the output, and Fig. 27.7 shows the Gaussian membership functions for input B1.
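The extraction of these band energies can be sketched with PyWavelets. The sketch below assumes the 128 Hz sampling rate and Daubechies-4 wavelet mentioned later in the results; the number of decomposition levels shown here is an assumption, and the code is only an illustration of the feature-extraction idea.

```python
# Sketch: DWT sub-band coding of one separated channel and extraction of the
# band energies used as ANFIS inputs B1..B4 (8-4, 4-2, 2-1 and 1-0.5 Hz).
import numpy as np
import pywt

FS = 128  # samples per second

def band_energies(signal, wavelet="db4", levels=8):
    """Return the energy of each detail level. With FS = 128 Hz, detail level k
    spans roughly FS/2**(k+1) .. FS/2**k Hz, so D4..D7 cover 4-8, 2-4, 1-2 and
    0.5-1 Hz, which correspond to the linguistic variables B1..B4."""
    coeffs = pywt.wavedec(signal, wavelet, level=levels)
    details = coeffs[1:][::-1]                       # reorder as D1, D2, ..., Dlevels
    return {f"D{k + 1}": float(np.sum(d ** 2)) for k, d in enumerate(details)}

if __name__ == "__main__":
    x = np.random.randn(16 * FS)                     # stand-in for the P300 channel
    e = band_energies(x)
    b_inputs = [e["D4"], e["D5"], e["D6"], e["D7"]]  # B1..B4
    print(b_inputs)
```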
Fig. 27.8 ANFIS output and triangle pulses
The ANFIS is used to map the P300 signal composition to a triangle pulse occurring simultaneously with the event during the training stage. Figure 27.8 shows the ANFIS output following the triangle pulses after 400 training epochs. The trained ANFIS is then used in a verification stage, with the EEG signals obtained from eight test subjects performing the same experiment in 10 trials of 16 s each.
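A training target of this kind can be sketched as below. The pulse width, the lag after each stimulus onset and the overall duration are assumptions made only for illustration; the original work does not specify them here.

```python
# Sketch: build a triangle-pulse target aligned with the stimulus onsets of
# Table 27.1, to be paired with the wavelet features during ANFIS training.
import numpy as np

FS = 128                                    # samples per second
EVENT_TIMES_MS = [4000, 7000, 11000, 14000, 19500, 22500, 26500, 31000]

def triangle_target(duration_s=32, width_s=0.6, lag_s=0.3):
    """Unit-height triangle centred lag_s after each onset (width/lag assumed)."""
    n = duration_s * FS
    target = np.zeros(n)
    half = int(width_s * FS / 2)
    ramp = np.concatenate([np.linspace(0, 1, half), np.linspace(1, 0, half)])
    for onset_ms in EVENT_TIMES_MS:
        centre = int((onset_ms / 1000.0 + lag_s) * FS)
        lo, hi = centre - half, centre + half
        if lo >= 0 and hi <= n:             # skip onsets outside the window
            target[lo:hi] = np.maximum(target[lo:hi], ramp)
    return target

if __name__ == "__main__":
    y = triangle_target()
    print(y.shape, y.max())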
27.6
Results
The captured signals were analyzed using a time window of 16 s, with a sampling frequency of 128 samples per second. Figure 27.9 shows the raw signals of the 14 electrodes obtained from the Emotiv headset. As described before, a band-pass filtering stage is applied to the raw data. Figure 27.10 shows the signals from electrodes T8, FC6, F4, F8 and AF4 after the filter is applied. The P300 signals are predominant in the brain central area, thus the P300 is typically measured from the Pz, Cz and Fz electrodes. The Emotiv headset does not include electrodes over the brain central area; however, the headset can be positioned in such a way that the electrodes AF3, AF4, F3, and F4 are able to collect the EEG signals relevant to the P300 experiment described in this work. The EEG signals obtained from the 14 electrodes are then processed through the ICA algorithm. The 14 channels are shown in Fig. 27.11. Typically, the P300 signals are embedded in artifacts and appear in two different channels, in this case channels 2 and 3. After the blind source separation applied to the signals of electrodes AF3, AF4, F3, and F4, it can be noticed that the P300 signals are visible on channel 2, while the other separated channels show artifacts such as the myoelectric signal from blinking, which is predominant in the AF3 and AF4 electrodes, the cardiac rhythm, and system noise. The signals obtained after the ICA separation are shown in Fig. 27.12.
Fig. 27.9 Raw data obtained from the EEG headset
Fig. 27.10 Prefiltered EEG signals
Fig. 27.11 Fourteen channels entered to the ICA algorithm
Fig. 27.12 Separated signals obtained from the ICA algorithm
Fig. 27.13 Scalogram of signal obtained from channel 2
Fig. 27.14 ANFIS output showing detection of P-300 events
Table 27.2 Results obtained on the P300 rhythm detection
Result                                               Rate
Detected                                             85%
Undetected                                           15%
Detected, taking false positive events into account  60%
A time-scale analysis in the wavelet domain was then performed in order to locate the energy peaks corresponding to the P300 rhythm. DWT sub-band coding with 11 decomposition levels, using a Daubechies-4 wavelet, was applied to channel 2, as shown in Fig. 27.13. It can be seen that the P300 peaks are easily distinguished in the wavelet domain. The energy peaks in the scalogram of Fig. 27.13 are located in the 0.5–1 Hz and 1–2 Hz bands, as expected. It was noted that the P300 rhythms were better distinguished in the EEG signals corresponding to the first eight events of the experiment. After that, the experiment became tedious for most of the users, which led to low-level P300 signals that were undetectable in the experiments. Figure 27.14 shows a typical output of the ANFIS system, corresponding to the detection of P300 rhythms. Table 27.2 summarizes the total detection accuracy obtained with the proposed system.
27.7
Concluding Remarks
This paper presented an experiment on P300-rhythm detection based on ICA blind source separation, wavelet analysis, and an ANFIS model. The results presented in this paper are part of a project with the ultimate goal of designing and developing brain-computer interface systems. These experiments support the feasibility of detecting P300 events with the Emotiv headset through an ANFIS approach. The proposed method is suitable for integration into a brain-computer interface under a proper control paradigm. The DWT coefficients could further be used as input to a variety of classifiers based on different techniques, such as distance measures, k-nearest neighbors, or Support Vector Machines (SVM).
References
Bashashati A, Fatourechi M, Ward RK, Birch GE (2007) A survey of signal processing algorithms in brain-computer interfaces based on electrical brain signals. J Neural Eng 4(2):R32–R57
Berger TW, Chapin JK, Gerhardt GA, McFarland DJ, Principe JC, Soussou WV, Taylor DM, Tresco PA (2007) WTEC panel report on international assessment of research and development in brain-computer interfaces. World Technology Evaluation Center, Inc., Baltimore
Rebsamen B, Burdet E, Guan C, Zhang H, Teo CL, Zeng Q, Ang M, Laugier C (2006) A brain-controlled wheelchair based on P300 and path guidance. Proceedings of the IEEE/RAS-EMBS International Conference on Biomedical Robotics and Biomechatronics, pp 1101–1106
Chang F-J, Chang Y-T (2006) Adaptive neuro-fuzzy inference system for prediction of water level in reservoir. Adv Water Res 29(1):1–10
Mandel C, Luth T, Laue T, Rofer T, Graser A, Krieg-Bruckner B (2009) Navigating a smart wheelchair with a brain-computer interface interpreting steady-state visual evoked potentials. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, St Louis, MO, USA, pp 1118–1125
Linden DE (2005) The P300: where in the brain is it produced and what does it tell us? The Neuroscientist 11(6):563–576
Jarchi D, Abolghasemi V, Sanei S (2009) Source localization of brain rhythms by empirical mode decomposition and spatial notch filtering. 17th European signal processing conference (EUSIPCO 2009), Glasgow, Scotland, 24–28 Aug
Emotiv Systems Inc. Researchers. http://www.emotiv.com/researchers/
Kachenoura A, Albera L, Senhadji L, Comon P (Jan 2008) ICA: a potential tool for BCI systems. IEEE Signal Process Mag 25(1):57–68
Keralapura M, Pourfathi M, Sirkeci-Mergen B (2011) Impact of contrast functions in fast-ICA on twin ECG separation. IAENG Int J Comput Sci 38(1):38–47
Li K, Sankar R, Arbel Y, Donchin E (2009) P300-based single trial independent component analysis on EEG signal. Lecture notes, foundations of augmented cognition. Neuroergonomics and Operational Neuroscience, vol 16, Springer, pp 404–410
Pinsky MA (2009) Introduction to Fourier analysis and wavelets, Graduate studies in mathematics, vol 102. American Mathematical Society, Providence
Priestley MB (2008) Wavelets and time-dependent spectral analysis. J Time Series Anal 17(1):85–103
Ramírez-Cortes JM, Alarcon-Aquino V, Rosas G, Gomez-Gil P, Escamilla-Ambrosio J (2010) P-300 rhythm detection using ANFIS algorithm and wavelet feature extraction in EEG signals. Lecture notes in engineering and computer science: proceedings of the world congress on engineering and computer science 2010 (WCECS 2010), San Francisco, USA, 20–22 Oct 2010
Royer AS, He B (2009) Goal selection vs. process control in a brain-computer interface based on sensorimotor rhythms. J Neural Eng 6(1):016005
Sajda P, Muller KR, Shenoy KV (Jan 2008) Brain computer interfaces. IEEE Signal Proc Mag 16:16–28
Semmlow JL (2008) Biosignal and medical image processing, 2nd ed. CRC Press/Taylor and Francis Group, New York
Dal Seno B, Matteucci M, Mainardi L (2010) Online detection of P300 and error potentials in a BCI speller. Comput Intelligence Neurosci 307254, pp 1–5
Subasi A (2007) Application of adaptive neuro-fuzzy inference system for epileptic seizure detection using wavelet feature extraction. Comput Biol Med 37(2):227–244
Thomas KP, Guan C, Lau CT, Vinod AP, Ang KK (2009) A new discriminative common spatial pattern method for motor imagery brain computer interfaces. IEEE Trans Biomed Eng 56(11):2730–2733
Vieira DAG, Caminhas WM, Vasconcelos JA (March 2004) Extracting sensitivity information of electromagnetic device models using a modified ANFIS topology. IEEE Trans Magnetics 40(2):1180–1183
Zhu D, Bieger J, Garcia-Molina G, Aarts RM (2010) A survey of stimulation methods used in SSVEP-based BCIs. Computational Intelligence and Neuroscience, Hindawi Publishing Corporation, 702357:1–12
Chapter 28
A Hyperbola-Pair Based Road Detection System for Autonomous Vehicles Othman O. Khalifa, Imran M. Khan, and Abdulhakam A.M. Assidiq
28.1
Introduction
In the last two decades, a great deal of research in the domain of transport systems has been conducted to improve safety via the full or partial automation of driving. Unlike the development of Unmanned Air Vehicles (UAVs) or autonomous robots, a "driverless" car will oftentimes, in reality, have to operate in the presence of an actual driver. This is because most people would prefer to drive their own vehicles on short trips with only driver assist systems active, while leaving long distance driving to an autonomous system that integrates the driver assist features into a complete vehicle control system. The primary motivation behind driver assist systems is, of course, to make driving safer, whilst a secondary consideration is to approach the development of autonomous passenger vehicles with a "driver-friendly" attitude. An integrated systems approach is most likely to yield viable on-road autonomous vehicle solutions in the future, with each subsystem independently aiding the driver under daily driving conditions while also being prepared to take over full control during long haul trips. Such an approach dictates the design and optimization of standalone driver assist components that can later be networked together.
O.O. Khalifa (*) Department of Electrical and Computer Engineering, Faculty of Engineering, International Islamic University Malaysia, Jalan Gombak, 53100 Kuala Lumpur, Malaysia e-mail:
[email protected]
Fig. 28.1 Typical road detection system
28.2
System Overview
One of the most important subsystems required in any autonomous vehicle prototype is the road detection and lane following system. This system is required for the vehicle to stay within its lane and to prevent collisions due to inappropriate lane departure. The most commonly investigated technology in the context of road detection is computer vision. The two main challenges in the design of such systems are the need for an algorithm that is robust in a variety of environmental situations and that is also capable of operating in real time on deployable hardware (Hulse et al. 1998). Typically, a road detection computer vision system has to deal with a significant amount of rapidly changing noise that may disturb the detection process, such as the presence of other vehicles in the same lane, partially occluded road markings ahead of the vehicle, or shadows cast by trees or buildings. Difficult weather conditions such as rain, fog or overcast skies may further degrade the system's ability to correctly identify road markings. Coping with these situations requires robust image processing algorithms. However, complex algorithms applied to the already vast amount of incoming video data would slow down the rate of lane detection significantly. A typical road detection system, as shown in Fig. 28.1, therefore focuses on achieving the two-fold requirement of robust and fast road lane detection in video sequences by combining good segmentation techniques, to reduce the amount of processing required, with robust lane modeling for accuracy.
28.3
Related Work
Researchers have approached the issue of road detection from various directions, and for many different applications. Broadly, the applications include vehicle control systems, lane departure warning systems and driver assistance systems. Each proposed technique and application has its own constraints. For example, developing a road detection system for vehicle control would require continuous, real-time processing of video data in order to extract the lane model parameters. However, a lane departure warning system could afford to process at a lower frame rate, and merely needs to detect when an estimate of the lane edge approaches the image centre line. Due to these different application constraints, a comparative method of evaluation of developed systems is difficult. Some studies provide only visual
proof of road detection on single frames (Dickmanns and Mysliwetz 1992), while others derive error histograms and standard deviations for model fitting accuracy (Park et al. 2003). However, this confusion over standard accuracy metrics is not confined to road detection systems, but exists in several video processing applications. This is especially true when the ground truth has no objective measure (e.g. color segmentation), or when it is too tedious to identify (manual road detection in each frame of a long video sequence). It remains a point of speculation whether standard metrics will ever be agreed upon, given the subjective nature of these video processing applications (Wang et al. 1998).
28.3.1 Segmentation and Feature Extraction The purpose of segmentation and feature extraction is to identify the regions of interest in an image so that the remainder of the data can be ignored and processing sped up. Since white or yellow lane markings contrast sharply with the dark background of a road, a majority of systems use edge detection methods to segment the image (Dickmanns and Mysliwetz 1992; Lee and Kwon 2005; Kwon and Lee 2002; Kang and Jung 2003). Feature analysis of the surrounding pixels further helps to identify the most likely edges that make up the lane markings. Considering the orientation of edges is helpful because most lane edges exhibit local parallelism (Lee and Kwon 2005; Kwon and Lee 2002; Kang and Jung 2003), as is an analysis of the image gradient, since the highest contrast is likely to occur at lane markings (Ma et al. 2000; Heimes and Nagel 2002). Whilst simpler and faster techniques based on adaptive thresholding of pixel intensity (Bertozzi and Broggi 1998), and even texture analysis (Kluge and Thorpe 1995), have been used, they are not very robust and often fail in the presence of shadows or occlusions.
28.3.2 Road Models All road detection systems attempt to model the road and lane markings so that these parameters can be used to estimate lane departure. Nearly all road models are geometric in nature, and most systems use deformable templates in order to fit the most likely lane edges to the road model. The simplest models describe the lane markings as being either linear or piecewise linear (Lee and Kwon 2005; Kwon and Lee 2002). However, this obviously does not work well on roads with a high degree of curvature. For such situations, parabolic or piecewise parabolic models (Kluge and Thorpe 1995; Kreucher and Lakshmanan 1990) have been shown to be promising, at least
with a singly curved road. These models require only parameter calculation for the position and curvature of the parabola. Circular models have also been developed, in which the lanes are considered as concentric circles in the flat (ground) plane (Ma et al. 2000; Kluge and Thorpe 1995; Pomerleau and Jochem 1996; Taylor et al. 1999). When viewed on the image, these concentric circles are seen as parabolic in shape. Parameters for circular models are much easier to obtain, but rely on precise camera calibration with real world coordinates. However, calibration becomes very difficult as roads are rarely flat but rather undulate; thereby causing continuous mismatch between the image and real world coordinates. Roads with more than one inflexion point require higher order curve models. The Catmull-Rom spline (Wang et al. 1998) structure offers third and higher order degree curves by interpolating between “control-points” in an image that can be found by scanning only the inflexion of the edges in the lower half of the image. Clothoids (Dickmanns and Mysliwetz 1992; Nedevschi et al. 2004) allow complex curvature to be estimated directly from the curvature angle itself. Work on this class of curves has been extended to three dimensional models which take into account the road pitch and roll angles as well (Dickmanns and Mysliwetz 1992). Some novel systems do not use geometric modeling, but rather employ preferential clustering algorithms directly on edges to form estimates of road shape and curvature. Neural networks (Pomerleau et al. 1995; Baluja 1996) and particle filtering (Southhall et al. 2001; Apostoloff et al. 2003) are two techniques for learning road curvature that have received serious attention. However, the general outcome from these limited experiments seems to be that clustering algorithms require too much computation and training, and are easily confused when environmental conditions affect the input feature set.
28.4
Proposed Algorithm
In the proposed system, a camera is fixed on the front windshield to capture the road scene. The algorithm first converts the image to grayscale. Next, due to the presence of noise in the image, the Finlayson-Hordley-Drew (FHD) algorithm (Graham et al. 2001) is applied to make edge detection more accurate. After this, a thresholded Canny filter is used to produce an edge image. The edge image is then sent to the line detector, which produces the right and left lane boundary segments. The projected intersection of these two line segments is determined and is referred to as the horizon (Fig. 28.2). The lane boundary scan uses the information in the edge image and the lines detected by the Hough transform to perform the scan. The scan returns a series of points on the right and left sides. Finally, a pair of hyperbolas is fitted to these data points to represent the lane boundaries. For visualization purposes the hyperbolas are displayed on the original color image. The algorithm structure is shown in Fig. 28.2.
Fig. 28.2 Overview of algorithm
28.4.1 Noise Reduction and Edge Detection As the presence of noise in the system will affect edge detection, noise removal is very important. The FHD algorithm has been shown to remove strong shadows from a single image. The basic idea is that, since shadows have a distinguishable boundary, removing the shadow boundary from the image derivatives and reconstructing the image should remove the entire shadow. Lane boundaries are defined by a sharp contrast between the road surface and painted lines or some type of non-pavement surface, and therefore form clear edges in the image. Thus, a Canny edge detector was employed to determine the location of lane boundaries (Fig. 28.3).
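The edge-detection step can be sketched with OpenCV as below. This is only an illustration: the Canny thresholds and the extra Gaussian blur are assumptions, and the FHD shadow-removal stage is omitted.

```python
# Sketch of the edge-detection step: grayscale conversion followed by a
# thresholded Canny filter (the FHD shadow-removal stage is not shown).
import cv2

def edge_image(frame_bgr, low_thresh=80, high_thresh=160):
    """Return a binary edge map of a road frame (threshold values assumed)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)   # mild denoising before Canny
    return cv2.Canny(blurred, low_thresh, high_thresh)

if __name__ == "__main__":
    frame = cv2.imread("road_frame.jpg")          # hypothetical test image
    if frame is not None:
        cv2.imwrite("edges.png", edge_image(frame))
```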
28.4.2 Line Detection and Segmentation The line detector used is a standard Hough transform that limits the search space to 45° on each side. The rationale behind this restriction is that lane boundaries rarely approach low angles such as a horizontal line. Further, the input image is split into left and right halves, with the horizon line being formed by the intersection of the lines detected in the two sub-images (Fig. 28.4).
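One way to realise this restricted search, sketched below with OpenCV's standard Hough transform, is to detect lines in each image half and keep only orientations within 45° of vertical. The accumulator threshold and the exact interpretation of the angular window are assumptions.

```python
# Sketch of the restricted Hough transform: one dominant near-vertical line
# per image half; near-horizontal candidates are rejected.
import numpy as np
import cv2

def dominant_line(edges_half):
    """Return (rho, theta) of the strongest admissible line, or None."""
    lines = cv2.HoughLines(edges_half, 1, np.pi / 180, threshold=60)
    if lines is None:
        return None
    for rho, theta in lines[:, 0]:
        if abs(theta - np.pi / 2) > np.pi / 4:   # keep lines steeper than 45 degrees
            return rho, theta
    return None

def left_right_lines(edges):
    h, w = edges.shape
    left = dominant_line(edges[:, : w // 2])
    right = dominant_line(edges[:, w // 2:])     # note: rho is relative to the half image
    return left, right   # the horizon is taken at the intersection of the two lines
```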
Fig. 28.3 Canny edge detection
Fig. 28.4 Restricted Hough transform
28.4.3 Lane Boundary Scan The lane boundary scan phase uses the edge image, the Hough lines and the horizon line as input. The scan begins at the bottom of the image, where the projected Hough lines intersect the image border. From this starting point, the search starts a number of pixels towards the center of the lane and looks for the first edge pixel within a certain distance. This is repeated row by row until the centre of the image is reached. An extra buffer zone in the search helps the scan follow outward curves of the lane boundaries (Fig. 28.5).
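A rough sketch of this scan loop is given below: the expected position on each row is interpolated along the projected Hough line, and the first edge pixel inside a small window is taken as a boundary point. The window size is an assumption, and the handling of the buffer zone is simplified.

```python
# Sketch of the lane-boundary scan: walk upward from the bottom of the edge
# image toward the horizon and record the first edge pixel found on each row.
import numpy as np

def scan_boundary(edges, x_bottom, x_horizon, y_horizon, window=20):
    """Return a list of (x, y) boundary points (window size assumed)."""
    h, w = edges.shape
    points = []
    for y in range(h - 1, int(y_horizon), -1):
        # Expected x interpolated along the projected Hough line at this row.
        t = (h - 1 - y) / max(h - 1 - y_horizon, 1)
        x_expected = int(round(x_bottom + t * (x_horizon - x_bottom)))
        lo, hi = max(x_expected - window, 0), min(x_expected + window, w)
        hits = np.nonzero(edges[y, lo:hi])[0]
        if hits.size:
            points.append((lo + int(hits[0]), y))
    return points
```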
Fig. 28.5 Horizon and detected boundaries
28.4.4 Hyperbola Model Fitting Chen and Wang (2006) devised a method of determining and estimating the lane boundaries of a road by assuming that the lane boundaries themselves form a pair of hyperbolas in the left and right sub-images. They developed this algorithm from the hyperbola-pair model first proposed by Kluge (1994). A lane boundary passing through point (u, v) in the image plane, where the horizon line is h, is described by Eq. 28.1, with eccentricity k and curvature b^(l) and b^(r) for the left and right hyperbolas, respectively:

u = k / (v - h) + b^(l,r) (v - h) + c    (28.1)

Since the interpolation needs to be done for a large number of points, this model is cast into matrix form as in Eqs. 28.2-28.5 and solved numerically using a least-squares technique. Convergence of the solution is the best proof of a correctly identified hyperbola-pair, since both hyperbolas lie on the same image and are thus assumed to share the same eccentricity and curvature.

A X = B    (28.2)

A = [ 1/(v_i^(l) - h)      v_i^(l) - h        0                  1
      1/(v_{i+1}^(l) - h)  v_{i+1}^(l) - h    0                  1
      ...                  ...                ...                ...
      1/(v_i^(r) - h)      0                  v_i^(r) - h        1
      1/(v_{i+1}^(r) - h)  0                  v_{i+1}^(r) - h    1
      ...                  ...                ...                ... ]    (28.3)

X = [ k   b^(l)   b^(r)   c ]^T    (28.4)

B = [ u_1^(l) ... u_m^(l)   u_1^(r) ... u_m^(r) ]^T    (28.5)
The model parameters will be found after Eq. 28.2 has been solved iteratively for all edge pixels that were detected in the left and right sub-image from the restricted Hough transform.
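The least-squares solution of Eq. 28.2 can be sketched in a few lines of NumPy, as below. This is only an illustration of the fitting step under the matrix form given above; variable names and the handling of the residuals are assumptions.

```python
# Sketch: assemble A and B from the scanned left/right boundary points and
# solve AX = B (Eq. 28.2) for X = [k, b_l, b_r, c] by linear least squares.
import numpy as np

def fit_hyperbola_pair(left_pts, right_pts, h):
    """left_pts/right_pts are lists of (u, v) image points below the horizon row h."""
    rows, rhs = [], []
    for u, v in left_pts:
        rows.append([1.0 / (v - h), v - h, 0.0, 1.0])   # left branch uses b_l
        rhs.append(u)
    for u, v in right_pts:
        rows.append([1.0 / (v - h), 0.0, v - h, 1.0])   # right branch uses b_r
        rhs.append(u)
    A = np.asarray(rows)
    B = np.asarray(rhs)
    X, residuals, rank, _ = np.linalg.lstsq(A, B, rcond=None)
    k, b_l, b_r, c = X
    return k, b_l, b_r, c, residuals

def lane_u(v, k, b, c, h):
    """Evaluate u = k/(v - h) + b*(v - h) + c for one branch."""
    return k / (v - h) + b * (v - h) + c
```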
28.5
Results
For testing purposes, the algorithm was implemented by interfacing the MATLAB R2010a Image Acquisition Toolbox with a high data rate Prosilica GX1910 camera on a Sony Vaio laptop with an Intel Core 2 Duo 2.8 GHz processor and 6 GB of RAM. All raw and processed video was recorded while the algorithm was running in real time on highways and normal roads with dashed markings, and on straight and curved roads in different environmental conditions. Figure 28.6a shows normal road driving conditions; Fig. 28.6b depicts an occluded scene with shadows; Fig. 28.6c shows the output during night time; and Fig. 28.6d shows the output during rainy weather. It has been previously mentioned that accuracy in the context of road detection is a difficult metric to determine. Thus, only the detection rate of the system is given in Table 28.1. The detection rate of the algorithm is evaluated by counting the number of frames in which the hyperbola-pair parameters converge, compared to the total number of frames in the video sequence. It is quite clear from Table 28.1 that higher resolution images, although offering the best detection rates, drastically increase the processing time required.
Fig. 28.6 Algorithm output in different conditions
Table 28.1 Processing times and detection rates at various image resolutions
Image resolution   Processing time   Detection rate
128 x 128          20 ms             65%
160 x 160          26 ms             69%
314 x 235          33 ms             73%
448 x 336          38 ms             87%
640 x 480          46 ms             89%
800 x 600          1.1 s             94%
1024 x 768         1.8 s             96%
Table 28.2 Detection rates for various lane types during day
Road type     Number of frames   Success   Failure   Detection rate
Highway       120                102       18        85%
Single lane   120                110       10        91.6%
Multi-lane    120                97        23        80.8%
Table 28.3 Detection rates for various lane types during night
Road type     Number of frames   Success   Failure   Detection rate
Highway       120                94        26        78%
Single lane   120                100       10        83.3%
Multi-lane    120                81        39        67.5%
An appropriate trade-off between processing time and image resolution occurs at a resolution of 640 x 480, and it is at this resolution that the remainder of the experiments were conducted. Furthermore, this resolution offers reasonably fast processing that can be used for real-time applications. Table 28.2 shows the detection rate for different lane types during the day, and Table 28.3 shows the detection rate for different lane types at night. From the tables, it is clear that overall the algorithm works better during the day than at night and is better at detecting lanes on highways than on multi-lane roads. In particular, multi-lane detection at night is quite poor, because edges are more difficult to extract from the poorly illuminated scene of a multi-lane road. Lane detection on highways at night is still quite reasonable, as highways are better lit.
28.6
Conclusion and Summary
This paper tackles the issue of road and lane detection within the context of driver assist systems for autonomous vehicles. The requirements for these systems are analyzed and the challenges present are investigated in light of current techniques employed in the literature. The different processing steps involved are expounded,
with particular emphasis placed on the restricted Hough transform method for segmentation and data reduction, and on the hyperbola-pair based model for road detection and estimation. The proposed lane detection algorithm can be applied to variously painted roads (highway, single and multi-lane markings), due to the robustness of the hyperbola-pair model. The results of several experiments were obtained and analyzed, which provide insight into the inherent trade-off between processing time and accuracy in the algorithm. Additionally, the robustness of the system under day and night conditions was also evaluated experimentally. There remain some shortcomings, primarily in multi-lane detection at night and in the overlay of the Hough-line horizon with the lane boundary points. This overlay largely depends on the camera orientation, and an algorithm to tackle this problem can be considered as a future direction of this work. Acknowledgement The authors of this paper would like to thank the Research Management Center at IIUM for their financial support under the eScience fund.
References Apostoloff N, Zelinsky A (2003) Robust vision based lane tracking using multiple cues and particle filtering. Proceedings of the ieee intelligent vehicals symposium, Columbus, OH, pp 558–563 Baluja S (1996) Evolution of an artificial neural network based autonomous land vehicle controller. IEEE Trans Syst 26(3):450–463 Bertozzi M, Broggi A (1998) GOLD: A parallel real-time stereo vision system for generic obstacle and lane detection. IEEE Trans Image Process 7(1):62–81 Chen Q, Wang H (2006) A real-time lane detection algorithm based on a hyperbola-pair model. Intelligent vehicles symposium, Jun 13–25 Dickmanns ED, Mysliwetz BD (1992) Recursive 3-D road and relative ego-state recognition. IEEE Trans Pattern Anal Mach Intel 14(2):199–213 Graham D, Finlayson SD, Hordley D, Mark DS (2001) Removing shadows from images, School of Information Systems, pp 1–14, (unpublished) Heimes F, Nagel HH (2002) Towards active machine-vision –based driver assistance for urban areas. Int J Comp Vis 50(1):5–34 Hulse MC (1998) Development of human factor guidelines for advanced traveler information systems and commercial vehicle operations: identification of the strengths and weaknesses of alternative information display formats. Federal Highway Administration, Washington DC, pp 96–142 Kang DJ, Jung MH (2003) Road lane segmentation using dynamic programming for active safety vehicles. Pattern Recogn Lett 24(16):3177–3185 Khalifa OO, Khan IM, Assidiq AAM, Abdulla A-H, Khan S (2010) A hyperbola-pair based lane detection system for vehicle guidance. Lecture notes in engineering and computer science: proceedings of the world congress on engineering and computer science (WCECS 2010), San Francisco, USA, 20–22 Oct 2010, pp 585–588 Kluge K (1994) Extracting road curvature and orientation from image edge points without perceptual grouping into features. Proceedings of the IEEE intelligent vehicles symposium, pp 109–114 Kluge K, Thorpe C (1995) The YARF system for vision-based road following. Math Comput Model 22(4–7):213–233
Kreucher C, Lakshmanan S (1990) LANA: A lane extraction algorithm that uses frequency domain features. IEEE Trans Robot Automation 15(2):343–350 Kreucher C, Lakshmanan S, Kluge K (1998) A driver warning system based on the LOIS lane detection algorithm. Proceedings of the IEEE International Conference on Intelligent Vehicles, Stuttgart, Germany, pp 17–22 Kwon W, Lee S (2002) Performance evaluation of decision making strategies for an embedded lane departure warning system. J Robot Syst 19(10):499–509 Lee S, Kwon W (2005) Robust lane keeping from novel sensor fusion. Proc IEEE Int Conf Robotics Automation 4:3704–3709 Ma B, Lakshamanan S, Hero AO (2000) Simultaneous detection of lane and pavement boundaries using model-based multi-sensor fusion. IEEE Trans Intel Transport Syst 1(3):135–147 Nedevschi S et al (2004) 3D lane detection system based on stereovison. IEEE intelligent transportation systems conference, Washington, DC, 3–6 Oct 2004 Park JW, Lee JW, Jhang KY (2003) A lane-curve detection based on an LCF. Pattern Recogn Lett 24(13):2301–2313 Pomerleau D (1995) Neural network vision for robot driving. In: Arbib M (ed) The hand-book of brain theory and neural networks. MIT Press, Cambridge, MA Pomerleau D, Jochem T (1996) Rapidly adapting machine vision for automated vehicle steering. IEEE Expert-Special Issue Intel Syst Appl 11(2):19–27 Southhall B, Taylor CJ (2001) Stochastic road shape estimation. Proceedings of the international conference on computer vision, pp 205–212 Taylor C, Kosecka J, Blasi R, Malik J (1999) A comparative study of vision-based lateral control strategies for autonomous highway driving. Int J Robot Res 18(5):442–453 Wang Y, Shen D, Teoh EK, Wang H (1998) A novel lane model for lane boundary detection, IARP workshop on machine vision application, 17–19 Nov 1998
Chapter 29
Application of the Real-Time Concurrent Constraint Calculus M. Gerardo and M. Sarria
29.1
Introduction
The rtcc calculus (Sarria and Rueda 2008; Sarria 2010) is a ccp-based formalism (Saraswat 1993) and an extension of the ntcc calculus (Palamidessi and Valencia 2001). rtcc is obtained from ntcc by adding constructs for specifying strong preemption and delay declarations, and by extending the transition system with support for resources, limited time and true concurrency. This calculus allows modeling real-time and reactive behaviour. In reactive systems, time is conceptually divided into discrete intervals (or time units). In a time interval, a process receives a stimulus from the environment, computes (reacts) and responds to the environment. A reactive system is shown in Fig. 29.1. To model real time, we assume that each time unit is a clock cycle in which computations (internal transitions) involving addition of information to the store (tell operations) and querying of the store (ask operations) take a particular amount of time that depends on the constraint system. A discrete global clock is introduced, and it is assumed that this clock is synchronized with physical time (i.e. two successive time units in this calculus correspond exactly to two moments in physical time). We also assume that the environment provides the exact duration of the time unit. That is, processes may not have all the time they need to run; instead, if they do not reach their resting point within a particular time, some (or all) of their pending computations will be discarded before the time unit is over. The duration is then the time available for processes to execute. We take this available time to be a natural number; this allows time to be thought of as a discrete sequence of minimal units that we call ticks.
M. Gerardo (*) AVISPA Research Group., Pontificia Universidad Javeriana, Cali, Colombia e-mail:
[email protected]
Fig. 29.1 Reactive system
Now, since the temporal behaviour of a real-time system depends not only on delays due to process synchronization, but also on the availability of shared resources (as noted in Brémond-Grégoire and Lee 1997), we assume that the environment also provides a number r of resources. Each process P takes some of these, and when P finishes it releases them. Then, in the case of rtcc, the stimulus i_i provided by the environment of the reactive system is a tuple consisting of a constraint representing the initial store, the available number of resources and the duration of the time unit, and the response o_i of the process is another tuple consisting of a constraint representing the final store, the maximum number of resources used in the calculations and the time spent in them. Formally, we can say that for each P_i there is a stimulus ⟨d_i, r_i, t_i⟩ and a response ⟨d'_i, r'_i, t'_i⟩ in the time unit k_i.
29.2
The Calculus
Here we describe the syntax and the operational semantics for rtcc. We begin by introducing the notion of constraint system, very important in ccp-based calculi.
29.2.1 Constraint System The rtcc processes are parameterized by a constraint system, which specifies what kind of constraints the model handles. Formally, it is a pair (Σ, Δ) where Σ is a signature (a set of constants, functions and predicates) and Δ is a first-order theory over Σ (a set of first-order sentences with at least one model). Given a constraint system, the underlying language L of the constraint system is a tuple (Σ, V, S), where V is a set of variables and S is a set containing the symbols ¬, ∧, ∨, ⇒, ∃, ∀ and the predicates true and false. A constraint is a first-order formula constructed in L. A constraint c entails a constraint d in Δ, written c ⊨_Δ d, iff c ⇒ d is true in all models of Δ. The entailment relation is written ⊨ instead of ⊨_Δ when Δ can be inferred from the context.
For a constraint system Δ, the set of elements of the constraint system is denoted by |Δ| and |Δ|_0 denotes its set of finite elements. The set of constraints in the underlying constraint system will be denoted by C.
29.2.2 Process Syntax Processes communicate with each other by posting and reading partial information (constraints) about the variables of the system they model. This partial information resides in a common store of constraints. Henceforth the conjunction of all posted constraints will simply be called the store. Proc is defined as the set of all rtcc processes. The processes P, Q, ... ∈ Proc are built from constraints c ∈ C and variables x ∈ V in the underlying constraint system by the following syntax:

P, Q, ... ::= tell(c) | ∑_{i∈I} when c_i do P_i | P ∥ Q | local x in P | unless c next P | catch c in P finally Q | next P | delay P for d | !P | ⋆P
Intuitively, the process tell(c) adds the constraint c to the store within the current time unit. The ask process when c do P is generalized by a non-deterministic choice of the form ∑_{i∈I} when c_i do P_i (where I is a finite set of indices). In the current time unit, this process must non-deterministically choose one of the P_j (j ∈ I) whose corresponding guard constraint c_j is entailed by the store, and execute it. The non-chosen processes are precluded. Two processes P and Q acting concurrently are denoted by the process P ∥ Q. In one time unit, P and Q operate in parallel, communicating through the store by telling and asking information. The "∥" operator is left-associative. The process local x in P declares a variable x private to P (hidden from other processes). This process behaves like P, except that all information about x produced by P can only be seen by P, and the information about x produced by other processes is hidden from P. The weak time-out process, unless c next P, represents the activation of P in the next time unit if c cannot be inferred from the store in the current time interval (i.e. d ⊭ c); otherwise, P is discarded. The strong time-out process, catch c in P finally Q, represents the interruption of P in the current time interval when the store can entail c; otherwise, the execution of P continues. When process P is interrupted, process Q is executed; if P finishes, Q is discarded. The execution of a process P can be delayed in two ways: with delay P for d the process P is activated in the current time unit but at least d ticks after the beginning of the time unit, whilst with next P the process P is activated in the next time interval. The operator "!" is used to define infinite behaviour. The process !P represents P ∥ next P ∥ next (next P) ∥ ..., (i.e. !P executes P in the current time
unit and is replicated in the next time interval). An arbitrary (but finite) delay is represented with the operator "⋆". The process ⋆P represents the unbounded but finite choice P + next P + next(next P) + ..., (i.e. it allows modeling asynchronous behaviour across the time intervals). The guarded-choice summation process ∑_{i∈I} when c_i do P_i is actually an abbreviation of when c_{i1} do P_{i1} + ... + when c_{in} do P_{in} where I = {i1, ..., in}. The symbol "+" is used for binary summations (similar to the choice operator of CCS; Milner 1980). If there is no ambiguity, "when c do" can be omitted when c = true, that is, ∑_{i∈I} P_i. The process that does nothing is skip. The inactivity process is defined as the empty summation ∑_{i∈∅} P_i. This process is similar to the process 0 of CCS and STOP of CSP (Hoare 1985). Furthermore, terminated processes always behave like skip. We write ∏_{i∈I} P_i, where I = {i1, ..., in}, to denote the parallel composition of all the P_i, that is, P_{i1} ∥ ... ∥ P_{in}. When process Q is skip, the "finally Q" part of catch c in P finally Q can be omitted, that is, we can write catch c in P. A nest of delay processes such as delay (delay P for d1) for d2 can be abbreviated to delay P for d1 + d2. The notation next^n P (where next is repeated n times) abbreviates the process next (next (... (next P) ...)). Bounded replication and asynchrony can be specified using summation and product: !_I P and ⋆_I P are defined as abbreviations for ∏_{i∈I} next^i P and ∑_{i∈I} next^i P, respectively. For example, the process !_[m,n] P means that P is always active between the next m and m + n time units.
29.3
Operational Semantics
The operational semantics can be formally described by means of a transition system formed by the set of processes Proc, the set of configurations Γ, and the transition relations → and ⟹. A configuration γ is a tuple ⟨P, d, t⟩ where P is a process, d is a constraint in C representing the store, and t is the amount of time left for the process to be executed. The transition relations → = {→_r : r ∈ Z+} and ⟹ are the least relations satisfying the rules in Tables 29.1 and 29.2. The internal transition rule ⟨P, d, t⟩ →_r ⟨P', d', t'⟩ means that in one internal step, using r resources, process P with store d and available time t reduces to process P' with store d' and leaves t' time remaining. We write ⟨P, d, t⟩ → ⟨P', d', t'⟩ (omitting the r) when resources are not relevant.
The observable transition rule P ⟹^(i,o) Q means that process P, given an input i from the environment, reduces to process Q and outputs o to the environment in one time unit. Input i is a tuple consisting of the initial store c, the number of resources
Table 29.1 Internal transition rules of rtcc
Table 29.2 Observable transition rule of rtcc
available r within the time unit, and the duration t of the time unit. Output o is also a tuple, consisting of the resulting store d, the maximum number of resources r' used by the processes, and the time t' spent by all processes in their execution. An observable transition is constructed from a sequence of internal transitions. It is assumed that internal transitions cannot be directly observed. We now explain the transition rules in Tables 29.1 and 29.2. A tell process adds a constraint to the current store and terminates, unless there is not enough time to execute it (in which case it remains blocked). The time left to other processes after evolving is equal to the time available before the transition minus the time spent by the constraint system to add the constraint to the store. The time spent by the constraint system is given by functions F_T, F_A : |Δ|_0 × |Δ|_0 → Z − {0} (F_T(c, d) approximates the time spent in adding constraint c to store d, and F_A(c, d) estimates the time spent querying whether the store d can entail a constraint c). In addition, the execution of a tell operation requires one resource. The rule for a choice says that the process chooses one of the processes whose corresponding guard is entailed by the store and executes it, unless it does not have enough time to query the store, in which case it remains blocked. Computation of the time left is as for the tell process. The store is not modified by this operation. It consumes one resource unit. The first rule of parallel composition says that processes P and Q execute concurrently if the amount of resources needed by both processes separately is less than or equal to the number of resources available. The resulting store is the conjunction of the output stores from the execution of both processes separately. This process terminates iff both processes do; therefore, the time left is the minimum of the times left by each process. The second and third rules state that, in a parallel process, only one of the two processes can evolve because of the number of resources available. To define the rule for locality, following de Boer et al. (1995), we extend the construct of local behaviour to local x, c in P to represent the evolution of the process. Variable c is the local information (or store) produced during the evolution. Initially, c is empty, so we regard local x in P as local x, true in P. The rule for locality says that if P can evolve to P' with a store composed of c and information of the "global" store d not involving x (variable x in d is hidden from P), then the local ... in P process reduces to a local ... in P' process where d is enlarged with information about the resulting local store c' without the information on x (x in c' is hidden from d and, therefore, from external processes). In a weak time-out process, if c is entailed by the store, process P is terminated; otherwise it behaves like next P. This is explained below with the rule for observations. For a strong time-out, a process P ends its execution (and another process Q starts) if the constraint c is entailed by the store; otherwise it evolves, but the query for the entailment of c persists. The two rules for delaying state that a process delay P for d delays the execution of P for at least d ticks. Once the delay is less than the current internal time (T represents the duration of the time unit given by the environment), the process
reduces to P (i.e. it will be activated). This process does not consume any resources in its transitions. The replication rule specifies that the process P will be executed in the current time unit and then copies itself (process !P) to the next time unit. The rule for asynchrony says that a process P will be delayed for an unbounded but finite time, that is, P will be executed some time in the future (but not in the past). The rule that allows the use of the structural congruence relation defined below states that structurally congruent configurations have the same reductions. Finally, the rule for observable transitions states that a process P evolves to R in one time unit if there is a sequence of internal transitions starting in configuration ⟨P, c, t⟩ and ending in configuration ⟨Q, d, t'⟩. Process R, called the "residual process", is constituted by the processes to be executed in the next time unit. The latter are obtained from Q by applying the future function, defined as follows. Let F : Proc → Proc be defined by

F(Q) =  R                              if Q = next R or Q = unless c next R
        F(Q1) ∥ F(Q2)                  if Q = Q1 ∥ Q2
        catch c in F(R) finally S      if Q = catch c in R finally S
        local x in F(R)                if Q = local x, c in R
        skip                           otherwise
To simplify the transitions, a congruence relation is defined. Following (Saraswat 1993), we introduce the standard notions of contexts and behavioural equivalence. Informally, a context is a phrase (an expression) with a single hole, denoted by [ ], that can be plugged in with processes. Formally, a process context C is defined by the following syntax:

C ::= [ ] | when c do C + M | C ∥ C | local x in C | unless c next C | catch c in C finally C | next C | delay C for d | !C | ⋆C
where M stands for summations. Two processes P and Q are equivalent, written P ≐ Q, if for any context C, P ≐ Q implies C[P] ≐ C[Q]. Let ≡ be the smallest equivalence relation over processes satisfying:

1. P ≡ Q if they only differ by a renaming of bound variables
2. P ∥ skip ≡ skip ∥ P ≡ P
3. P ∥ Q ≡ Q ∥ P
4. next skip ≡ skip
5. local x in skip ≡ skip
6. local x y in P ≡ local y x in P
7. local x in next P ≡ next (local x in P)

We extend ≡ to configurations by defining ⟨P, c, t⟩ ≡ ⟨Q, c, t⟩ iff P ≡ Q.
29.4
Application
In this section we illustrate the application of the rtcc calculus by modeling two real-time scenarios.
29.4.1 Railway Level Crossing The “railway level crossing” is a problem proposed by S. Schneider in Schneider (2000) thus: One road and one railway line cross each other, and as usual there is a gate which can be lowered to prevent cars crossing the railway. If the gate is raised, then cars can freely cross the track. Trains can cross the road regardless of whether the gate is up or down. There should never be a train and a car on the crossing at the same time.
This problem can be modeled with the following main processes: Car, Train, and Gate. Process Train supplies the stream of trains:
Train ≝ next Train + (tell(near = true) ∥ next tell(out = true) ∥ next² Train)

A train moves constantly until it approaches the crossing; we model this by non-deterministically choosing either to continue with the process Train in the next time unit (meaning the train is far from the crossing) or to post the constraint near = true into the store. The moment it approaches, the train sends this signal (posts the constraint) so that the gate starts to go down. Once the train has passed the crossing entirely (we assume this takes one time unit), it sends another signal so that the gate starts to go up. Process Car supplies the stream of cars:
Car ≝ catch gate = down in (!tell(move = true)) finally tell(stop = true) ∥ unless gate = down next Car
A car is moving until the gate is down. In this case it has to stop and wait until the gate is up again in order to move. To simplify the model we assume uniform linear motion (no acceleration) for both cars and trains. The following process models the gate: def
Gate ≝ !when near = true do delay tell(gate = down) for 20 ∥ !when out = true do delay tell(gate = up) for 20

When the train gives a signal, Gate captures it and performs the action of lowering or lifting the gate. We model the duration of these actions by delaying the signals given to the cars by 20 ticks. The reader may notice that there will be time units in which neither near = true nor out = true can be inferred from the store. This is not a safety issue, since it means that no train is approaching or passing, so cars may cross the railway (the gate = down signal is not in the store). To model the whole system we simply launch the process Train ∥ Car ∥ Gate. This real-time problem was previously modeled in Alpuente et al. (2006) using another real-time ccp-based calculus called tccp (de Boer et al. 2000).
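As an intuition aid only, the following Python sketch simulates the behaviour that this rtcc model describes, one loop iteration per time unit. It is not an implementation of the calculus: the 20-tick gate delay is collapsed into the time unit in which the signal arrives, and a random choice stands in for the non-deterministic train arrivals.

```python
# Rough, imperative sketch of the behaviour of the railway-crossing model:
# a non-deterministic train, a gate reacting to near/out signals, and cars
# that stop while the gate is down. One loop iteration = one time unit.
import random

def simulate(time_units=30, seed=1):
    random.seed(seed)
    gate_down = False
    train_state = "far"                      # far -> near -> crossing -> far ...
    for t in range(time_units):
        # Train process: non-deterministic choice between staying far or approaching.
        if train_state == "far" and random.random() < 0.2:
            train_state = "near"             # tell(near = true)
            gate_down = True                 # gate reacts (delay abstracted away)
        elif train_state == "near":
            train_state = "crossing"         # train occupies the crossing for one unit
        elif train_state == "crossing":
            train_state = "far"              # tell(out = true)
            gate_down = False                # gate goes up again
        # Car process: move unless the gate is down.
        car_action = "stop" if gate_down else "move"
        # Safety property: a car never moves while a train is on the crossing.
        assert not (car_action == "move" and train_state == "crossing")
        status = "down" if gate_down else "up"
        print(f"t={t:02d}  train={train_state:<9}  gate={status:<4}  car={car_action}")

if __name__ == "__main__":
    simulate()
```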
29.4.2 Factor Oracle Interactive systems are systems in which components exchange elements. An improvisation situation involving more than one machine, for instance, can be seen as an interactive system where a machine performs some action, the others learn its method and features, organize them in a model, and generate and perform another action consistent with the one of the first machine. A factor oracle is a finite-state acyclic automaton introduced in Allauzen et al. (1999) for string matching on fixed texts. If a word is a sequence w = σ1 σ2 ... σn of letters (belonging to an alphabet Σ), then the factor oracle is a data structure representing at least all the factors of w. A sequence p ∈ Σ* is a factor of w iff w can be written w = qpr, with q, r ∈ Σ*. The factor oracle is built from a word w = σ1 σ2 ... σn by creating n + 1 nodes (one for every σi with i ∈ [1..n], plus the initial state), and inserting an arrow labelled with symbol σi linking node i with node i + 1 (these are called factor links). Depending on the structure of the word, other arrows from a state i to a state j (0 ≤ i < j ≤ n) are added to the automaton, constituting the remaining factors. Finally, backwards links (called suffix links) are added to the automaton from a state i to a state j (0 ≤ j < i ≤ n), holding the possibility of traversing all the factors in
Fig. 29.2 Factor oracle automaton for w = abbbaa
a single path (suffix links connect repeated patterns of w). Figure 29.2 shows the factor oracle of the word abbbaa; the dotted arrows represent the suffix links. In Allauzen et al. (1999) an algorithm for constructing the automaton on-line was also presented. It allows a factor oracle to be built by reading the letters of a word one by one, from left to right. With the on-line construction algorithm many applications can be modeled. For example, in a concurrent learning/improvisation situation it is possible to construct a machine improviser. We can think of this improviser as a system consisting of three phases running concurrently: learning, improvisation and performance. In the learning phase the on-line construction algorithm is used to build the factor oracle during the performance of a machine. A word can denote many structures, the simplest being, perhaps, a sequence of actions, one action per time unit. Then every action the machine performs is an input for the algorithm and, ultimately, a new state in the oracle. In the improvisation phase, the system has to choose which path should be traversed in the current automaton. The automaton used must be stable, that is, all factor and suffix links up to the last state added must be present. We assume that the choice is non-deterministic in order to make a good improvisation. Finally, in the performance phase the sequence of actions of the chosen path is performed. This model is built in rtcc with processes for each phase. The learning phase is modeled with a process posting constraints which define the automaton. Variable σi denotes the label of the factor link connecting a state i - 1 to a new state i. Process Learn_i adds the new state i and builds the factor and suffix links involving this new state.
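Before the rtcc encoding that follows, it may help to see the on-line construction in conventional sequential form. The sketch below (Python, not part of the original model) follows the algorithm of Allauzen et al. (1999): delta holds the factor links and S the suffix links. The rtcc processes given next perform the same updates declaratively, one letter per time unit.

```python
# Sequential sketch of the on-line factor oracle construction
# (Allauzen et al. 1999): delta holds factor links, S holds suffix links.
def build_factor_oracle(word):
    n = len(word)
    delta = [dict() for _ in range(n + 1)]   # delta[i][letter] = target state
    S = [-1] * (n + 1)                       # suffix links; S[0] = -1
    for i, letter in enumerate(word, start=1):
        delta[i - 1][letter] = i             # new factor link (i-1) --letter--> i
        k = S[i - 1]
        while k > -1 and letter not in delta[k]:
            delta[k][letter] = i             # extra factor links along the suffix path
            k = S[k]
        S[i] = 0 if k == -1 else delta[k][letter]
    return delta, S

if __name__ == "__main__":
    delta, S = build_factor_oracle("abbbaa")  # the word of Fig. 29.2
    print(S)           # suffix links for states 0..6
    print(delta[0])    # factor links leaving the initial state
```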
Learni ¼ ! tellðdi1;si ¼ iÞ k BuildFOi ðSi1 Þ Variable dk;si denotes the state reached by following a factor link si from state k in the automaton. Variable Si denotes the state reached by following a suffix link from state i. The process BuildFOi adds all the factor links taking into account the state i, and also adds the suffix links from it.
29
Application of the Real-Time Concurrent Constraint Calculus
389
def
BuildFOi ðkÞ ¼ when k 0 do unless si 2 f k nextð! tellðsi 2 f k Þ k ! tellðdk;si ¼ iÞ k BuildFOi ðSk ÞÞ k when k ¼ 1 do ! tellðSi ¼ 0Þ k when k 0 ^ si 2 f k do ! tellðSi ¼ kÞ Variable fk denotes the set of labels of all currently existing factor links going from k. Note that the automaton is built from latter states to the first ones. This can be a problem because the system could be too slow and there is the possibility of not building the whole factor oracle (recall that the duration of a time unit is limited and in each time unit i the automaton is bigger than the one in time unit i 1, so the store is also bigger which, normally, increase the time spent for posting constraints). We can avoid this by taking into account the current tick and execute the process if there is time, but if time is up then the construction of the factor oracle must continue in the next time unit. def
BuildFOi ðkÞ ¼ when k 0 do unless si 2 f k nextð! tellðsi 2 f k Þ k ! tellðdk;si ¼ iÞ k delay tellðtimeup ¼ 1Þ for t k catch timeup ¼ 1 in BuildFOi ðSk Þ finally nextðBuildFOi ðkÞÞ k when k ¼ 1 do ! tellðSi ¼ 0Þ k when k 0 ^ si 2 f k do ! tellðSi ¼ kÞ Variable timeup is the signal that specifies the nearly end of the time unit; we assume that t is the maximum tick in the time unit allowing to post the constraint timeup ¼ 1 and querying it. A machine is modeled as a process performing some action p every time unit and giving the signal for building the automaton at state j (the number of the current action). def
Machinej ¼
X
ð! tellðsj ¼ pÞ k tellðgo ¼ jÞ k nextðMachinejþ1 ÞÞ
p2S
The following improvisation process Impro defines the improvisation phase from state k of the automaton by choosing non-deterministically whether to perform an action (denoting by outputting the symbol sk + 1) or to follow a suffix link Sk and then perform an action (also chosen non-deterministically and denoting with symbol s 2 f Sk ). When Sk ¼ 1 there is only one choice: to perform the action symbolized by sk + 1.
390
M. Gerardo and M. Sarria def
ImproðkÞ ¼ when Sk ¼ 1 do nextðtellðout ¼ skþ1 Þ k Improðk þ 1ÞÞ k ðwhen Sk 0 do ðnextðtellðout ¼ skþ1 Þ k Improðk þ 1ÞÞÞ þ when Sk 0 do X nextð when s 2 f Sk do ðtellðout ¼ sÞ k ImproðdSk ;s ÞÞÞÞ s2S
Since the learning and the improvisation processes can run concurrently, we must guarantee that the improvisation works with a subgraph completely built. For this we define a process that synchronize the two phases on Si. def
Synci ¼ when Si1 1 ^ go i do ðLearni k nextðSynciþ1 ÞÞ k unless Si1 1 ^ go i nextðSynci Þ Finally, the whole system is modeled by launching all processes and initializing the first state. The parameter of the process System represents the number of actions that must be performed before starting the improvisation phase. def
Systemn ¼ ! tellðS0 ¼ 1Þ k Machine1 k Sync1 k ! when go ¼ n do ImproðnÞ A model using factor oracles for musical interactive systems was proposed in Assayag and Dubnov (2004), and built using the ntcc calculus in Rueda et al. (2006). The model presented here is similar to that model (finally, rtcc is an extension of ntcc and it remains all constructs), except for the fact that the model in this section is more realistic in the sense that it always consider the execution time of processes.
29.5
Concluding Remarks
In this paper we described the operational semantics of the rtcc calculus. This calculus belongs to the ccp family and is a strict extension of the ntcc calculus. rtcc extends ntcc to allow modeling systems with real-time behaviour. In order to guarantee real-time behaviour, the operational semantics has a more realistic notion of time than any other ccp-based formalism and includes a transition system with support for expressing amounts of resources and time allowances. We also illustrated the potential of the rtcc calculus with two applications: a railway level crossing problem and the factor oracle. Previously, in Perchy and Sarria (2009), we showed the musical expressiveness of the rtcc calculus by modeling musical dissonances.
References Allauzen C, Crochemore M, Raffinot M (1999) Factor oracle: a new structure for pattern matching. In: Conference on current trends in theory and practice of informatics. Springer-Verlag, London, pp 295–310 Alpuente M, Gallardo MM, Pimentel E, Villanueva A (2006) Verifying real-time properties of tccp programs. J Univers Comput Sci 12(11):1551–1573 Assayag G, Dubnov S (2004) Using factor oracles for machine improvisation. Soft Comput 8 (9):604–610 Bre´mond-Gre´goire P, Lee I (1997) A process algebra of communicating shared resources with dense time and priorities. Theor Comput Sci 189(1–2):179–219 de Boer FS, Gabbrielli M, Meo MC (2000) A timed concurrent constraint language. Info Comput 161(1):45–83 de Boer FS, Pierro A Di, Palamidessi C (1995) Nondeterminism and infinite computations in constraint programming. In: Selected papers of the workshop on topology and completion in semantics. Theoretical Computer Science, vol 151. Chartres, Elsevier, pp 37–78 Hoare CAR (1985) Communicating sequential processes. In: Prentice-Hall international series in computer science. Prentice Hall, Englewood Cliffs Milner R (1980) A calculus of communicating systems. In: Lecture notes in computer science. Springer-Verlag, Berlin/New York Palamidessi C, Valencia F (2001) A temporal concurrent constraint programming calculus. In: Seventh international conference on principles and practice of constraint programming. Lecture notes in computer science, vol 2239. Springer-Verlang, London, pp 302–316 Perchy S, Sarria G (2009) Dissonances: brief description and its computational representation in the rtcc calculus. In: 6th sound and music computing conference (SMC2009), Porto, Portugal Rueda C, Assayag G, Dubnov S (2006) A concurrent constraints factor oracle model for music improvisation. In: XXXII conferencia Latinoamericana de informa´tica CLEI 2006, Santiago de Chile Saraswat VA (1993) Concurrent constraint programming. In: ACM doctoral dissertation award. MIT, Cambridge, MA Sarria G (2010) Improving the real-time concurrent constraint calculus with a delay declaration. In: Lecture notes in engineering and computer science: proceedings of the world congress on engineering and computer science WCECS 2010. San Francisco, California. Newswood Ltd, Hong Kong, pp 9–14 Sarria G, Rueda C (2008) Real-time concurrent constraint programming. In: 34th Latinamerican conference on informatics (CLEI2008), Santa Fe, Argentina Schneider S (2000) Concurrent and real-time systems: the CSP approach. Wiley, Chichester/ New York
Chapter 30
Teaching and Learning Routing Protocols Using Visual Educational Tools: The Case of EIGRP Jesús Expósito Marquina, Valentina Trujillo Di Biase, and Eric Gamess
30.1
Introduction
Visual learning is the use of graphics, images and animations to enable and enhance learning. It is a proven method in which ideas, concepts, data, and other information are associated with images and animations, resulting in an easier and more effective way of transmitting skills. Students can understand theoretical concepts much more easily if they can see them, or interact with them as in real life. Visual learning uses methods that help students open their minds and think graphically. For these reasons, the GUI (Graphical User Interface) is one of the most important parts of any didactic tool. It is the boundary between the application and its users. It can be seen, it can be heard, and it can be touched. The piles of software code are invisible, hidden behind the screen, speaker, keyboard, and mouse. According to Galitz (2007), the goal of interface design is to make working with a computer easy, productive, and enjoyable. When these characteristics are present in a teaching and learning application, they make it extremely powerful and efficient at bringing knowledge to users. As networking systems become more complex, new curricula and learning tools are needed to help students acquire solid skills in networking technology. In this paper, we discuss our teaching and learning experiences of routing protocols, specifically with EIGRP (Enhanced Interior Gateway Routing Protocol) (Doyle and Carrol 2005; Pepelnjk 2000), using visual educational applications, and we report how these applications can enhance, ease, and make the experience much more natural.
J.E. Marquina (*) Escuela de Computación, Universidad Central de Venezuela, Los Chaguaramos 1040, Caracas, Venezuela e-mail: [email protected]
To do so, we present some well-known tools (Packet Tracer, Dynamips/GNS3, and OPNET IT Guru) and a new application (Easy-EIGRP). We focus on these tools' support for teaching some advanced EIGRP concepts, such as successors, feasible successors, the composite metric, the routing table, and the topology table. The rest of the paper is organized as follows. In Sect. 30.2, we briefly introduce the Enhanced Interior Gateway Routing Protocol. Section 30.3 presents the CLI, a popular method for configuring Cisco's routers. Sections 30.4, 30.5, 30.6 and 30.7 present visual applications (Packet Tracer, Dynamips/GNS3, OPNET IT Guru, and Easy-EIGRP, respectively) for the teaching and learning of EIGRP. Conclusions and future work are discussed in Sect. 30.8.
30.2
Enhanced Interior Gateway Routing Protocol
As its name suggests, EIGRP is an enhanced version of IGRP (Interior Gateway Routing Protocol), an obsolete routing protocol that was developed by Cisco Systems. EIGRP (Doyle and Carrol 2005; Pepelnjk 2000) is an advanced distance-vector protocol that implements some characteristics similar to those of link-state protocols. Some Cisco documentation refers to EIGRP as a hybrid protocol. EIGRP advertises its routing table to its neighbors as distance-vector protocols do; however, it uses the Hello Protocol and forms neighbor relationships similarly to link-state protocols. EIGRP sends partial updates when a metric or the network topology changes. It does not send full routing-table updates in periodic fashion as distance-vector protocols do. EIGRP is a classless protocol that permits the use of VLSMs (Variable Length Subnet Masks) and supports CIDR (Classless Inter-Domain Routing) for a scalable allocation of IP addresses. EIGRP uses five types of packets (Hello, Acknowledgment, Update, Query, and Reply), identified by the protocol number 88 in the IP header:
• Hello: EIGRP sends hello packets in the neighbor discovery and recovery process. These packets are multicast to 224.0.0.10 and use unreliable delivery.
• Acknowledgment (ACK): This packet acknowledges the reception of an update, query, or reply packet. It is a hello packet with no data. ACKs are unicast and use unreliable delivery.
• Update: EIGRP uses update packets to propagate routing information about subnets. A router unicasts update packets to newly discovered neighbors; otherwise, it multicasts update packets to 224.0.0.10 when a link or metric changes.
• Query: EIGRP sends query packets to find an alternate route to a subnet. Query packets can be unicast or multicast.
• Reply: EIGRP sends reply packets to respond to query packets. Reply packets usually provide a feasible successor to the sender of the query. They are always unicast to the sender of the query packet.
EIGRP uses a composite metric where bandwidth, delay, load, and reliability are weighted by scale values, also known as K-values (K1, K2, K3, K4, and K5).
It is one of the most complex routing protocol metrics, and it is seldom well understood by students. K-values are integer values that can vary between 0 and 255.
For K5 = 0:

metric = 256 × (K1 · bandwidth + (K2 · bandwidth) / (256 − load) + K3 · delay)

For K5 ≠ 0:

metric = 256 × (K1 · bandwidth + (K2 · bandwidth) / (256 − load) + K3 · delay) × K5 / (K4 + reliability)
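To make the role of the K-values concrete, the composite metric above can be expressed as a short computation. The sketch below is illustrative only: the helper name, and the assumption that bandwidth and delay are already the scaled values usually used by Cisco (10^7 divided by the slowest link bandwidth in kbit/s, and the sum of interface delays in tens of microseconds), are not prescribed by the text.

```python
# Hypothetical sketch of the EIGRP composite metric shown above. With the
# default K-values (K1 = K3 = 1, K2 = K4 = K5 = 0) it reduces to
# 256 * (bandwidth + delay).
def eigrp_metric(bandwidth, delay, load=1, reliability=255,
                 k1=1, k2=0, k3=1, k4=0, k5=0):
    metric = k1 * bandwidth + (k2 * bandwidth) / (256 - load) + k3 * delay
    if k5 != 0:                       # the K5/(K4 + reliability) factor only applies when K5 != 0
        metric *= k5 / (k4 + reliability)
    return int(256 * metric)

# Example: a 1544 kbit/s serial link (scaled bandwidth = 10**7 // 1544 = 6476)
# with a total delay of 2000 tens of microseconds gives 256 * (6476 + 2000).
print(eigrp_metric(6476, 2000))   # 2169856
```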
DUAL (Diffusing Update Algorithm) is the algorithm used to obtain loop-freedom at every instant throughout a route computation. It allows all routers involved in a topology change to synchronize at the same time. Routers that are not affected by topology changes are not involved in the recomputation. The DUAL finite state machine tracks all subnets advertised by all neighbors. DUAL selects routes to be inserted into a routing table based on feasible successors. A successor is a neighboring router used for packet forwarding that has a least-cost path to a destination that is guaranteed not to be part of a routing loop. When there are no feasible successors but there are neighbors advertising reachability to the affected destination, a recomputation must occur. This is the process where a new successor is determined. When a topology change occurs, DUAL will check for feasible successors. If there are feasible successors, it will use the best one it finds in order to avoid any unnecessary recomputation.
30.3
Cisco Command-Line Interface
The CLI (Command-Line Interface) is the oldest user interaction style. It requires users to type commands into a terminal emulator where all results are shown in plain text. The command-line style is powerful, and it is the preferred method used by expert administrators to configure Cisco's devices. However, the CLI is not very suitable for teaching routing concepts to novice students since (1) they have to remember the commands, (2) the syntax of the commands can be complex, and (3) it is not possible to visualize dynamic changes in real time (e.g., the variation of the throughput or the discovery/recovery of neighbors). It is also very prone to typing errors that can lead to novice user frustration, resulting in an application with poor usability and pedagogical value for teaching and learning. It is obvious that the CLI is not focused on the teaching and learning of networking topics. It is just an interface to manage Cisco's devices; therefore, it
does not possess the necessary features to ease the teaching and learning of routing protocols like EIGRP. Important concepts, such as the finite state machine, are not exposed in the CLI for studying and understanding the operational details of the protocol. Warnings about new and lost adjacencies to neighbors are limited to messages printed on the screen, without the animations, icons, maps, and clouds found in other visual applications.
30.4
Cisco Packet Tracer
Cisco Packet Tracer is a powerful didactic application that allows students to experiment with networking concepts through a virtual laboratory. It is freely distributed to CCNA (Deal 2008) (Cisco Certified Networking Associate) and CCNP (Edwards et al. 2005) (Cisco Certified Networking Professional) students as an integral part of the Cisco Networking Academy comprehensive learning experience. Packet Tracer provides simulation, visualization, authoring, assessment, and collaboration capabilities to facilitate the teaching and learning of complex networking technologies by visually simulating virtual networking environments (Frezzo et al. 2009; Janitor et al. 2010). With Packet Tracer, students can easily build their network topologies in a visual way by dragging, placing, connecting, and clustering virtual network devices such as hubs, switches, routers, workstations and servers. Once placed in the workspace, students can customize their virtual networking devices. For example, they are allowed to add additional cards (e.g., WIC-2T, NM-1FE-TX, etc.) to modular routers such as a Cisco 2811. If a router does not allow users to add or remove extension cards while it is powered on, Packet Tracer will force the students to power off the router before performing the change, just to remind them that they can damage the router if they do not follow a strict procedure. To connect virtual networking devices, Packet Tracer offers a wide variety of connections, such as straight-through and cross-over UTP cables. If students do not use the correct connection, the experiment will not work properly and troubleshooting will be necessary. Network devices (switches or routers) can be configured by students just by double-clicking their icon and entering the commands (in the same way they would enter them on real devices) in the CLI tab of the window that appears. At the time of writing, the latest version of Packet Tracer (version 5.3) supports RIP, OSPF, EIGRP and BGP (limited to basic EBGP). Not all the EIGRP commands are implemented in this version, but most of the usual ones are. For example, users can verify which interfaces of a router are running EIGRP (show ip eigrp interfaces), or can see the important tables maintained by the protocol such as the neighbor table (show ip eigrp neighbors), the topology table (show ip eigrp topology), and the complete topology table (show ip eigrp topology all-links). Packet Tracer also supports the customization of the K-values of the metric (metric weights 0 K1 K2 K3 K4 K5), but authentication and most of the EIGRP debugging commands are not implemented for now.
Starting with the release of Packet Tracer 5.0, the Multiuser Capability (Smith and Bluck 2010) was introduced, which allows students to cross-connect their Packet Tracer applications and create one big topology. It is therefore now possible to create a challenging EIGRP scenario where each student is responsible for his or her own part of the topology, while they all try to achieve a goal together – a working, large, simulated EIGRP network. This kind of practice would be almost impossible with real routers for reasons of cost. Packet Tracer has two operational modes: real-time mode and simulation mode. The real-time mode simulates a real environment, with the same speed as the simulated networks and protocols, in a way similar to real situations. In simulation mode, Packet Tracer displays the actual data exchange between devices. Each packet, or frame, that carries some data is displayed as a small envelope moving on the links between devices. Users can set a filter to limit the study to a particular protocol such as EIGRP and can see all the relevant information of the EIGRP packets (Hello, Acknowledgment, Update, Query, and Reply) based on the different layers of the OSI model, similarly to using a packet analyzer (sniffer) in a real network.
30.5
Dynamips and GNS3
Dynamips1 is a free open-source emulator for Cisco Systems routers that runs on a traditional PC with Windows, Linux or MacOS X. It can emulate the Cisco 1700 series, 2600 series (2610 to 2650XM, and 2691), 3600 series (3620, 3640, and 3660), 3700 series (3725 and 3745), and 7200. In other words, Dynamips allows students to create virtual routers that run a real Cisco IOS (Internetwork Operating System) by using the PC's resources. Since the IOS is a commercial product, students have to legally obtain an IOS copy to use Dynamips. GNS32 is a front-end for Dynamips, that is, a graphical application that allows users to visually create their network topology based on Cisco Systems routers just by dragging and clicking, as they do in Packet Tracer. Most of the WICs (WAN Interface Cards) and NMs (Network Modules) are emulated by Dynamips, so students can customize their virtual routers as needed. Since Dynamips runs true IOS images, it supports all the EIGRP commands that are implemented in the IOS. That is, it does not have the limitations of Packet Tracer; however, users are restricted to smaller topologies due to the resources needed by each virtual router. In general, to run a network scenario with ten Cisco 3745 routers, a PC with an up-to-date processor and at least 4 GB of RAM is recommended. With Dynamips, virtual routers can interact with real routers, allowing students to expand their testbed to a bigger topology by adding virtual routers. Similarly to Packet Tracer, it is possible to create a challenging EIGRP scenario based on virtual routers that run on different PCs, where each student is responsible for his or her own part of the topology.
1 http://www.ipflow.utc.fr/index.php/Cisco_7200_Simulator
2 http://www.gns3.net
Another important feature that Dynamips offers for the training of students is the capture of network traffic. To do so, from GNS3, students just have to right-click the link where they want to capture traffic and choose the Capture item in the context menu. Immediately, Wireshark (Gamess and Veracoechea 2010) (a popular free packet analyzer) will appear and students will get a copy of all the EIGRP packets (Hello, Acknowledgment, Update, Query, and Reply) sent by the routers on the link.
30.6
OPNET IT Guru
Network simulators are widely used by network administrators and researchers to plan, design, secure, analyze, test, debug, improve, and fine-tune networks. There are many network simulators around, and Andrea Rizzoli3 and Sally Floyd4 maintain up-to-date lists of such tools. Some network simulation tools (ns-2, QualNet Developer and OPNET IT Guru) are very popular in the academic world, due to their type of license and their abundant documentation (Gamess and Veracoechea 2010). ns-2 (Issariyakul and Hossain 2009) (Network Simulator 2) is an open source network simulator that is mainly used in the simulation of TCP variants and in ad-hoc networking research. ns-2 does not support EIGRP and only offers a limited GUI (called nam) that allows users to start, stop, and step forward and backward through the simulation. QualNet Developer and OPNET IT Guru are commercial network simulation tools developed by Scalable Network Technologies5 and OPNET Technologies,6 respectively. Both tools support EIGRP and have an excellent GUI. Their prices can be very high and vary depending on the extra modules required for the simulation. With the QUP (QualNet University Program), researchers and professors can acquire 1-seat research licenses and multi-seat teaching licenses of QualNet Developer at substantially reduced prices. There is a limited edition of OPNET IT Guru (called OPNET IT Guru Academic Edition) that can be downloaded free of charge by students and professors at the OPNET Technologies6 website. Due to the free OPNET IT Guru Academic Edition, we limit our study to this network simulation tool. Similarly to Packet Tracer and Dynamips/GNS3, OPNET IT Guru users can easily build their network topologies in a visual way by dragging, placing, connecting, and clustering virtual network devices. Almost all network settings and EIGRP commands are supported by OPNET IT Guru. That is, once the simulated network has been visually drawn, users can specify the IP address and subnet mask of each interface of the routers. OPNET IT Guru also offers a method to auto-assign IP addresses and subnet masks. Then, users can create different EIGRP processes to manage
3 http://www.idsia.ch/~andrea/sim/simnet.html
4 http://www.icir.org/models/simulators.html
5 http://www.scalable-networks.com
6 http://www.opnet.com
several ASs (Autonomous Systems), and specify the EIGRP interfaces and the passive interfaces. Also, users can enable or disable auto-summarization, change the K-values (K1, K2, K3, K4, and K5) used in the computation of the metric, specify the variance, generate a default route, redistribute other routing protocols (RIP, OSPF, IS-IS, Static, Directly Connected, etc.) into EIGRP, establish input or output filters for route filtering, etc. OPNET IT Guru Academic Edition does not use the CLI to configure routers; instead, users do it through menus and dialog boxes. Once the setting is finished, the simulation can be run and results (routing table, EIGRP topology table, etc.) are generated. To verify connectivity or to draw the path followed by packets between two devices, users can simulate ping commands or visualize the path used by packets in an easy way. OPNET IT Guru Academic Edition is seldom used by CCNA or CCNP students, since the configuration is not done through the CLI. Moreover, users cannot interact with the simulator during simulation time, so they must define all the simulated events before running the simulation and study their effect with the results collected at the end of the simulation. However, with OPNET IT Guru, users can make a deep study of the performance of the network under a specified load, which is not possible with Packet Tracer and Dynamips/GNS3.
30.7
Easy-EIGRP
Easy-EIGRP (Expósito et al. 2010) is an implementation of the EIGRP protocol developed in Java. It can be installed on a PC with several NICs (Network Interface Cards) to transform it into an EIGRP router. Even though it is a true but limited implementation of the EIGRP protocol, its main goal is to be used as a didactic application in introductory and advanced network courses to support the teaching and learning of EIGRP. Unlike the previously studied tools (Packet Tracer, Dynamips/GNS3, and OPNET IT Guru), this application is totally focused on the teaching and learning of EIGRP through very intuitive and interactive GUIs, and gives students all the necessary feedback to know what actually happens at any given time, which results in a more effective and natural way to acquire or transmit skills. To ease the teaching and learning process, Easy-EIGRP provides a set of five modules, briefly described below.
30.7.1 EIGRP Settings

This module offers users an interface for setting and configuring the network attributes of the PC as well as the environment variables of EIGRP. Figure 30.1 shows how the module is divided into three main sections: one that lists all the NICs detected in the PC, another section for setting network interface attributes (e.g., IP
Fig. 30.1 Easy-EIGRP’s settings module
address, subnet mask, bandwidth, delay, etc.), and a third one for configuring router variables such as the ASN (Autonomous System Number), Maximum Retransmission Allowed, K-values (K1, K2, K3, K4, and K5), etc. This module offers images, icons and warning messages that guide users through the settings, preventing them from typing incorrect values. It is a much easier and more attractive way to specify the settings than the traditional CLI, where users have to memorize a large number of commands to configure devices, and it is therefore more adequate for novice students.
30.7.2 DUAL Finite State Machine

This module allows users to view every single step executed by EIGRP's complex finite state machine. Since the sequence of events in the finite state machine can occur very fast, this module provides options to customize the reproduction of any previous DUAL process (including stepping forward and backward, pausing, and stopping the animation). It also has a log section and a reply-status table through which users can keep track of events. It is important to note that the previously discussed tools (Packet Tracer, Dynamips/GNS3, and OPNET IT Guru) do not offer information about the DUAL finite state machine, so Easy-EIGRP is a strong candidate for teaching advanced EIGRP concepts. In this module, users can pick a prefix and
Fig. 30.2 Easy-EIGRP’s DUAL finite state machine module
interact with its finite state machine. They can witness an animation composed of state changes with their own natural-language messages, allowing them to understand the internal processes that were carried out, which generally represent a fairly abstract task for students. Figure 30.2 shows the layout of the module with its different sections and how they interact with each other. For example, the log section and the animation of the finite state machine are synchronized; that is, when changing from one state to another, the respective message is highlighted in the log section to show students when, how, and by whom the query sent was answered.
30.7.3 Partial Network Map

This component provides a graphical, live-updated view of the network obtained from the knowledge gained from neighbors. It also has a log section which describes events that occur in the network, and it is particularly well suited for the teaching and learning of the EIGRP metric since it shows details of its computation using the information advertised by neighbors (see Fig. 30.3). The simple GUI in this module was designed to dramatically improve the learning curve by providing immediate feedback after any change in the network or configuration. For example, when Easy-EIGRP discovers a neighbor, a new
Fig. 30.3 Easy-EIGRP’s partial network map
router icon will appear in the Partial Network Map with the learnt subnets. When Easy-EIGRP loses a neighbor, the corresponding router icon will disappear.
30.7.4 EIGRP Tables

EIGRP handles four main tables (IP Routing Table, Neighbor Table, Topology Table and Complete Topology Table) that are fundamental to the functionality of the entire protocol. These tables are available in this module of Easy-EIGRP and are presented with the same format used by the Cisco CLI, to facilitate user migration. However, they are displayed with different colors to distinguish each of the entries and allow a more pleasant viewing. For the visualization of these tables, users do not have to remember or enter any command; furthermore, the tables are updated automatically in real time with the loss or arrival of neighbors, or any metric change.
30.7.5 Logger

The main goal of this module is to collect and summarize all the information processed by Easy-EIGRP, to develop students' ability to predict the responses of the actual configuration and to help them learn much more quickly. This panel, like all the previous ones, is automatically updated in real time, offering new users valuable information to quickly learn the protocol. Figure 30.4 displays how this complete and powerful debugging
Fig. 30.4 Easy-EIGRP’s Logger module
panel can keep track of every packet exchange for troubleshooting and learning purposes. Similar to a packet sniffer, users can see all the information of the EIGRP packets in a more detailed form, presented as a tree which can be expanded and collapsed. The same behavior is observed in the packet bytes panel, which shows the data of the current packet (selected in the messages/packets panel) in a hexdump style. In addition, system debugging messages are also added by the module. The computation of the packet checksum can be seen as a simple task by network specialists, but it may be a nightmare for beginner students. In this module, users can get the step-by-step process to compute a packet checksum, just by double-clicking on the checksum field (for both the IP and EIGRP headers).
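Since the checksum computation is mentioned here, a brief sketch of the underlying algorithm may help. The following is the standard 16-bit one's-complement checksum (RFC 1071) used for the IPv4 header; the EIGRP header checksum is commonly described as being computed the same way over the EIGRP packet with its checksum field zeroed, which is an assumption here rather than a detail taken from the text.

```python
def internet_checksum(data: bytes) -> int:
    # Pad odd-length input with a trailing zero byte.
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # 16-bit big-endian words
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return (~total) & 0xFFFF                       # one's complement of the sum

# Example: checksum of an IPv4 header whose checksum field has been zeroed.
header = bytes.fromhex("45000073000040004011" + "0000" + "c0a80001c0a800c7")
print(hex(internet_checksum(header)))   # 0xb861
```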
30.7.6 Comparative Study

After the implementation phase of Easy-EIGRP, an exploratory study was conducted to measure the level of acceptance of this tool in a group of 50 students with different levels of networking knowledge, ranging from no knowledge at all to advanced skills. The study was carried out in the School of Computer Science of our university (Universidad Central de Venezuela) and consisted of setting up some simple topologies with GNS3 (using the Cisco CLI) on one side,
Fig. 30.5 Results of the study
and with Easy-EIGRP on the other side. After doing the experiments, the students filled out a survey to assess certain features of the tools, such as learning, usability, ease of configuration, application feedback, etc., on a scale of 1–5 marks, where 1 mark was the lowest acceptance score and 5 marks the highest. After analyzing the survey, we obtained quite positive results (see Fig. 30.5), indicating that Easy-EIGRP is well accepted by students and greatly facilitates the teaching and learning of EIGRP. Additionally, we evaluated each of the five modules of Easy-EIGRP separately, and received very positive feedback from students (grades ranged between 4 and 5 marks).
30.8
Conclusions and Future Work
Networks are becoming more and more important in today's life, so networking is an essential area in the training of computer science students. With the needs and trends of new networking technologies, routing protocols use complex algorithms and concepts that are not easy to understand at first glance. To facilitate the teaching and learning process of EIGRP (a Cisco Systems routing protocol), visual tools can be used. In this paper, we presented three well-known tools (Packet Tracer, Dynamips/GNS3, and OPNET IT Guru) and a new application (Easy-EIGRP) that can significantly help students to master EIGRP. All these tools allow users to do labs with several simulated routers on a single PC, that is, without the need for a real and expensive testbed. They have advanced GUIs to ease the settings, and animations to facilitate the understanding of ideas. Most of these tools enable a direct look inside the "wire" that interconnects devices and carries the PDUs of the protocols, so students can visually see, and therefore more easily understand, what is really going on in the network. Packet Tracer and Dynamips/GNS3 are focused on the configuration of Cisco devices,
so CCNA and CCNP students are more likely to use them. With OPNET IT Guru, students can also learn how EIGRP works by doing simulations. There are no real-time interactions between users and the virtual devices, but OPNET IT Guru allows users to make a deep study of the performance of the network. Easy-EIGRP is an advanced solution for the teaching of EIGRP. It shows a significant quantity of information to students and assists them in the configuration and learning of EIGRP. With Easy-EIGRP, all the information is shown dynamically, and it is well suited for novice students to learn concepts such as the metric, successors, and feasible successors. Unlike the other applications, Easy-EIGRP does have a module for the EIGRP finite state machine, which makes it one of the strongest tools for the teaching and learning of EIGRP. For future work, we plan to further develop Easy-EIGRP to support IPv6 (Davies 2008; Deering and Hinden 1998), since IPv6 will become the predominant layer-3 protocol in tomorrow's networks. We also plan to develop didactic visual versions of OSPF and BGP, and to study how animations and graphics can also support the learning process of these two complex routing protocols.
References

Davies J (Jan 2008) Understanding IPv6, 2nd edn. Microsoft Press
Deal R (Apr 2008) Cisco certified network associate study guide, 3rd edn. McGraw-Hill Osborne
Deering S, Hinden R (Dec 1998) Internet protocol, version 6 (IPv6) specification. RFC 2460
Doyle J, Carrol J (Oct 2005) Routing TCP/IP, vol I, 2nd edn. Cisco Press
Edwards W, Jack T, Lammle T, Skandier T, Padjen R, Pfund A, Timm C (May 2005) CCNP: complete study guide. Sybex
Expósito J, Trujillo V, Gamess E (2010) Using visual educational tools for the teaching and learning of EIGRP. Lecture notes in engineering and computer science: proceedings of the world congress on engineering and computer science 2010 (WCECS 2010), San Francisco, USA, 20–22 Oct 2010, pp 169–174
Frezzo D, Behrens J, Mislevy R, West P, DiCerbo K (Apr 2009) Psychometric and evidentiary approaches to simulation assessment in packet tracer software. The fifth international conference on networking and services (ICNS 2009), Valencia, Spain
Galitz O (Jun 2007) The essential guide to user interface design: an introduction to GUI design principles and techniques, 3rd edn. Wiley, New York
Gamess E, Veracoechea C (Jul 2010) A comparative analysis of network simulation tools. The 2010 international conference on modeling, simulation and visualization methods (MSV'10). Part of WORLDCOMP'10, Las Vegas, Nevada
Issariyakul T, Hossain E (Nov 2009) Introduction to network simulator NS2. Springer, New York
Janitor J, Jakab F, Kniewald K (Mar 2010) Visual learning tools for teaching/learning computer networks. The sixth international conference on networking and services (ICNS 2010), Cancun, Mexico
Pepelnjk I (Jan 2000) EIGRP network design solutions: the definitive resource for EIGRP design, deployment, and operation. Cisco Press, Indianapolis, USA
Smith A, Bluck C (Mar 2010) Multiuser collaborative practical learning using packet tracer. The sixth international conference on networking and services (ICNS 2010), Cancun, Mexico
Chapter 31
Relevance Features Selection for Intrusion Detection Adetunmbi Adebayo Olusola, Oladele S. Adeola and Oladuni Abosede Daramola
31.1
Introduction
As the Internet keeps growing at an exponential pace, so do cyber attacks by crackers exploiting flaws in Internet protocols, operating systems and application software. Several protective measures, such as firewalls, have been put in place to check the activities of intruders, but they cannot guarantee the full protection of the system; hence the need for a more dynamic mechanism, like an intrusion detection system (IDS), as a second line of defense. Intrusion detection is the process of monitoring events occurring in a computer system or network and analyzing them for signs of intrusions (Bace and Mell 2001). IDSs are simply classified as host-based or network-based. The former operates on information collected from within an individual computer system, and the latter collects raw network packets as the data source from the network and analyzes them for signs of intrusions. The two different detection techniques employed in IDSs to search for attack patterns are misuse detection and anomaly detection. Misuse detection systems find known attack signatures in the monitored resources. Anomaly detection systems find attacks by detecting changes in the pattern of utilization or behaviour of the system. The majority of the IDSs currently in use are either rule-based or expert-system based, and their strengths depend largely on the dexterity of the security personnel that develop them. The former can only detect known attack types, and the latter is prone to the generation of false positive alarms. This leads to the use of intelligent techniques known as data mining/machine learning techniques as an alternative to expensive and strenuous human input. These techniques automatically learn from
A.A. Olusola (*) Department of Computer Science, Federal University of Technology, PMB 704, Akure, Ondo State, Nigeria e-mail: [email protected]
data or extract useful patterns from data as a reference for normal/attack traffic behaviour profiles, to be used for subsequent classification of network traffic. An intelligent approach was first implemented in mining audit data for automated models for intrusion detection (MADAMID) using association rules (Lee et al. 1999). Several other machine-learning paradigms have been investigated for the design of IDSs, including: neural networks, which learn relationships between given input and output vectors and generalize them to extract new relationships between input and output (Ajith et al. 2005; Byunghae et al. 2005; Mukkamala et al. 2002); fuzzy logic, which generalizes relationships between input and output vectors based on degrees of membership (Ajith et al. 2005; Susan and Rayford 2000); decision trees, which learn knowledge from a fixed collection of properties or attributes in a top-down strategy from the root node to the leaf nodes (Ajith et al. 2005; Pavel et al. 2005; Quinlan 1993); and support vector machines, which create maximum-margin hyperplanes during training with samples from two classes (Byung-Joo and Il-Kon 2005; Mukkamala et al. 2002; Zhang et al. 2004). Rough sets produce a set of compact rules, made up of relevant features only, suitable for misuse and anomaly detection (Adetunmbi et al. 2007, 2008; Sanjay et al. 2005; Zhang et al. 2004). Bayesian approaches are powerful tools for decision and reasoning under uncertain conditions, employing probabilistic concept representations (Axelsson 1999). Prior to the use of machine learning algorithms, raw network traffic must first be summarized into connection records containing a number of within-connection features such as service, duration, and so on. Identification of important features is one of the major factors determining the success of any learning algorithm on a given task. Feature selection in the learning process leads to a reduction in computational cost, overfitting and model size, and to an increase in accuracy. Previous works on feature selection for intrusion detection include those of Kayacik et al. (2006) and Sung and Mukkamala (2003). In this paper, an attempt is made to investigate the relevance of each feature in the KDD 99 intrusion detection dataset to substantiate the performance of machine learning, and the degree of dependency is used to determine the most discriminating features for each class. Therefore, the relevance of the forty-one (41) features with respect to the dataset labels was investigated. This paper is organized as follows: Sect. 31.2 presents a description of the intrusion detection evaluation dataset, followed by a brief description of rough sets and the discretization technique employed in Sect. 31.3. Section 31.4 presents the experimental setup and results, followed by the conclusion in Sect. 31.5.
31.2
Intrusion Detection Dataset
The KDD Cup 1999 dataset, used for benchmarking intrusion detection problems, is employed in our experiment (Cup 1999). The dataset was a collection of simulated raw TCP dump data gathered over a period of 9 weeks on a local area network. The training data
Table 31.1 Class labels and the number of samples that appear in the "10% KDD" dataset

Attack            Original number of samples   Samples after removing duplicated instances   Class
Back              2,203                        994                                           DOS
Land              21                           19                                            DOS
neptune           107,201                      51,820                                        DOS
Pod               264                          206                                           DOS
Smurf             280,790                      641                                           DOS
teardrop          979                          918                                           DOS
Satan             1,589                        908                                           PROBE
ipsweep           1,247                        651                                           PROBE
Nmap              231                          158                                           PROBE
portsweep         1,040                        416                                           PROBE
normal            97,277                       87,831                                        NORMAL
Guess_passwd      53                           53                                            R2L
ftp_write         8                            8                                             R2L
Imap              12                           12                                            R2L
Phf               4                            4                                             R2L
multihop          7                            7                                             R2L
warezmaster       20                           20                                            R2L
warezclient       1,020                        1,020                                         R2L
Spy               2                            2                                             R2L
Buffer_overflow   30                           30                                            U2R
loadmodule        9                            9                                             U2R
Perl              3                            3                                             U2R
Rootkit           10                           10                                            U2R
was processed into about five million connection records from 7 weeks of network traffic, and 2 weeks of testing data yielded around two million connection records. The training data is made up of 22 different attacks out of the 39 present in the test data. The known attack types are those present in the training dataset, while the novel attacks are the additional attacks in the test datasets that are not available in the training data sets. The attack types are grouped into four categories:
(1) DOS: denial of service, e.g. syn flooding
(2) Probing: surveillance and other probing, e.g. port scanning
(3) U2R: unauthorized access to local super user (root) privileges, e.g. buffer overflow attacks
(4) R2L: unauthorized access from a remote machine, e.g. password guessing
The training dataset consisted of 494,021 records, among which 97,277 (19.69%) were normal, 391,458 (79.24%) DOS, 4,107 (0.83%) Probe, 1,126 (0.23%) R2L and 52 (0.01%) U2R connections. Each connection has 41 attributes describing different features of the connection, and a label assigned to it either as an attack type or as normal. Table 31.1 shows the class labels and the number of samples that appear in the "10% KDD" training dataset.
31.3
Basic Concept of Rough Set
Rough set theory is a useful mathematical tool to deal with imprecise and insufficient knowledge, reduce data set size, find hidden patterns and generate decision rules. Rough set theory contributes immensely through the concept of reducts: a reduct is a minimal subset of attributes with the most predictive outcome. Rough sets are very effective in removing redundant features from discrete data sets. The rough set concept is based on a pair of conventional sets called the lower and upper approximations. The lower approximation is a description of the objects which are known with certainty to belong to the subset of interest, while the upper approximation is a description of the objects which possibly belong to the subset (Komorowski et al. 1998).

Definition 1: Let S = ⟨U, A, V, f⟩ be an information system, where U is a universe containing a finite set of N objects {x1, x2, ..., xN}, A is a non-empty finite set of attributes used in the description of objects, V describes the values of all attributes, that is, V = ∪_{a∈A} Va, where Va is the set of values of the a-th attribute, and f : U × A → V is the total decision function such that f(x, a) ∈ Va for every a ∈ A and x ∈ U. An information system is referred to as a decision table (DT) if the attributes in S are divided into two disjoint sets called condition attributes (C) and decision attributes (D), where A = C ∪ D and C ∩ D = ∅:

DT = ⟨U, C ∪ D, V, f⟩   (31.1)

A subset of attributes B ⊆ A defines an equivalence relation (called the indiscernibility relation) on U, denoted IND(B):

IND(B) = {(x, y) ∈ U × U | f(x, b) = f(y, b) ∀ b ∈ B}   (31.2)

The equivalence classes of the B-indiscernibility relation are denoted by [x]B, where [x]B = {y ∈ U | (x, y) ∈ IND(B)}.

Definition 2: Given B ⊆ A and X ⊆ U, X can be approximated using only the information contained within B by constructing the B-lower and B-upper approximations of the set X, defined as:

B̲X = {x ∈ X | [x]B ⊆ X},   B̄X = {x ∈ X | [x]B ∩ X ≠ ∅}   (31.3)

Definition 3: Given attributes A = C ∪ D with C ∩ D = ∅, the positive region for a given set of condition attributes C in relation to IND(D), POS_C(D), can be defined as

POS_C(D) = ∪_{X ∈ D*} C̲X   (31.4)
where D* denotes the family of equivalence classes defined by the relation IND(D). POS_C(D) contains all objects of U that can be classified correctly into the distinct classes defined by IND(D). Similarly, given attribute subsets B, Q ⊆ A, the positive region POS_B(Q) contains all objects of U that can be classified into blocks of the partition U/Q using the attributes in B, and is defined as:

POS_B(Q) = ∪_{X ∈ U/Q} B̲X   (31.5)
Definition 4: Given attributes B, Q ⊆ A, the degree of dependency of Q on B over U is defined as

γ_B(Q) = |POS_B(Q)| / |U|   (31.6)
The degree of dependency of an attribute dictates its significance in rough set theory.
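To connect Definitions 1–4, the degree of dependency can be computed directly from a table of discrete-valued records. The sketch below is illustrative only: the record layout (a list of dictionaries) and the function names are assumptions, not part of the original study.

```python
from collections import defaultdict

def indiscernibility_blocks(records, attributes):
    """Equivalence classes of IND(attributes): records with identical values
    on all the given attributes fall into the same block (Eq. 31.2)."""
    blocks = defaultdict(set)
    for i, record in enumerate(records):
        blocks[tuple(record[a] for a in attributes)].add(i)
    return list(blocks.values())

def degree_of_dependency(records, condition_attrs, decision_attrs):
    """gamma_B(Q) = |POS_B(Q)| / |U| (Eq. 31.6), where POS_B(Q) is the union of
    the condition blocks entirely contained in one decision block (Eqs. 31.4-31.5)."""
    decision_blocks = indiscernibility_blocks(records, decision_attrs)
    positive_region = set()
    for block in indiscernibility_blocks(records, condition_attrs):
        if any(block <= d for d in decision_blocks):
            positive_region |= block
    return len(positive_region) / len(records)

# Toy example with one condition attribute and a class label:
rows = [{"f1": "a", "class": "normal"}, {"f1": "a", "class": "attack"},
        {"f1": "b", "class": "attack"}]
print(degree_of_dependency(rows, ["f1"], ["class"]))   # 1/3: only the 'b' block is consistent
```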
31.3.1 Discretization Based on Entropy

Entropy, a supervised splitting technique used to determine how informative a particular input attribute is about the output attribute for a subset, is calculated on the basis of the class label. It is characterized by finding the split with the maximal information gain (Jiawei and Micheline 2006). It is computed as follows. Let D be a training data set defined by a set of attributes with their corresponding labels. The entropy of D is defined as:

Entropy(D) = − Σ_{i=1}^{m} Pi log2(Pi)   (31.8)
where Pi is the probability of class Ci in D, determined by dividing the number of tuples of Ci in D by |D|, the total number of tuples in D. Given a set of samples D, if D is partitioned into two intervals D1 and D2 using boundary T, the entropy after partitioning is

E(D, T) = (|D1| / |D|) Ent(D1) + (|D2| / |D|) Ent(D2)   (31.9)

where |Di| denotes cardinality. The boundaries T are chosen from the midpoints of the attribute values.
The information gain of the split is Gain(D, T) = Entropy(D) − E(D, T). In selecting a split-point for attribute A, pick the attribute value that gives the minimum information requirement, which is obtained when E(D, T) is minimal. This process is performed recursively on an attribute as long as the information gain of the split remains greater than a small threshold δ (δ > 0), that is, while

Ent(S) − E(T, S) > δ   (31.10)
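As an illustration of Eqs. 31.8–31.10, the search for the best binary split of one continuous attribute can be sketched as follows. The variable names and the representation of the data as two parallel lists are assumptions made for the example, not details taken from the paper.

```python
import math

def entropy(labels):
    """Entropy(D) = -sum(P_i * log2(P_i)) over the class labels (Eq. 31.8)."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def best_split(values, labels):
    """Return the boundary T (a midpoint of adjacent distinct values) with the
    maximal information gain Gain(D, T) = Entropy(D) - E(D, T) (Eq. 31.9)."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                  # only midpoints of distinct values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [c for v, c in pairs if v <= t]
        right = [c for v, c in pairs if v > t]
        e_dt = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - e_dt > best_gain:
            best_t, best_gain = t, base - e_dt
    return best_t, best_gain

# The discretization would recurse on each side of the split while the returned
# gain stays above the small threshold delta of Eq. 31.10.
print(best_split([1, 2, 8, 9], ["normal", "normal", "attack", "attack"]))   # (5.0, 1.0)
```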
31.4
Experimental Setup and Results
The training set employed for this analysis is the "10% KDD" (kddcup_data_gz file) dataset. Since the degree of dependency is calculated for discrete features, continuous features are discretized based on entropy, as discussed in the previous section. Prior to the discretization, redundant records were removed from the dataset, since rough sets do not require duplicate instances to classify and identify discriminating features. For this experiment a total of 145,738 records are used, as detailed in Table 31.1. In this experiment, two approaches are adopted to detect how significant a feature is to a given class. The first approach is the computation of the degree of dependency for each class based on the available number of class instances in the data set, thus signifying how well the feature can discriminate the given class from the other classes. Secondly, each class label is mapped against the others for each attribute. That is, a frequency table of a particular class label against the others is generated based on the variations in each attribute, and then a comparison is made to generate the dependency ratio of the predominant classes in order to detect all the relevant features distinguishing one class from another. Graphical analysis was also employed in order to detect the relevant features for each class. The dependency ratio is computed as

Dependency ratio = HVF/TIN − OTH/TON   (31.11)
where HVF is the highest number of instances showing a particular variation of attribute f for the class label, TIN is the total number of instances of that class in the dataset, OTH is the number of instances of the other class labels based on that particular variation (or set of variations), and TON is the total number of instances of the class labels in the data set constituting OTH.
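Under the reconstruction of Eq. 31.11 given above, the dependency ratio of a class on one discrete (or discretized) attribute can be computed as in the following sketch. The pair-list representation and the function name are illustrative assumptions only.

```python
def dependency_ratio(attribute_values_with_labels, target_class):
    """Eq. 31.11: HVF/TIN - OTH/TON for one attribute and one class label."""
    in_class = [v for v, c in attribute_values_with_labels if c == target_class]
    others = [v for v, c in attribute_values_with_labels if c != target_class]
    tin, ton = len(in_class), len(others)
    # HVF: count of the attribute variation that is most frequent inside the class.
    hvf_value = max(set(in_class), key=in_class.count)
    hvf = in_class.count(hvf_value)
    # OTH: instances of the other classes that show the same variation.
    oth = others.count(hvf_value)
    return hvf / tin - oth / ton

# Example in the spirit of the land attack discussed below: every in-class
# record uses variation 2, and only one out of many other records does.
records = [(2, "land")] * 19 + [(2, "normal")] + [(0, "normal")] * 999
print(round(dependency_ratio(records, "land"), 4))   # 0.999
```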
Table 31.2 Attribute with the highest degree of dependency that distinctly distinguishes some class labels in the training data set

Attack         Degree of dependency   Selected feature   Feature name        Other distinct features
back           0.9708                 5                  source bytes        6
neptune        0.0179                 3                  Service             39
teardrop       0.9913                 8                  wrong fragment      25
satan          0.0319                 30                 diff srv rate       27, 3
portsweep      0.0264                 4                  Flag                30, 22, 5
normal         0.0121                 6                  destination bytes   5, 3, 10, 11, 1
guess_passwd   0.0189                 11                 failed logins       –
imap           0.3333                 26                 srv error rate      –
warezmaster    0.7500                 6                  destination bytes   –
warezclient    0.2686                 10                 Hot                 5, 1
Fig. 31.1 Maximum Degree of dependency of each feature distinctly distinguishing an attack
31.4.1 Result Discussions

Results are presented in terms of the classes that achieved good levels of discrimination from the others in the training set, together with an analysis of feature relevancy in the training set. The analyses are based on the degree of dependency and on binary discrimination for each class. That is, for each class, a dataset instance is considered in-class if it has the same label, and out-class if it has a different label. The degree of dependency is computed for class labels based on the number of instances of that class available in the dataset. Table 31.2 shows the highest degree of dependency of class labels on a particular feature which clearly distinguished a particular class label in the training data set. Figure 31.1 shows the degree of dependency of each feature that distinctly distinguished some class labels, while Fig. 31.2a and b clearly display the patterns of these chosen attributes. From Figs. 31.1 and 31.2b, feature 8, selected for teardrop, has the highest degree of dependency of 0.9913 based on variation 2 of that attribute, which shows that attribute
Fig. 31.2 (a) Dependency of back attack and others on attribute 5. (b) Dependency of teardrop attack against other class labels on attribute 8
8 is sufficient to detect the teardrop attack. In addition, Fig. 31.2b shows that the remaining 0.0097 can be detected based on variation 1. Another DOS attack type, "back", with a degree of dependency of 0.9708 on attribute 5, is depicted in Fig. 31.2a. This also shows that the attribute is almost sufficient for detecting this attack type. Table 31.3 details the most relevant feature selected for each class and the corresponding dependency ratio. Six out of the 23 classes choose the amount of data exchanged (source and destination bytes) as the most discriminating feature, with the DOS group accounting for half of them. This is expected of the denial of service and probe categories of attacks, where the nature of the attack involves very short or very long connections. Feature 7, which is related to the land attack, is selected as the most discriminating feature for the land attack, while for pod and teardrop, feature 8 (wrong fragment) was selected as the most discriminating feature for these attack types. Figure 31.3 shows the maximum dependency ratio for each feature as shown in Table 31.3, and it can be seen that the dependency of the land attack on feature 7 is almost 100%. In fact it is 100%, because all the 19 records labeled land in the training dataset used variation 2 of feature 7, which was only used by the land attack except for a record
Table 31.3 The most relevant feature for each attack type and normal

Attack            Most relevant feature   Feature name                  Variations       Dependency ratio   Class
Back              5                       source bytes                  66, 64, 60       0.9708             DOS
Land              7                       Land                          2                0.9999             DOS
Neptune           5                       source bytes                  0                0.9328             DOS
Pod               8                       wrong fragment                1                0.9853             DOS
Smurf             5                       source bytes                  39               0.7731             DOS
Teardrop          8                       wrong fragment                2                0.9913             DOS
Satan             30                      diff srv rate                 30               0.7648             PROBE
Ipsweep           36                      dst host same src port rate   13, 14, 15, 17   0.8282             PROBE
Nmap              5                       source bytes                  4                0.6448             PROBE
portsweep         28                      srv error rate                9                0.8057             PROBE
Normal            29                      same srv rate                 28               0.8871             NORMAL
guess_passwd      11                      failed login                  1                0.9622             R2L
ftp_write         23                      Count                         1                0.7897             R2L
Imap              3                       Service                       60               0.9980             R2L
Phf               6                       destination bytes             28               0.9976             R2L
Multihop          23                      Count                         1                0.7898             R2L
warezmaster       6                       destination bytes             33               0.7500             R2L
warezclient       3                       Service                       13               0.6658             R2L
Spy               39                      dst host srv serror rate      8                0.9997             R2L
buffer_overflow   3                       Service                       6                0.6965             U2R
loadmodule        36                      dst host same src port rate   29               0.6279             U2R
Perl              14                      root shell                    1                0.9994             U2R
Rootkit           24                      srv count                     1                0.7269             U2R
Fig. 31.3 Dependency ratio of each feature
labeled normal in the dataset. Also revealed is a heavy dependence on the feature "Service" (that is, feature 3), which shows that different services are exploited to perpetrate different types of attack. For instance, imap4, ftp_data and telnet are exploited to launch the imap, warezclient and buffer_overflow attacks, respectively.
Table 31.4 List of features for which the class is selected most relevant

Class label       Relevant features
Back              5, 6
Land              7
neptune           3, 4, 5, 23, 26, 29, 30, 31, 32, 34, 36, 37, 38, 39
Pod               8
Smurf             2, 3, 5, 6, 12, 25, 29, 30, 32, 36, 37, 39
teardrop          8
Satan             27
ipsweep           36
Nmap              5
portsweep         28
normal            3, 6, 12, 23, 25, 26, 29, 30, 33, 34, 35, 36, 37, 38, 39
guess_passwd      11, 6, 3, 4
ftp_write         9, 23
Imap              3, 39
Phf               6, 10, 14, 5
multihop          23
warezmaster       6, 1
warezclient       3, 24, 26
spy               39, 1
buffer_overflow   3, 24, 14, 6
loadmodule        36, 24, 3
perl              14, 16, 18, 5
rootkit           24, 23, 3
Table 31.4 details the most discriminating class labels for each feature. Normal, Neptune and Smurf are the most discriminating classes for most of the features, which consequently makes their classification easier. Moreover, these three classes dominate the testing dataset, and this accounts for the high detection rates of machine learning algorithms on them. Figure 31.3 shows how important a particular feature is to the detection of an attack or of normal traffic. For some class labels a single feature is sufficient to detect the attack type, while others require a combination of two or more features. For classes with few representatives in the dataset, such as spy and rootkit, it is very difficult to find a feature or features that can clearly differentiate them, because of the dominance of some class labels like normal and Neptune. This difficulty in classification is limited to attacks in two major groups, user to root and remote to local. The involvement of each feature in classification has been analyzed. Features 20 and 21 make no contribution to the classification of either an attack or normal traffic. Hence these two features (outbound command count for FTP sessions and host login) have no relevance in intrusion detection. There are other features that are of little significance in the intrusion detection data set. From the dependency ratios calculated, these features include 13, 15, 17, 22 and 40 (number of compromised conditions, su attempted, number of file creation operations, is guest login, and dst host rerror rate, respectively).
31.5
Conclusion
In this paper, selection of relevant features was carried out on the KDD'99 intrusion detection evaluation dataset. Empirical results revealed that some features have no relevance in intrusion detection. These features include 20 and 21 (outbound command count for FTP sessions and host login), while features 13, 15, 17, 22 and 40 (number of compromised conditions, su attempted, number of file creation operations, is guest login, and dst host rerror rate, respectively) are of little significance in intrusion detection. In our future work, additional measures, including sophisticated statistical tools, will be employed.
References

Adetunmbi AO, Alese BK, Ogundele OS, Falaki SO (2007) A data mining approach to network intrusion detection. J Comp Sci Appl 14(2):24–37
Adetunmbi AO, Adeola OS, Daramola OA (2010) Analysis of KDD'99 intrusion detection dataset for selection of relevance features. Lecture notes in engineering and computer science: proceedings of the world congress on engineering and computer science 2010 (WCECS 2010), vol 1, San Francisco, USA, 20–22 Oct 2010, pp 162–168
Adetunmbi AO, Falaki SO, Adewale OS, Alese BK (2008) Intrusion detection based on rough set and k-nearest neighbour. Int J Comput ICT Res 2(1):60–66
Ajith A, Ravi J, Johnson T, Sang YH (2005) D-SCIDS: distributed soft computing intrusion detection system. J Network Comp Appl 28(1):1–19, Elsevier
Axelsson S (1999) The base-rate fallacy and its implication for the difficulty of intrusion detection. In: Proceedings of the 6th ACM conference on computer and communication security, Singapore, pp 127–141
Bace R, Mell P (2001) Intrusion detection system. NIST special publication SP 800, November
Byung-Joo K, Il-Kon K (2005) Machine learning approach to real time intrusion detection system. In: Zhang S, Jarvis (eds) Lecture notes in artificial intelligence, vol 3809. Springer, Berlin, Heidelberg, pp 153–163
Byunghae C, kyung WP, Jaittyun S (2005) Neural networks techniques for host anomaly intrusion detection using fixed pattern transformation in ICCSA. LNCS 3481:254–263
Jiawei H, Micheline K (2006) Data mining concepts and techniques, 2nd edn. China Machine Press, Singapore, pp 296–303
Kayacik HG, Zincir-Heywood AN, Heywood ML (2006) Selecting features for intrusion detection: a feature analysis on KDD 99 intrusion detection datasets
KDD Cup 1999 Data. Available: http://kdd.ics.uci.edu/databases/kddcup99/
Komorowski J, Pokowski L, Skowron A (1998) Rough sets: a tutorial. citeseer.ist.psu.edu/komorowski98rough.html
Lee W, Stolfo SJ, Mok K (1999) Data mining in work flow environments: experiments in intrusion detection. In: Proceedings of the 1999 conference on knowledge discovery and data mining, 15–18 Aug 1999, San Diego, CA
Mukkamala S, Janoski G, Sung A (2002) Intrusion detection using neural networks and support vector machines. In: Proceedings of IEEE international joint conference on neural networks, pp 1702–1707
Pavel L, Patrick D, Christia S, Konrad R (2005) Learning intrusion detection: supervised or unsupervised? International conference on image analysis and processing (ICAP). Italie 2005(3617):50–57
Quinlan JL (1993) C4.5 program for machine learning. Morgan Kaufmann, USA
Sanjay R, Gulati VP, Arun KP (2005) A fast host-based intrusion detection system using rough set theory in transactions on rough sets IV. LNCS 3700(2005):144–161
Susan MB, Rayford BV (2000) Intrusion detection via fuzzy data mining. Proceedings of the 12th annual Canadian Information Technology Security Symposium, Ottawa, Canada, Jun 19–23, pp 109–122
Sung AH, Mukkamala S (2003) Identifying important features for intrusion detection using support vector machines and neural networks. IEEE proceedings of the 2003 symposium on applications and the Internet
Zhang L, Zhang G, Yu L, Zhang J, Bai Y (2004) Intrusion detection using rough set classification. J Zhejiang Univ Sci 5(9):1076–1086
Chapter 32
A Visual Application for Teaching and Learning the Advanced Concepts of the Diffusing Update Algorithm for EIGRP Valentina Trujillo Di Biase, Jesús Expósito Marquina, and Eric Gamess
32.1
Introduction
Routing protocols can be categorized, based on the information they exchange and the way they compute their routing tables, into distance vector protocols and link state protocols. In a distance vector protocol, each node knows the shortest distance from a neighbor node to every destination network; however, it does not know all the nodes between its neighbor and the final destination. This type of protocol sends periodic updates which include every destination entry of its routing table along with the corresponding shortest distance to it. Distance vector protocols use the Bellman-Ford algorithm for shortest path computation. The main disadvantages of this algorithm are routing table loops and counting to infinity. Some of the most important distance vector protocols are: RIPv1 (Hedrick 1998) (Routing Information Protocol v1), RIPv2 (Malkin 1998) (Routing Information Protocol v2), and IGRP (Doyle and Carrol 2005) (Interior Gateway Routing Protocol). On the other hand, in link state protocols, every node knows the whole network topology thanks to the update flooding that happens every time a topology change is detected. Based on these updates, each node must compute the shortest path to a specific destination. Link state protocols use some variant of Dijkstra's algorithm for the shortest path computation, which ensures that the counting to infinity problem will not be present; however, an up-to-date version of the entire topology is needed by every node, which may constitute excessive storage and communication overhead in a large, dynamic network (García-Lunes-Aceves 1993). OSPF (Coltun et al. 1999; Moy 1989) (Open Shortest Path First) and IS-IS (ISO 2002; Oran 1990) (Intermediate System to Intermediate System) are some of the most commonly used link state routing protocols.
V.T. Di Biase (*) Escuela de Computación, Universidad Central de Venezuela, Los Chaguaramos, Caracas 1040, Venezuela e-mail: [email protected]
EIGRP (Pepelnjk 2000) (Enhanced Interior Gateway Routing Protocol) is often categorized as a hybrid protocol, since it includes concepts of both distance vector and link state protocols: it advertises its routing table to its neighbors as distance vector protocols do, and it uses the Hello Protocol and creates neighbor relationships, similarly to link state protocols. In addition, it sends partial updates when a metric or the network topology changes, instead of sending full routing table updates in periodic fashion as distance vector protocols do. EIGRP uses the DUAL (Diffusing Update Algorithm) algorithm, which ensures that there will never be loops, not even temporary ones, as there can be with distance vector or link state protocols. This algorithm is a new and advanced approach to distance vector algorithms which includes: a loop-freedom guarantee, tolerance of arbitrary transmission or processing delays, the assumption of arbitrary positive link costs, and finite-time computation of the shortest path towards a destination after a topology change (García-Lunes-Aceves 1993). DUAL uses diffusing computations with the aim of solving the shortest path problem and improving the performance and resource usage of the traditional algorithms. Easy-EIGRP (Expósito et al. 2010; Trujillo 2010) is a didactic implementation of EIGRP with an integrated graphical viewer for the DUAL finite state machine that allows users to understand EIGRP's complex processes (including local and diffusing computations) in an easier way. In this paper, we present our alternative way of teaching the advanced concepts of DUAL using Easy-EIGRP. The rest of the paper is organized as follows: In Sect. 32.2, related work is reviewed. DUAL is discussed in Sect. 32.3. Our approach to teaching the DUAL algorithm is presented and justified in Sect. 32.4 and, finally, conclusions and future work are discussed in Sect. 32.5.
32.2 Related Work
Currently, there is a wide range of open source projects for TCP/IP-based routing protocols. The main goal of these projects is to promote inventive solutions for network routing. Some of the most popular routing software suites are Zebra (http://www.zebra.org), Quagga (http://www.quagga.net), XORP (Handley et al. 2002), BIRD (http://bird.network.cz), Click (Kohler et al. 2000), and Vyatta (http://www.vyatta.org). It is noteworthy that none of the projects mentioned above offers support for EIGRP; furthermore, none of them (not even the Cisco Systems command line interface) provides a graphical view of the DUAL finite state machine, which is the core of the protocol and whose understanding is essential for EIGRP specialists. For this reason we decided to develop our own didactic EIGRP routing solution, which provides a fully graphical environment for understanding the protocol. In addition, we included a complete and interactive GUI for DUAL comprehension, which underlines the innovative quality of Easy-EIGRP.
32.3 DUAL in EIGRP
DUAL is a convergence algorithm that replaces the Bellman-Ford algorithm used by other distance vector protocols. DUAL builds on the diffusing computations proposed by E. W. Dijkstra and C. S. Scholten (1980). The main goal of the algorithm is to perform distributed shortest path routing while keeping the network free of loops at every instant. The Diffusing Update Algorithm relies on protocols (such as the Hello Protocol and the Reliable Transport Protocol) and data structures (such as the neighbor table and the topology table) to provide consistent information, leading to optimum route selection. For DUAL to operate correctly, the following conditions must be met (García-Luna-Aceves 1993):
• Within a finite time, a node detects the existence of a new neighbor or the loss of connectivity with a neighbor. This is handled by the Neighbor Discovery/Recovery mechanism implemented by EIGRP.
• All messages transmitted over an operational link are received correctly and in the proper sequence within a finite time. EIGRP's RTP (Reliable Transport Protocol) is responsible for ensuring that this condition is met.
• All messages, changes in the cost of a link, link failures, and new-neighbor notifications are processed one at a time within a finite time and in the order in which they are detected.
EIGRP uses the Hello Protocol to discover neighbors and to identify the local router to its neighbors; when this happens, EIGRP will attempt to form an adjacency with that neighbor. Once the adjacency is established, the router will receive updates from its new neighbor containing all routes known by the sending router and the metric of those routes. For each neighbor, the router calculates a distance based on the distance advertised by the neighbor and the cost of the link to it. The lowest distance to a specific destination is called the Feasible Distance (FD) (Doyle and Carroll 2005). The Feasible Condition (FC) is met when a neighbor's advertised or reported distance (RD) to a destination is strictly lower than the router's FD to that destination. Any neighbor that meets the FC is labeled a Feasible Successor (FS). The FS that provides the lowest distance to a destination is labeled the Successor and is the next hop that the router installs in order to reach that destination. It is important to mention that there may be more than one Successor and that unequal cost load balancing is also allowed by EIGRP. The FSs and the FC are the elements that ensure that loops are avoided: because an FS always reports a distance strictly lower than the current FD, a router will never choose a path that leads back through itself (creating a loop), since such a path would have a distance larger than the FD and therefore the FC would not be met. EIGRP works with a topology table, where all the known destinations are recorded. Each destination is registered along with its FD and the corresponding FSs. For each FS, its advertised distance and the interface through which it is reachable are recorded.
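As a minimal illustration of the feasibility check described above (class and field names are assumptions made for this sketch, not Easy-EIGRP's actual code): a neighbor qualifies as a Feasible Successor when its reported distance is strictly lower than the current FD, and the Successor is the feasible successor with the lowest total distance.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of EIGRP's Feasible Condition (FC) for one destination.
public class FeasibilityCheck {

    // One candidate path learned from a neighbor.
    public static class Candidate {
        final String neighbor;
        final long reportedDistance;  // RD advertised by the neighbor
        final long totalDistance;     // RD + cost of the link to that neighbor

        Candidate(String neighbor, long reportedDistance, long linkCost) {
            this.neighbor = neighbor;
            this.reportedDistance = reportedDistance;
            this.totalDistance = reportedDistance + linkCost;
        }
    }

    /** A neighbor satisfies the FC when its RD is strictly lower than the current FD. */
    public static List<Candidate> feasibleSuccessors(List<Candidate> candidates, long feasibleDistance) {
        List<Candidate> result = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.reportedDistance < feasibleDistance) {
                result.add(c);
            }
        }
        return result;
    }

    /** The Successor is the feasible successor with the lowest total distance. */
    public static Candidate successor(List<Candidate> feasibleSuccessors) {
        Candidate best = null;
        for (Candidate c : feasibleSuccessors) {
            if (best == null || c.totalDistance < best.totalDistance) {
                best = c;
            }
        }
        return best;
    }
}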
DUAL's computations can be summarized into two processes (local computations and diffusing computations), as described below.

32.3.1 Local Computations

Local computations are carried out by an EIGRP router as long as it can resolve a change in the network topology without querying its neighbors, in other words, if the router can resolve the situation locally. For example, if an EIGRP router faces an increased metric from its Successor and the router has at least one additional FS for the same destination, the action is immediate: the new route is selected and updates are sent to all the neighbors to inform them about the change in the network topology. When performing local computations, the affected route stays passive. It is important to mention that even if an EIGRP router can resolve a topology variation with local computations, its neighbors will not necessarily be able to do the same.
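A minimal sketch of this decision, reusing the Candidate type from the earlier sketch (all names are illustrative assumptions, not Easy-EIGRP's actual code):

// Illustrative sketch of a local computation: no query is needed when a
// Feasible Successor other than the affected Successor is still available.
public class LocalComputation {

    /**
     * Returns true when the topology change can be resolved locally, i.e. the
     * route stays passive; otherwise a diffusing computation must be started.
     */
    public static boolean resolveLocally(FeasibilityCheck.Candidate affectedSuccessor,
                                         java.util.List<FeasibilityCheck.Candidate> feasibleSuccessors) {
        for (FeasibilityCheck.Candidate c : feasibleSuccessors) {
            if (c != affectedSuccessor) {
                // Install the alternate route and send updates to all neighbors
                // announcing the new reported distance (omitted in this sketch).
                return true;
            }
        }
        return false; // no alternative: go active and query the neighbors
    }
}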
32.3.2 Diffusing Computations

When an EIGRP router cannot find an alternate route (no alternate route exists, or the new best route still goes through the affected Successor), it starts a diffusing computation by asking all its neighbors about an alternate route. A diffusing computation is performed in a series of steps (steps 1–3 are sketched in the code example below):
1. The affected route is marked active in the topology table.
2. A reply-status table is created to track the replies expected from the neighbors.
3. A query is sent to the neighbors.
4. Responses are collected and stored in the topology table. At the same time, the corresponding entry in the reply-status table is updated.
5. The best response is selected in the topology table and the new best route is installed in the routing table.
6. If necessary, an update is sent to the neighbors to inform them of the changed network topology (Pepelnjak 2000).
Every time a query is sent to a neighbor, an independent timer is started for that neighbor in order to guarantee network convergence in a reasonable time, which constitutes one of EIGRP's principles. EIGRP ensures once more that any loop will be avoided thanks to the use of the route's status flag. If, for example, a router receives a query from a neighbor which is performing a diffusing computation, and the query concerns a route that has already been marked as active, the router will reply with its current best path and stop processing the query, avoiding the creation of a query loop and subsequent packet flooding. In most cases, after a diffusing computation is complete, the router that initiated the computation must distribute the obtained results.
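The following hedged sketch illustrates steps 1–3 and the bookkeeping of replies; the names and structure are assumptions for illustration and do not reproduce Easy-EIGRP's implementation.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of starting a diffusing computation (steps 1-3 above).
public class DiffusingComputation {

    enum ReplyStatus { WAITING, RECEIVED, TIMED_OUT }

    // Reply-status table: one entry per queried neighbor.
    private final Map<String, ReplyStatus> replyStatus = new HashMap<>();
    private boolean active = false;

    public void start(String destination, List<String> neighbors) {
        active = true;                        // 1. mark the route active
        replyStatus.clear();                  // 2. build the reply-status table
        for (String neighbor : neighbors) {
            replyStatus.put(neighbor, ReplyStatus.WAITING);
            sendQuery(neighbor, destination); // 3. query every neighbor
        }
    }

    public void onReply(String neighbor) {
        replyStatus.put(neighbor, ReplyStatus.RECEIVED);   // 4. record the reply
        if (!replyStatus.containsValue(ReplyStatus.WAITING)) {
            // 5-6. all replies received: select the best response, install the
            // new route, return to passive and update the neighbors (omitted).
            active = false;
        }
    }

    private void sendQuery(String neighbor, String destination) {
        // Transport is out of scope for this sketch.
    }
}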
Fig. 32.1 DUAL finite state machine
DUAL defines a finite state machine (the DUAL finite state machine) which controls all the possible states in which a router can be found while performing diffusing computations (if the router is performing local computations, the finite state machine executes the IE1 event, leading to the r = 0, O = 1 state). Because there are multiple types of input events that can cause a route to change its state, some of which might occur while a route is active, DUAL defines multiple active states. A query origin flag (O) is used to indicate the current state. Figure 32.1 and Table 32.1 show the complete DUAL finite state machine (Doyle and Carroll 2005; Expósito et al. 2010). As discussed before, a diffusing computation requires that a router receive the replies from all the neighbors it queried in order to select the new best route. However, there are extreme circumstances in which a neighbor might fail to respond to a query. Any of those circumstances blocks the router originating the diffusing computation. To prevent these types of deadlock situations, EIGRP contains a built-in safety measure: a maximum amount of time a diffusing computation can take to execute. Whenever a diffusing computation takes longer than the timeout value, the diffusing computation is aborted; the adjacency with any non-responding neighbors is cleared, and the computation proceeds as if these neighbors had replied with an infinite metric. The route for which the computation is aborted is said to be SIA (Stuck In Active). Clearly, the processes described above are not easy to understand. For teaching and learning purposes, a visual approach can dramatically support users (e.g., students with only basic knowledge of networking) in their understanding of every single detail of this algorithm. That is the reason why we developed a graphical implementation of the DUAL algorithm, which can be found in the DUAL Finite State Machine module of Easy-EIGRP.
Table 32.1 Input events for the DUAL finite state machine
Input event  Description
IE1   Any input event for which FC is satisfied or the destination is unreachable
IE2   Query received from the Successor; FC not satisfied
IE3   Input event other than a query from the Successor; FC not satisfied
IE4   Input event other than a last reply or a query from the Successor
IE5   Input event other than a last reply, a query from the Successor, or an increase in the distance to destination
IE6   Input event other than a last reply
IE7   Input event other than a last reply or an increase in the distance to destination
IE8   Increase in the distance to destination
IE9   Last reply received; FC not met with current FD
IE10  Query received from Successor
IE11  Last reply received; FC met with current FD
IE12  Last reply received; set FD to infinity
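The input events of Table 32.1 could, for instance, be represented in code as a simple enumeration. The following is an illustrative sketch only and is not taken from Easy-EIGRP's source.

// Illustrative enumeration of the DUAL input events listed in Table 32.1.
public enum DualInputEvent {
    IE1("Any input event for which FC is satisfied or the destination is unreachable"),
    IE2("Query received from the Successor; FC not satisfied"),
    IE3("Input event other than a query from the Successor; FC not satisfied"),
    IE4("Input event other than a last reply or a query from the Successor"),
    IE5("Input event other than a last reply, a query from the Successor, or an increase in the distance to destination"),
    IE6("Input event other than a last reply"),
    IE7("Input event other than a last reply or an increase in the distance to destination"),
    IE8("Increase in the distance to destination"),
    IE9("Last reply received; FC not met with current FD"),
    IE10("Query received from Successor"),
    IE11("Last reply received; FC met with current FD"),
    IE12("Last reply received; set FD to infinity");

    private final String description;

    DualInputEvent(String description) {
        this.description = description;
    }

    public String getDescription() {
        return description;
    }
}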
The main purpose of this module is to provide an intuitive and interactive tool for teaching and learning how EIGRP really works.
32.4 DUAL's Implementation in Easy-EIGRP
Easy-EIGRP (Expósito et al. 2010; Trujillo et al. 2010) is a didactic application whose main goal is the teaching and learning of EIGRP. Easy-EIGRP provides five modules with the aim of giving users an improved, efficient and easy way to understand the protocol; these modules are: (1) the EIGRP Manager, (2) the DUAL Finite State Machine module, (3) the Partial Network Map Viewer, (4) the EIGRP Tables Viewer and (5) the Logger module. In this paper we focus on the DUAL Finite State Machine module, although it is worth mentioning that the Logger module plays an important role when it comes to debugging (refer to Expósito et al. 2010 for further details). The DUAL implementation is based on a set of Java classes (DUAL, NeighborDiscovery, RTP, etc.) which are represented in Fig. 32.2, along with their corresponding relationships. The NeighborDiscovery class is responsible for discovering neighbors and setting up new adjacencies. It also manages the entries of the EIGRP tables (represented by the RoutingTable, the TopologyTable and the NeighborTable classes) every time a topology change is detected, based on the loss (announced by the HoldTimerRTOThread) or discovery of a neighbor. The diagram also shows that the NeighborTable, the RoutingTable and the TopologyTable are composed of Neighbor, Route and Destination objects, respectively. In turn, Destination objects are formed of a set of FeasibleSuccessors and Distances.
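A condensed sketch of how these containment relationships could look in Java is shown below; the class names follow Fig. 32.2, while the fields are assumptions made purely for illustration and do not reproduce Easy-EIGRP's actual source.

import java.util.ArrayList;
import java.util.List;

// Condensed sketch of the containment relationships shown in Fig. 32.2.
class Neighbor          { String ipAddress; }
class Route             { String prefix; String nextHop; long metric; }
class FeasibleSuccessor { Neighbor neighbor; long reportedDistance; }
class Distance          { long value; }

class Destination {
    String prefix;
    long feasibleDistance;
    final List<FeasibleSuccessor> feasibleSuccessors = new ArrayList<>();
    final List<Distance> distances = new ArrayList<>();
}

class NeighborTable { final List<Neighbor> neighbors = new ArrayList<>(); }
class TopologyTable { final List<Destination> destinations = new ArrayList<>(); }
class RoutingTable  { final List<Route> routes = new ArrayList<>(); }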
Fig. 32.2 Easy-EIGRP’s DUAL class diagram
The RTP (Reliable Transport Protocol) class manages EIGRP's packet exchange, guaranteeing the delivery and the ordering of these packets. This class relies on the ListenerThread, since this thread is the entity that monitors the PC's corresponding interfaces, passing the received packets to the RTP layer, where they are processed. Finally, the DUAL class embodies the decision process for all route computations by tracking all routes advertised by all neighbors. Whenever a local or a diffusing computation happens, DUAL notifies the pertinent changes to the EIGRPTablesPanel, which eventually updates the EIGRP tables. If a diffusing computation occurs, DUAL instructs the TopologyTable to change the state of the corresponding Destination and to assemble its replyStatusTable. Additionally, the HoldTimerRTOThread keeps track of the SIA timers in order to prevent the possible deadlock situations discussed in Sect. 32.3. The interaction of the DUAL class can be represented as shown in Fig. 32.3. Its operation was designed following some basic rules defined in Pepelnjak (2000):
1. Whenever a router chooses a new Successor, it informs all its other neighbors about the new RD (Reported Distance).
2. Every time a router selects a Successor, it sends a poison update to its Successor.
Fig. 32.3 DUAL’s module interaction
3. A poison update is sent to all neighbors on the interface through which the Successor is reachable unless split-horizon is turned off, in which case it is sent only to the Successor.
Easy-EIGRP also implements a set of methods capable of determining and handling both local and diffusing computations, which are discussed next. It is important to mention that, since Easy-EIGRP was developed using reverse engineering, all its processes, operations and rules are the same as those defined by EIGRP; Easy-EIGRP intends to emulate the EIGRP protocol. When Easy-EIGRP is not performing diffusing computations, each route is in the passive state, as stated by DUAL's rules. Easy-EIGRP reassesses its list of FSs for a route when a change in the cost or state of a directly connected link is detected, or upon the reception of an update, a query or a reply packet. If Easy-EIGRP's DUAL module is not able to find a possible Successor (different from the current one) for the affected destination, the application begins a diffusing computation and changes the route to the active state. Next, Easy-EIGRP performs every single step stated in Sect. 32.3.2. Easy-EIGRP begins a diffusing computation by sending queries to all its neighbors. Simultaneously, the application creates a data structure which represents EIGRP's reply-status table; this table contains an entry (including the neighbor's IP address, a time stamp for the query sent, a time stamp for the reply received, etc.) for each neighbor queried. Since every step processed in a diffusing computation corresponds to an input event in the DUAL finite state machine (see Fig. 32.1), and since there is no didactic tool that represents or explains this process, we developed a graphical view that allows users to witness the whole computation (see Fig. 32.4).
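The reply-status entries just described could be modeled along the following lines. This is a hedged sketch: the field names are assumptions, and Easy-EIGRP's actual ReplyStatusNode may differ.

// Illustrative sketch of one reply-status table entry: who was queried, when,
// and whether the reply arrived before the SIA (Stuck-In-Active) limit.
public class ReplyStatusEntry {

    private final String neighborIp;
    private final long querySentMillis;
    private Long replyReceivedMillis;          // null while the reply is pending

    public ReplyStatusEntry(String neighborIp) {
        this.neighborIp = neighborIp;
        this.querySentMillis = System.currentTimeMillis();
    }

    public void markReplied() {
        this.replyReceivedMillis = System.currentTimeMillis();
    }

    public boolean isPending() {
        return replyReceivedMillis == null;
    }

    /** True when no reply arrived within the given SIA limit (value assumed by the caller). */
    public boolean isStuckInActive(long siaLimitMillis) {
        return isPending()
                && System.currentTimeMillis() - querySentMillis > siaLimitMillis;
    }

    public String getNeighborIp() {
        return neighborIp;
    }
}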
Fig. 32.4 Easy-EIGRP’s DUAL finite state machine viewer
Easy-EIGRP's DUAL finite state machine module is composed of five sections:
1. The prefix list panel, located in the upper left corner, is responsible for listing and maintaining a record of all the prefixes that were handled by the application at some point. It is noteworthy that even if a prefix is lost, this panel keeps the prefix record for playback.
2. The DUAL finite state machine panel, located in the upper right corner, allows users to view an image of the whole finite state machine used by EIGRP, which can be animated whenever the users require it.
3. In the middle left, users can find a logger which provides information about every change registered by the finite state machine. Every time users select an event in the logger, the image of the finite state machine is updated, showing the current state of the machine at that point. In some cases, the finite state machine panel shows additional information in its upper right corner about the selected event, establishing a correspondence between the event described by the logger and the events defined in Table 32.1. It is important to mention that each prefix owns an independent logger that, no matter what happens, always keeps past events and computations.
4. The reply-status table, located in the middle right, is independent for each diffusing computation process and is unique for each prefix. The table is composed of four fields: the SIA timer, the IP address of the queried neighbor, the reply status, and a field that represents the reply status graphically. The last field can show three types of images: a clock (indicating the router is waiting for a reply), a green check mark (indicating the router has already received the expected reply) or a red X (indicating that the expected reply never arrived and that the SIA timer limit has been reached).
Fig. 32.5 Survey results (student ratings on a 0–5 scale for the DUAL Finite State Machine Module, Logger Information, Navigability, Controls Section, Animation, and Events Reproduction)
Each row of the reply-status table has a direct correspondence with an entry of the replyStatusTable (more specifically, with a ReplyStatusNode) of a Destination object.
5. Finally, there is a media reproduction control panel at the bottom of the module. The main goal of this panel is to provide full flexibility in the reproduction (forward, backward, pause, etc.) of past or current diffusing computations. Easy-EIGRP allows users to modify the delay between the reproduced events in order to allow a detailed analysis. This panel also lets users specify a range of events in the logger for later reproduction.
It is easier to understand abstract topics, such as the diffusing computation processes and general networking issues, with the help of images, colors and animations than by using traditional command line interfaces or shells. That is the reason why Easy-EIGRP implements these types of tools, which provide a more natural way of observing and analyzing the behavior and the decision process of DUAL. Every single section of the DUAL module was designed and developed to improve and facilitate a detailed study of the algorithm, providing an excellent, interactive and friendly interface.
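As an illustration of the playback idea behind the media reproduction panel (the names and structure below are assumptions, not Easy-EIGRP's code), replaying logged finite state machine events with a user-configurable delay could look like this:

import java.util.List;

// Illustrative sketch of replaying logged DUAL events with an adjustable delay,
// in the spirit of the media reproduction controls described above.
public class EventPlayback {

    public interface EventSink {
        void show(String event);   // e.g. update the finite state machine image
    }

    /** Replays the events between fromIndex (inclusive) and toIndex (exclusive). */
    public static void replay(List<String> loggedEvents, int fromIndex, int toIndex,
                              long delayMillis, EventSink sink) throws InterruptedException {
        for (int i = fromIndex; i < toIndex && i < loggedEvents.size(); i++) {
            sink.show(loggedEvents.get(i)); // render the state for this event
            Thread.sleep(delayMillis);      // user-adjustable pacing
        }
    }
}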
To show that the benefits of this graphical view are significant, we conducted a set of experimental laboratory sessions in which we grouped around 50 computer science students with different levels of networking knowledge. We created two types of laboratories. In the first laboratory, a computer with several network interface cards was connected to Cisco 2811 routers through Fast Ethernet connections. On this computer, students installed Easy-EIGRP and configured EIGRP on all the devices to see how routing information was propagated and to understand DUAL. The idea behind the second laboratory was to carry out a realistic EIGRP practice on a single computer. For that, we created a virtual appliance for Easy-EIGRP, and students created several virtual machines (by cloning the virtual appliance), connected them in a network, configured Easy-EIGRP in these virtual machines, and again studied how routing information was propagated and handled by DUAL. After the experiment ended, the students filled out a survey. The results are shown in Fig. 32.5, where we can see the high acceptance level of Easy-EIGRP's DUAL module, particularly for characteristics such as the impact of the events reproduction, the animation features, the utility of the controls section, and navigability.
32.5 Conclusions and Future Work
Easy-EIGRP is a limited implementation of the EIGRP protocol which can be used on both Windows and Linux for teaching and learning purposes. The application includes a powerful module which implements DUAL and offers a set of graphical interfaces for debugging and understanding the processes carried out by this complex algorithm. We have been using Easy-EIGRP to support the teaching and learning of DUAL with a group of networking students. The feedback that we received from these students was very positive. Most of them could easily understand EIGRP's operation; in addition, they stated that the DUAL Finite State Machine panel gave them very valuable support in analyzing how the Diffusing Update Algorithm works. For future work, we plan to extend Easy-EIGRP to support IPv6 (Internet Protocol version 6) (Davies 2008), since IPv6 will become the predominant layer-3 protocol in tomorrow's networks. Since our main goal is to offer Cisco instructors an excellent didactic application for the teaching of EIGRP, we have already contacted Cisco Systems about a possible distribution of Easy-EIGRP and are now awaiting an answer.
Acknowledgment This work was partially supported by the CNTI (Centro Nacional de Tecnologías de Información).
References
The BIRD internet routing daemon. http://bird.network.cz
Coltun R, Ferguson D, Moy J (1999) OSPF for IPv6. RFC 2740
Davies J (2008) Understanding IPv6, 2nd edn. Microsoft Press, USA
Dijkstra EW, Scholten CS (1980) Termination detection for diffusing computations. Inf Process Lett 11(1):1–4
Doyle J, Carroll J (2005) Routing TCP/IP, vol 1, 2nd edn. Cisco Press, Indianapolis, USA
Expósito J, Trujillo V, Gamess E (2010) Easy-EIGRP: a didactic application for teaching and learning of the enhanced interior gateway routing protocol
García-Luna-Aceves J (1993) Loop-free routing using diffusing computations. IEEE/ACM Transactions on Networking 1(1):130–141
GNU Zebra. http://www.zebra.org
Handley M, Hodson O, Kohler E (2002) XORP: an open platform for network research. ACM SIGCOMM Computer Communication Review, pp 53–57
Hedrick C (1988) Routing information protocol. STD 34, RFC 1058
ISO (2002) Intermediate system to intermediate system intra-domain routing exchange protocol for use in conjunction with the protocol for providing the connectionless-mode network service. ISO 8473, International Standard 10589:2002, 2nd edn
Kohler E, Morris R, Chen B, Jannotti J, Kaashoek MF (2000) The Click modular router. Laboratory for Computer Science, MIT
Malkin G (1998) RIP version 2. RFC 2453
Moy J (1989) The OSPF specification. RFC 1131
Oran D (1990) OSI IS-IS intra-domain routing protocol. RFC 1142
Pepelnjak I (2000) EIGRP network design solutions: the definitive resource for EIGRP design, deployment, and operation. Cisco Press, Indianapolis, USA
Quagga Routing Suite. http://www.quagga.net
Trujillo V, Expósito J, Gamess E (2010) An alternative way of teaching the advanced concepts of the diffusing update algorithm for EIGRP. Lecture notes in engineering and computer science: proceedings of the world congress on engineering and computer science 2010, WCECS 2010, San Francisco, 20–22 Oct 2010, pp 107–112
Vyatta. http://www.vyatta.org