Junzo Watada, Gloria Phillips-Wren, Lakhmi C. Jain, and Robert J. Howlett (Eds.) Intelligent Decision Technologies
Smart Innovation, Systems and Technologies 10 Editors-in-Chief Prof. Robert J. Howlett KES International PO Box 2115 Shoreham-by-sea BN43 9AF UK E-mail:
[email protected]
Prof. Lakhmi C. Jain School of Electrical and Information Engineering University of South Australia Adelaide, Mawson Lakes Campus South Australia SA 5095 Australia E-mail:
[email protected]
Further volumes of this series can be found on our homepage: springer.com

Vol. 1. Toyoaki Nishida, Lakhmi C. Jain, and Colette Faucher (Eds.): Modeling Machine Emotions for Realizing Intelligence, 2010. ISBN 978-3-642-12603-1
Vol. 2. George A. Tsihrintzis, Maria Virvou, and Lakhmi C. Jain (Eds.): Multimedia Services in Intelligent Environments – Software Development Challenges and Solutions, 2010. ISBN 978-3-642-13354-1
Vol. 3. George A. Tsihrintzis and Lakhmi C. Jain (Eds.): Multimedia Services in Intelligent Environments – Integrated Systems, 2010. ISBN 978-3-642-13395-4
Vol. 4. Gloria Phillips-Wren, Lakhmi C. Jain, Kazumi Nakamatsu, and Robert J. Howlett (Eds.): Advances in Intelligent Decision Technologies – Proceedings of the Second KES International Symposium IDT 2010, 2010. ISBN 978-3-642-14615-2
Vol. 5. Robert J. Howlett (Ed.): Innovation through Knowledge Transfer, 2010. ISBN 978-3-642-14593-3
Vol. 6. George A. Tsihrintzis, Ernesto Damiani, Maria Virvou, Robert J. Howlett, and Lakhmi C. Jain (Eds.): Intelligent Interactive Multimedia Systems and Services, 2010. ISBN 978-3-642-14618-3
Vol. 7. Robert J. Howlett, Lakhmi C. Jain, and Shaun H. Lee (Eds.): Sustainability in Energy and Buildings, 2010. ISBN 978-3-642-17386-8
Vol. 8. Ioannis Hatzilygeroudis and Jim Prentzas (Eds.): Combinations of Intelligent Methods and Applications, 2010. ISBN 978-3-642-19617-1
Vol. 9. Robert J. Howlett (Ed.): Innovation through Knowledge Transfer 2010, 2011. ISBN 978-3-642-20507-1
Vol. 10. Junzo Watada, Gloria Phillips-Wren, Lakhmi C. Jain, and Robert J. Howlett (Eds.): Intelligent Decision Technologies, 2011. ISBN 978-3-642-22193-4
Junzo Watada, Gloria Phillips-Wren, Lakhmi C. Jain, and Robert J. Howlett (Eds.)
Intelligent Decision Technologies Proceedings of the 3rd International Conference on Intelligent Decision Technologies (IDT´ 2011)
Prof. Junzo Watada
Waseda University, Graduate School of Information, Production and Systems (IPS), 2-7 Hibikino, Wakamatsu-ku, Kitakyushu, Fukuoka 808-0135, Japan
E-mail: [email protected]

Prof. Lakhmi C. Jain
University of South Australia, School of Electrical and Information Engineering, Mawson Lakes Campus, Adelaide, South Australia SA 5095, Australia
E-mail: [email protected]

Prof. Gloria Phillips-Wren
Loyola University Maryland, Sellinger School of Business and Management, 4501 N Charles Street, Baltimore, MD 21210, USA
E-mail: [email protected]

Prof. Robert J. Howlett
KES International, PO Box 2115, Shoreham-by-Sea, West Sussex BN43 9AF, United Kingdom
E-mail: [email protected]
ISBN 978-3-642-22193-4
e-ISBN 978-3-642-22194-1
DOI 10.1007/978-3-642-22194-1 Smart Innovation, Systems and Technologies
ISSN 2190-3018
Library of Congress Control Number: 2011930859

© 2011 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Scientific Publishing Services Pvt. Ltd., Chennai, India.
Printed on acid-free paper.

springer.com
Preface
Intelligent Decision Technologies (IDT) seeks an interchange of research on intelligent systems and intelligent technologies that enhance or improve decision making in industry, government and academia. The focus is interdisciplinary in nature, and includes research on all aspects of intelligent decision technologies, from fundamental development to applied systems.

The field of intelligent systems is expanding rapidly due, in part, to advances in Artificial Intelligence and in environments that can deliver the technology when and where it is needed. One of the most successful areas of advance has been intelligent decision making and related applications. Intelligent decision systems are based upon research in intelligent agents, fuzzy logic, multi-agent systems, artificial neural networks, and genetic algorithms, among others. Applications of intelligence-assisted decision making can be found in management, international business, finance, accounting, marketing, healthcare, medical and diagnostic systems, military decisions, production and operations, networks, traffic management, crisis response, human-machine interfaces, financial and stock market monitoring and prediction, and robotics. Some areas, such as virtual decision environments, social networking, 3D human-machine interfaces, cognitive interfaces, collaborative systems, intelligent web mining, e-commerce, e-learning, e-business, bioinformatics, evolvable systems, virtual humans, and designer drugs, are just beginning to emerge.

In this volume we publish the research of scholars from the Third KES International Symposium on Intelligent Decision Technologies (KES IDT'11), hosted and organized by the University of Piraeus, Greece, in conjunction with KES International. The book contains chapters based on papers selected from a large number of submissions to the symposium from the international community. Each paper was double-blind peer-reviewed by at least two independent referees.
The best papers were accepted based on recommendations of the reviewers and after required revisions had been undertaken by the authors. The final publication represents the current leading thought in intelligent decision technologies. We wish to express our sincere gratitude to the plenary speakers, invited session chairs, delegates from all over the world, the authors of various chapters and reviewers for their outstanding contributions. We express our sincere thanks to the University of Piraeus for their sponsorship and support of the symposium. We thank the International Programme Committee for their support and assistance. We would like to thank Peter Cushion of KES International for his help with
organizational issues. We thank the editorial team of Springer-Verlag for their support in the production of this volume. We sincerely thank the Local Organizing Committee, Professors Maria Virvou and George Tsihrintzis, and the students at the University of Piraeus for their invaluable assistance. We hope and believe that this volume will contribute ideas for novel research and advancement in intelligent decision technologies for researchers, practitioners, professors and research students who are interested in knowledge-based and intelligent engineering systems.

Piraeus, Greece
20–22 July 2011
Junzo Watada Gloria Phillips-Wren Lakhmi C. Jain Robert J. Howlett
Contents
Part I: Modeling and Method of Decision Making

1 A Combinational Disruption Recovery Model for Vehicle Routing Problem with Time Windows . . . 3
Xuping Wang, Junhu Ruan, Hongyan Shang, Chao Ma

2 A Decision Method for Disruption Management Problems in Intermodal Freight Transport . . . 13
Minfang Huang, Xiangpei Hu, Lihua Zhang

3 A Dominance-Based Rough Set Approach of Mathematical Programming for Inducing National Competitiveness . . . 23
Yu-Chien Ko, Gwo-Hshiung Tzeng

4 A GPU-Based Parallel Algorithm for Large Scale Linear Programming Problem . . . 37
Jianming Li, Renping Lv, Xiangpei Hu, Zhongqiang Jiang

5 A Hybrid MCDM Model on Technology Assessment to Business Strategy . . . 47
Mei-Chen Lo, Min-Hsien Yang, Chien-Tzu Tsai, Aleksey V. Pugovkin, Gwo-Hshiung Tzeng

6 A Quantitative Model for Budget Allocation for Investment in Safety Measures . . . 57
Yuji Sato

7 Adapted Queueing Algorithms for Process Chains . . . 65
Ágnes Bogárdi-Mészöly, András Rövid, Péter Földesi

8 An Improved EMD Online Learning-Based Model for Gold Market Forecasting . . . 75
Shifei Zhou, Kin Keung Lai

9 Applying Kansei Engineering to Decision Making in Fragrance Form Design . . . 85
Chun-Chun Wei, Min-Yuan Ma, Yang-Cheng Lin

10 Biomass Estimation for an Anaerobic Bioprocess Using Interval Observer . . . 95
Elena M. Bunciu

11 Building Multi-Attribute Decision Model Based on Kansei Information in Environment with Hybrid Uncertainty . . . 103
Junzo Watada, Nureize Arbaiy

12 Building on the Synergy of Machine and Human Reasoning to Tackle Data-Intensive Collaboration and Decision Making . . . 113
Nikos Karacapilidis, Stefan Rüping, Manolis Tzagarakis, Axel Poigné, Spyros Christodoulou

13 Derivations of Information Technology Strategies for Enabling the Cloud Based Banking Service by a Hybrid MADM Framework . . . 123
Chi-Yo Huang, Wei-Chang Tzeng, Gwo-Hshiung Tzeng, Ming-Cheng Yuan

14 Difficulty Estimator for Converting Natural Language into First Order Logic . . . 135
Isidoros Perikos, Foteini Grivokostopoulou, Ioannis Hatzilygeroudis, Konstantinos Kovas

15 Emergency Distribution Scheduling with Maximizing Marginal Loss-Saving Function . . . 145
Yiping Jiang, Lindu Zhao

16 Fuzzy Control of a Wastewater Treatment Process . . . 155
Alina Chiroşcă, George Dumitraşcu, Marian Barbu, Sergiu Caraman

17 Interpretation of Loss Aversion in Kano's Quality Model . . . 165
Péter Földesi, János Botzheim

18 MCDM Applications on Effective Project Management for New Wafer Fab Construction . . . 175
Mei-Chen Lo, Gwo-Hshiung Tzeng

19 Machine Failure Diagnosis Model Applied with a Fuzzy Inference Approach . . . 185
Lily Lin, Huey-Ming Lee

20 Neural Network Model Predictive Control of a Wastewater Treatment Bioprocess . . . 191
Dorin Şendrescu, Emil Petre, Dan Popescu, Monica Roman

21 Neural Networks Based Adaptive Control of a Fermentation Bioprocess for Lactic Acid Production . . . 201
Emil Petre, Dan Selişteanu, Dorin Şendrescu

22 New Evaluation Method for Imperfect Alternative Matrix . . . 213
Toshimasa Ozaki, Kanna Miwa, Akihiro Itoh, Mei-Chen Lo, Eizo Kinoshita, Gwo-Hshiung Tzeng

23 Piecewise Surface Regression Modeling in Intelligent Decision Guidance System . . . 223
Juan Luo, Alexander Brodsky

24 Premises of an Agent-Based Model Integrating Emotional Response to Risk in Decision-Making . . . 237
Ioana Florina Popovici

25 Proposal of Super Pairwise Comparison Matrix . . . 247
Takao Ohya, Eizo Kinoshita

26 Reduction of Dimension of the Upper Level Problem in a Bilevel Programming Model Part 1 . . . 255
Vyacheslav V. Kalashnikov, Stephan Dempe, Gerardo A. Pérez-Valdés, Nataliya I. Kalashnykova

27 Reduction of Dimension of the Upper Level Problem in a Bilevel Programming Model Part 2 . . . 265
Vyacheslav V. Kalashnikov, Stephan Dempe, Gerardo A. Pérez-Valdés, Nataliya I. Kalashnykova

28 Representation of Loss Aversion and Impatience Concerning Time Utility in Supply Chains . . . 273
Péter Földesi, János Botzheim, Edit Süle

29 Robotics Application within Bioengineering: Neuroprosthesis Test Bench and Model Based Neural Control for a Robotic Leg . . . 283
Dorin Popescu, Dan Selişteanu, Marian S. Poboroniuc, Danut C. Irimia

30 The Improvement Strategy of Online Shopping Service Based on SIA-NRM Approach . . . 295
Chia-Li Lin

31 The Optimization Decisions of the Decentralized Supply Chain under the Additive Demand . . . 307
Peng Ma, Haiyan Wang

32 The Relationship between Dominant AHP/CCM and ANP . . . 319
Eizo Kinoshita, Shin Sugiura

33 The Role of Kansei/Affective Engineering and Its Expected in Aging Society . . . 329
Hisao Shiizuka, Ayako Hashizume

Part II: Decision Making in Finance and Management

34 A Comprehensive Macroeconomic Model for Global Investment . . . 343
Ming-Yuan Hsieh, You-Shyang Chen, Chien-Jung Lai, Ya-Ling Wu

35 A DEMATEL Based Network Process for Deriving Factors Influencing the Acceptance of Tablet Personal Computers . . . 355
Chi-Yo Huang, Yi-Fan Lin, Gwo-Hshiung Tzeng

36 A Map Information Sharing System among Refugees in Disaster Areas, on the Basis of Ad-Hoc Networks . . . 367
Koichi Asakura, Takuya Chiba, Toyohide Watanabe

37 A Study on a Multi-period Inventory Model with Quantity Discounts Based on the Previous Order . . . 377
Sungmook Lim

38 A Study on the ECOAccountancy through Analytical Network Process Measurement . . . 389
Chaang-Yung Kung, Chien-Jung Lai, Wen-Ming Wu, You-Shyang Chen, Yu-Kuang Cheng

39 Attribute Coding for the Rough Set Theory Based Rule Simplications by Using the Particle Swarm Optimization Algorithm . . . 399
Jieh-Ren Chang, Yow-Hao Jheng, Chi-Hsiang Lo, Betty Chang

40 Building Agents by Assembly Software Components under Organizational Constraints of Multi-Agent System . . . 409
Siam Abderrahim, Maamri Ramdane

41 Determining an Efficient Parts Layout for Assembly Cell Production by Using GA and Virtual Factory System . . . 419
Hidehiko Yamamoto, Takayoshi Yamada

42 Development of a Multi-issue Negotiation System for E-Commerce . . . 429
Bala M. Balachandran, R. Gobbin, Dharmendra Sharma

43 Effect of Background Music Tempo and Playing Method on Shopping Website Browsing . . . 439
Chien-Jung Lai, Ya-Ling Wu, Ming-Yuan Hsieh, Chang-Yung Kung, Yu-Hua Lin

44 Forecasting Quarterly Profit Growth Rate Using an Integrated Classifier . . . 449
You-Shyang Chen, Ming-Yuan Hsieh, Ya-Ling Wu, Wen-Ming Wu

45 Fuzzy Preference Based Organizational Performance Measurement . . . 459
Roberta O. Parreiras, Petr Ya Ekel

46 Generating Reference Business Process Model Using Heuristic Approach Based on Activity Proximity . . . 469
Bernardo N. Yahya, Hyerim Bae

47 How to Curtail the Cost in the Supply Chain? . . . 479
Wen-Ming Wu, Chaang-Yung Kung, You-Shyang Chen, Chien-Jung Lai

48 Intelligent Decision for Dynamic Fuzzy Control Security System in Wireless Networks . . . 489
Xu Huang, Pritam Gajkumar Shah, Dharmendra Sharma

49 Investigating the Continuance Commitment of Volitional Systems from the Perspective of Psychological Attachment . . . 501
Huan-Ming Chuang, Chyuan-Yuh Lin, Chien-Ku Lin

50 Market Structure as a Network with Positively and Negatively Weighted Links . . . 511
Takeo Yoshikawa, Takashi Iino, Hiroshi Iyetomi

51 Method of Benchmarking Route Choice Based on the Input Similarity Using DEA . . . 519
Jaehun Park, Hyerim Bae, Sungmook Lim

52 Modelling Egocentric Communication and Learning for Human-Intelligent Agents Interaction . . . 529
R. Gobbin, Masoud Mohammadian, Bala M. Balachandran

53 Multiscale Community Analysis of a Production Network of Firms in Japan . . . 537
Takashi Iino, Hiroshi Iyetomi

54 Notation-Support Method in Music Composition Based on Interval-Pitch Conversion . . . 547
Masanori Kanamaru, Koichi Hanaue, Toyohide Watanabe

55 Numerical Study of Random Correlation Matrices: Finite-Size Effects . . . 557
Yuta Arai, Kouichi Okunishi, Hiroshi Iyetomi

56 Predicting of the Short Term Wind Speed by Using a Real Valued Genetic Algorithm Based Least Squared Support Vector Machine . . . 567
Chi-Yo Huang, Bo-Yu Chiang, Shih-Yu Chang, Gwo-Hshiung Tzeng, Chun-Chieh Tseng

57 Selecting English Multiple-Choice Cloze Questions Based on Difficulty-Based Features . . . 577
Tomoko Kojiri, Yuki Watanabe, Toyohide Watanabe

58 Testing Randomness by Means of RMT Formula . . . 589
Xin Yang, Ryota Itoi, Mieko Tanaka-Yamawaki

59 The Effect of Web-Based Instruction on English Writing for College Students . . . 597
Ya-Ling Wu, Wen-Ming Wu, Chaang-Yung Kung, Ming-Yuan Hsieh

60 The Moderating Role of Elaboration Likelihood on Information System Continuance . . . 605
Huan-Ming Chuang, Chien-Ku Lin, Chyuan-Yuh Lin

61 The Type of Preferences in Ranking Lists . . . 617
Piech Henryk, Grzegorz Gawinowski

62 Transaction Management for Inter-organizational Business Process . . . 629
Joonsoo Bae, Nita Solehati, Young Ki Kang

63 Trend-Extraction of Stock Prices in the American Market by Means of RMT-PCA . . . 637
Mieko Tanaka-Yamawaki, Takemasa Kido, Ryota Itoi

64 Using the Rough Set Theory to Investigate the Building Facilities for the Performing Arts from the Performer's Perspectives . . . 647
Betty Chang, Hung-Mei Pei, Jieh-Ren Chang

Part III: Data Analysis and Data Navigation

65 A Novel Collaborative Filtering Model for Personalized Recommendation . . . 661
Wang Qian

66 A RSSI-Based Localization Algorithm in Smart Space . . . 671
Liu Jian-hui, Han Chang-jun

67 An Improved Bee Algorithm-Genetic Algorithm . . . 683
Huang Ming, Ji Baohui, Liang Xu

68 Application of Bayesian Network in Failure Diagnosis of Hydro-electrical Simulation System . . . 691
Zhou Yan, Li Peng

69 Application of Evidence Fusion Theory in Water Turbine Model . . . 699
Li Hai-cheng, Qi Zhi

70 Calculating Query Likelihoods Based on Web Data Analysis . . . 707
Koya Tamura, Kenji Hatano, Hiroshi Yadohisa

71 Calculating Similarities between Tree Data Based on Structural Analysis . . . 719
Kento Ikeda, Takashi Kobayashi, Kenji Hatano, Daiji Fukagawa

72 Continuous Auditing for Health Care Decision Support Systems . . . 731
Robert D. Kent, Atif Hasan Zahid, Anne W. Snowdon

73 Design and Implementation of a Primary Health Care Services Navigational System Architecture . . . 743
Robert D. Kent, Paul D. Preney, Anne W. Snowdon, Farhan Sajjad, Gokul Bhandari, Jason McCarrell, Tom McDonald, Ziad Kobti

74 Emotion Enabled Model for Hospital Medication Administration . . . 753
Dreama Jain, Ziad Kobti, Anne W. Snowdon

75 Health Information Technology in Canada's Health Care System: Innovation and Adoption . . . 763
Anne W. Snowdon, Jeremy Shell, Kellie Leitch, O. Ont, Jennifer J. Park

76 Hierarchical Clustering for Interval-Valued Functional Data . . . 769
Nobuo Shimizu

77 Multidimensional Scaling with Hyperbox Model for Percentile Dissimilarities . . . 779
Yoshikazu Terada, Hiroshi Yadohisa

78 Predictive Data Mining Driven Architecture to Guide Car Seat Model Parameter Initialization . . . 789
Sabbir Ahmed, Ziad Kobti, Robert D. Kent

79 Symbolic Hierarchical Clustering for Visual Analogue Scale Data . . . 799
Kotoe Katayama, Rui Yamaguchi, Seiya Imoto, Hideaki Tokunaga, Yoshihiro Imazu, Keiko Matsuura, Kenji Watanabe, Satoru Miyano

Part IV: Miscellanea

80 Acquisition of User's Learning Styles Using Log Mining Analysis through Web Usage Mining Process . . . 809
Sucheta V. Kolekar, Sriram G. Sanjeevi, D.S. Bormane

81 An Agent Based Middleware for Privacy Aware Recommender Systems in IPTV Networks . . . 821
Ahmed M. Elmisery, Dmitri Botvich

82 An Intelligent Decision Support Model for Product Design . . . 833
Yang-Cheng Lin, Chun-Chun Wei

83 Compromise in Scheduling Objects Procedures Basing on Ranking Lists . . . 843
Piech Henryk, Grzegorz Gawinowski

84 Decision on the Best Retrofit Scenario to Maximize Energy Efficiency in a Building . . . 853
Ana Campos, Rui Neves-Silva

85 Developing Intelligent Agents with Distributed Computing Middleware . . . 863
Christos Sioutis, Derek Dominish

86 Diagnosis Support on Cardio-Vascular Signal Monitoring by Using Cluster Computing . . . 873
Ahmed M. Elmisery, Martín Serrano, Dmitri Botvich

87 Multiple-Instance Learning via Decision-Based Neural Networks . . . 885
Yeong-Yuh Xu, Chi-Huang Shih

88 Software Testing – Factor Contribution Analysis in a Decision Support Framework . . . 897
Deane Larkman, Ric Jentzsch, Masoud Mohammadian

89 Sustainability of the Built Environment – Development of an Intelligent Decision System to Support Management of Energy-Related Obsolescence . . . 907
T.E. Butt, K.G. Jones

Author Index . . . 921

Index . . . 925
Part I Modeling and Method of Decision Making
A Combinational Disruption Recovery Model for Vehicle Routing Problem with Time Windows Xuping Wang, Junhu Ruan, Hongyan Shang, and Chao Ma
Abstract. A method of transforming various delivery disruptions into a new customer-adding disruption is developed. The optimal starting time of delivery vehicles is analyzed and determined, which provides a new rescue strategy (the starting-later policy) for the disrupted VRPTW. The paper then jointly considers customer service time, driving paths and total delivery costs to put forward a combinational disruption recovery model for VRPTW. Finally, in computational experiments, the Nested Partition Method is applied to verify the effectiveness of the proposed model, as well as of the strategy and the algorithm.

Keywords: VRPTW, combinational disruption, disruption management, rescue strategies, nested partition method.
Xuping Wang · Junhu Ruan · Chao Ma
Institute of Systems Engineering, Dalian University of Technology, Dalian, 116024, China
e-mail: [email protected]

Hongyan Shang
Commercial College, Assumption University of Thailand, Bangkok, 10240, Thailand

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 3–12. springerlink.com © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

Various unexpected disruption events are encountered in the delivery process, such as vehicle breakdowns, cargo damage, and changes of customers' service times, delivery addresses and demands. These disruption events, which often make actual delivery operations deviate from the intended plans, may bring negative effects on the delivery system. It is therefore necessary to develop satisfactory recovery plans quickly to minimize the negative effects of disruption events.

The vehicle routing problem (VRP), initially proposed by Dantzig and Ramser (1959), is an abstraction of the vehicle scheduling problem in real-world delivery systems. A variety of solutions for the VRP have been put forward (Burak et al 2009), but only a few researchers have taken delivery disruptions into account. Li et al (2009a, 2009b) proposed the vehicle rescheduling problem (VRSP) to address the vehicle breakdown disruption. The thought of disruption management, which aims at minimizing the deviation of actual operations from the intended plan at minimum cost, provides an effective way to deal with unpredictable events (Jeans et al 2001). Several researchers have introduced this thought into the logistics delivery field. Wang et al (2007) developed a disruption recovery model for the vehicle breakdown problem of VRPTW and proposed two rescue strategies: an adding-vehicles policy and a neighboring-rescue policy. Wang et al (2009a) built a multi-objective VRPTW disruption recovery model, studied a VRP disruption recovery model with fuzzy time windows (2009b), and carried out a further study on the vehicle breakdown disruption (2010). Mu et al (2010) developed Tabu Search algorithms to solve the disrupted VRP with vehicle breakdown. Ding et al (2010) considered human behaviors to construct a disruption management model for the delivery delay problem.

The existing literature has produced effective solutions for the disrupted VRP under particular disruption events, but each of the existing models and algorithms can handle only one type of uncertainty. This makes it difficult to solve actual vehicle routing problems, in which various disruption events often occur successively or even simultaneously. The purpose of this paper is to develop a common disruption recovery model for VRPTW that can handle a variety, and a combination, of disruption events. Meanwhile, existing models for the disrupted VRP did not consider the optimal starting time of delivery vehicles. Vehicles may arrive at delivery addresses early, but they cannot begin service before the customers' earliest service time. If the starting time is optimized for each vehicle, waiting costs may be reduced and a new rescue strategy becomes available for some disruption events.

The paper is organized as follows. A transformation method for delivery disruption events is developed in Section 2. Section 3 determines vehicles' optimal starting time from the depot. Section 4 builds a combinational disruption recovery model for VRPTW. In Section 5, computational experiments are given to verify the effectiveness of the proposed model. Conclusions are drawn in Section 6.
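The optimal-starting-time idea can be made concrete with a small sketch. The helper below (hypothetical function names and data, not taken from the paper; the paper derives its actual policy in Section 3) back-propagates time-window deadlines along a planned route to obtain the latest feasible depot departure time, which removes avoidable waiting at customers:

```python
def latest_departure(route, tw, service, travel):
    """Latest depot departure time such that the vehicle can still start
    service at every customer on `route` no later than its end_i.
    tw[i] = (sta_i, end_i); travel[(i, j)] = travel time; depot is node 0."""
    # Backward pass: latest allowed service start at each customer.
    latest = tw[route[-1]][1]                     # end_i of the last customer
    for i, j in zip(reversed(route[:-1]), reversed(route[1:])):
        # To start service at j by `latest`, service at i must start by:
        latest = min(tw[i][1], latest - service[i] - travel[(i, j)])
    return latest - travel[(0, route[0])]

# Hypothetical two-customer route served by one vehicle.
tw      = {1: (25, 30), 2: (40, 60)}   # [sta_i, end_i]
service = {1: 5, 2: 5}
travel  = {(0, 1): 12, (1, 2): 15}
print(latest_departure([1, 2], tw, service, travel))  # 18
```

Departing at time 0 the vehicle would arrive at customer 1 at time 12 and wait 13 time units for sta_1 = 25; departing at 18 it arrives exactly at the deadline end_1 = 30, so the waiting cost disappears while the route stays feasible.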
2 A Transformation Method of Delivery Disruption Events

2.1 Preview of VRPTW

The original VRPTW studied in the paper is as follows. One depot has K delivery vehicles with the same limited capacity. A set of customers N must all be visited exactly once, within their requested time intervals. Each vehicle must leave from and return to the central depot. The target is to determine the delivery plan with the shortest total delivery distance. The notations used in the following are defined in Table 1.

Table 1 Notations for the original VRPTW

N: a set of customers, N = {1, 2, …, n}
N0: the set of customers and the depot, N0 = {0} ∪ N
K: the total number of vehicles
Q: the limited capacity of each vehicle
cij: the distance between node i and node j, i, j ∈ N0
tij: the travel time between node i and node j, i, j ∈ N0
di: the demand of node i, d0 = 0
seri: the service time at node i
qijk: the available capacity of vehicle k between node i and node j
xijk: a binary variable; xijk = 1 means vehicle k travels from node i to node j, otherwise xijk = 0
uik: a binary variable; uik = 1 means customer i is served by vehicle k, otherwise uik = 0
Rstai: the starting service time for customer i
[stai, endi]: the time window of customer i; stai is the earliest and endi the latest service time
M: a large positive number
The mathematical model of the above VRPTW is:

$$\min \sum_{k=1}^{K}\sum_{i=0}^{n}\sum_{\substack{j=0\\ j\neq i}}^{n} c_{ij}\, x_{ijk} \tag{1}$$

subject to

$$x_{ijk}\in\{0,1\}, \quad i,j\in N_0,\ k\in\{1,\dots,K\} \tag{2}$$

$$u_{ik}\in\{0,1\}, \quad i\in N,\ k\in\{1,\dots,K\} \tag{3}$$

$$\sum_{k=1}^{K} u_{ik}=1, \quad i\in N \tag{4}$$

$$\sum_{k=1}^{K} u_{0k}=\sum_{k=1}^{K} u_{k0}\le K \tag{5}$$

$$\sum_{\substack{l=0\\ l\neq i}}^{n} x_{lik}=\sum_{\substack{j=0\\ j\neq i}}^{n} x_{ijk}=u_{ik}, \quad i\in N_0,\ k\in\{1,\dots,K\} \tag{6}$$

$$\sum_{i=1}^{n} d_i\, u_{ik}-\sum_{j=1}^{n} q_{0jk}=0, \quad k\in\{1,\dots,K\} \tag{7}$$

$$q_{ijk}\le Q\, x_{ijk}, \quad i,j\in N_0,\ k\in\{1,\dots,K\} \tag{8}$$

$$Rsta_j\ge Rsta_i+ser_i+t_{ij}-(1-x_{ijk})\, M, \quad i,j\in N,\ k\in\{1,\dots,K\} \tag{9}$$

$$sta_i\le Rsta_i\le end_i, \quad i\in N \tag{10}$$
where the objective function (1) minimizes the total delivery distance; constraints (2) and (3) define the binary decision variables; constraint (4) ensures each customer is served exactly once by one vehicle; (5) ensures each vehicle leaves from and returns to the depot; (6) ensures the vehicle that arrives at customer i also leaves from customer i; (7) and (8) ensure no vehicle loads more than its limited capacity; (9) keeps the service starting times consistent along each route; and (10) enforces the customers' time windows.
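As a concrete illustration of the capacity and time-window constraints (7)–(10), the following sketch (hypothetical helper and variable names, not from the paper) checks whether a single vehicle's route is feasible under them:

```python
def route_feasible(route, demand, tw, service, travel, capacity):
    """Check one vehicle's route (depot 0 -> customers -> depot 0) against
    the capacity constraints (7)-(8) and time-window constraints (9)-(10)."""
    # Capacity: the total demand on the route must not exceed Q.
    if sum(demand[i] for i in route) > capacity:
        return False
    t = 0.0          # departure time from the depot
    prev = 0         # depot index
    for i in route:
        t += travel[(prev, i)]          # arrive at customer i
        t = max(t, tw[i][0])            # wait until earliest service time sta_i
        if t > tw[i][1]:                # violates latest service time end_i
            return False
        t += service[i]                 # serve customer i for ser_i time units
        prev = i
    return True

# Tiny hypothetical instance: two customers served by one vehicle.
demand  = {1: 4, 2: 3}
tw      = {1: (10, 30), 2: (40, 60)}    # [sta_i, end_i]
service = {1: 5, 2: 5}
travel  = {(0, 1): 12, (1, 2): 15, (2, 0): 20}
print(route_feasible([1, 2], demand, tw, service, travel, capacity=10))  # True
```

A recovery algorithm can reuse such a check when it re-inserts the new customers produced by the transformation of Section 2 into the remaining routes.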
2.2 Delivery Disruption Events Transformation

In order to develop a combinational disruption recovery model for VRPTW, the paper transforms different disruption events (involving changes of customers' requests) into a new-customer-adding disruption event.

(1) Change of time windows. Assume the service of customer i is requested earlier, so its original time window [stai, endi] becomes [stai−Δt, endi−Δt], where Δt is a positive number. If Δt is so small that the vehicle k planned to serve customer i can squeeze out extra time longer than Δt by speeding up, the request is ignored. If Δt is large and vehicle k cannot squeeze out enough time, the request is regarded as a disruption and customer i is transformed into a new customer i' with time window [stai−Δt, endi−Δt]. Now assume the service of customer i is requested later, so its time window [stai, endi] becomes [stai+Δt, endi+Δt]. If Δt is relatively small and vehicle k can
wait for the extra time at customer i without affecting the remaining delivery tasks, the request is ignored and no disruption is brought to the original plan. If Δt is large and vehicle k cannot wait for customer i, the request is regarded as a disruption and customer i is transformed into a new customer i' with time window [stai+Δt, endi+Δt].

(2) Change of delivery addresses. If a delivery address changes, the original plan cannot deal with the change, which is regarded as a disruption. For example, if the delivery coordinate (Xi, Yi) of customer i is changed into (Xi', Yi'), customer i is transformed into a new customer i' with delivery coordinate (Xi', Yi').

(3) Change of demands. Changes of customers' demands include demand reduction and demand increase. A demand reduction brings no disruption to the original delivery plan: vehicles can deliver according to their planned routing, so demand reduction is not considered a disruption in the paper. Whether the demand increase of customer i is regarded as a disruption depends on the occurrence time t and the increased amount Δd. If the vehicle k that will serve customer i has left the depot at t and no extra cargo of at least Δd is loaded, the increase is regarded as a disruption; if vehicle k has left the depot at t and carries extra cargo of more than Δd, the increase is not regarded as a disruption. If vehicle k has not left the depot at t and can load more cargo than Δd, there is no disruption to the original plan; if vehicle k has not left the depot at t but cannot load more cargo than Δd, the demand increase is also regarded as a disruption. After the demand increase is identified as a disruption, a new customer i' whose demand is Δd is added.

(4) Removal of requests. Customers may sometimes cancel their requests for various reasons, but the planned delivery routing needs no changes. When passing the removed customers, delivery vehicles simply go on with no service.

(5) Combinational disruption. A combinational disruption means that some of the above disruption events occur simultaneously for one customer or for several customers. For one customer i with coordinate (Xi, Yi) and demand di, if its delivery address is changed into (Xi', Yi') and extra demand Δd is requested, a new customer i' can be added with coordinate (Xi', Yi') and demand Δd. For several customers, suppose the time window of customer i is requested earlier, from [stai, endi] to [stai−Δt, endi−Δt]; the delivery address of customer j is changed, (Xj, Yj)→(Xj', Yj'); and extra demand Δd is requested by customer m. The transformation of these disruptions is shown in Table 2.
Table 2 Transformation of combinational disruption from multiple customers

Original customer   Disruption event                       New customer
i                   [stai, endi] → [stai−Δt, endi−Δt]      i': [stai−Δt, endi−Δt]
j                   (Xj, Yj) → (Xj', Yj')                  j': (Xj', Yj')
m                   dm → dm + Δd                           m': Δd

Note: after being transformed into new customers, original customers are no longer considered in the new delivery plan, except for customers with a demand-increase disruption.
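The transformation rules above can be sketched as a small dispatcher. This is a hedged encoding of ours, not the paper's: the event tags ('shift_window', 'move', 'add_demand', 'cancel') and the customer record layout are assumptions.

```python
# Hedged sketch of the Sect. 2.2 transformation rules; each disruption event
# on a customer record yields the new customer i', or None for a cancellation.
def to_new_customer(customer, event):
    """customer: dict with 'coord', 'window', 'demand'; event: (kind, value)."""
    kind, value = event
    if kind == 'shift_window':                # [sta, end] -> [sta + dt, end + dt]
        sta, end = customer['window']
        return dict(customer, window=(sta + value, end + value))
    if kind == 'move':                        # address change -> new coordinate
        return dict(customer, coord=value)
    if kind == 'add_demand':                  # new customer carries only the extra demand dd
        return dict(customer, demand=value)
    return None                               # cancelled request: vehicle passes with no service
```

A negative shift models an earlier time window, a positive one a later window; a cancellation produces no new customer, matching rule (4).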
3 Determination of the Optimal Starting Time for Vehicles

Most existing research on disrupted VRP assumes that all assigned vehicles leave the depot at time 0. Delivery vehicles may then arrive early at customers, but they have to wait until the earliest service time, which results in waiting costs. In fact, the optimal starting time of each vehicle can be determined according to its delivery tasks, which may decrease total delivery costs and provide a new rescue strategy for some disruption events. Some new notations are introduced: Nk, the set of customers served by vehicle k, k ∈ {1, …, K}; wi, the waiting time at node i, i ∈ Nk; BSTk, the optimal starting time of vehicle k. [stai, endi] is the time window of customer i, and ti−1,i is the travel time between nodes i−1 and i. Rstai, the starting service time for customer i, equals the larger of the actual arrival time arri and the earliest service time stai, that is,

Rstai = max{arri, stai},  (i ≥ 1, i ∈ Nk)    (11)

where arri depends on the starting service time for node i−1, the service time at node i−1 and the travel time ti−1,i between nodes i−1 and i. Thus the actual arrival time arri at node i is

arri = Rstai−1 + seri−1 + ti−1,i,  (i ≥ 1, i ∈ Nk)    (12)

wi, the waiting time at node i, equals Rstai − arri. The waiting time that can be saved by vehicle k is min{Rstai − arri}, which equals 0 when arri is larger than stai for all customers in Nk. When the actual finishing time Rstai + seri is later than the latest service time endi, the extra time at node i is 0. When Rstai + seri ≤ endi, that is, the latest service time endi is not yet due when vehicle k finishes the service for customer i, an extra time interval [Rstai + seri, endi] exists. The total extra time of vehicle k in the delivery process, TFTLk, equals

TFTLk = min{σ[endi − (Rstai + seri)]},  σ = 0 if Rstai + seri > endi, σ = 1 if Rstai + seri ≤ endi,  (i ≥ 1, i ∈ Nk)    (13)

To sum up, the optimal starting time of vehicle k can be calculated by the following conditions and equations:

(1) If the earliest service time of the first customer served by vehicle k is earlier than or equal to the travel time from the depot to that customer, i.e., sta1 ≤ t0,1, the optimal starting time of vehicle k is 0: BSTk = 0.

(2) If sta1 > t0,1 and min{Rstai − arri} = 0, then

BSTk = (sta1 − t0,1) + TFTLk,  k ∈ {1, …, K}    (14)

(3) If sta1 > t0,1 and min{Rstai − arri} > 0, then

BSTk = (sta1 − t0,1) + min{min{Rstai − arri}, TFTLk},  i ∈ Nk, k ∈ {1, …, K}    (15)
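The three cases can be written out directly in code. The schedule simulation and the per-route data layout (parallel lists for one vehicle's customers, in visiting order) are our own assumptions; the formulas follow eqs. (11)-(15).

```python
# Sketch of Sect. 3: compute BST_k for one vehicle's planned route.
def optimal_start_time(travel, windows, service):
    """travel[i]: travel time from node i-1 to node i (travel[0] = t_{0,1});
    windows[i] = (sta_i, end_i); service[i] = ser_i, for customers 1..m."""
    n = len(windows)
    arr, rsta = [0.0] * n, [0.0] * n
    t = 0.0                                   # baseline plan: leave depot at time 0
    for i in range(n):
        t += travel[i]                        # arr_i per eq. (12)
        arr[i] = t
        rsta[i] = max(arr[i], windows[i][0])  # Rsta_i per eq. (11)
        t = rsta[i] + service[i]
    min_wait = min(r - a for r, a in zip(rsta, arr))
    slack = [windows[i][1] - (rsta[i] + service[i]) for i in range(n)]
    tftl = min(s if s >= 0 else 0.0 for s in slack)          # TFTL_k, eq. (13)
    sta1, t01 = windows[0][0], travel[0]
    if sta1 <= t01:                                          # case (1)
        return 0.0
    if min_wait == 0:                                        # case (2), eq. (14)
        return (sta1 - t01) + tftl
    return (sta1 - t01) + min(min_wait, tftl)                # case (3), eq. (15)
```

For example, a single customer with window [10, 30] reached after 5 time units of travel lets the vehicle start 10 units late rather than waiting at the customer.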
4 Combinational Disruption Recovery Model for VRPTW

Disruption management aims at minimizing the negative effects of unexpected events on the original plan, so these effects should be measured quantitatively before being taken as the minimization objective; this is called disruption measurement. The effects of disruption events on VRPTW mainly involve three aspects: customer service time, driving paths and delivery costs (Wang et al. 2009a). Since Section 2 transformed the different disruption events into a new-customer-adding disruption event, the disruption measurement on the disrupted VRP, which focuses on measuring the new-customer-adding disruption, is relatively simple. In the disruption recovery plan, the number of customers, delivery addresses, time windows and other parameters may change, so some notations of the original VRPTW are relabeled correspondingly: N0→N0', xijk→xijk', stai→stai', Rstai→Rstai', endi→endi', and so on. Some notations remain unchanged, such as the number of vehicles K and the limited vehicle capacity Q.

(1) Disruption measurement on customer service time. A disruption on a customer's service time means that the actual arrival time is earlier than the earliest service time or later than the latest service time. The service time deviation of customer i is:

λ1(stai' − arri') + λ2(arri' − endi'),  λ1, λ2 ∈ {0,1}    (16)

where λ1 = 1 and λ2 = 0 if arri' < stai'; λ2 = 1 and λ1 = 0 if arri' > endi'; and λ1 = λ2 = 0 if stai' ≤ arri' ≤ endi'. The total service time disruption is:

θ ∑_{i=1}^{N'} (λ1(stai' − arri') + λ2(arri' − endi')),  λ1, λ2 ∈ {0,1}    (17)

where θ is the penalty cost coefficient per unit time deviation.

(2) Disruption measurement on driving paths. The total driving path disruption is:

σ ∑_{i=0}^{N0'} ∑_{j=0}^{N0'} cij(xij' − xij) + μ ∑_{i=0}^{N0'} ∑_{j=0}^{N0'} (xij' − xij),  i, j ∈ N0', xij, xij' ∈ {0,1}    (18)
where σ is the delivery cost coefficient per unit distance; μ is the penalty cost coefficient for adding or removing a delivery path; xij, xij' ∈ {0,1}: if there is a delivery path between node i and node j, xij = 1 (respectively xij' = 1), otherwise xij = 0 (xij' = 0).

(3) Disruption measurement on delivery costs. Delivery costs depend on the total travel distance and the number of assigned vehicles, so the total delivery cost disruption is:

σ(∑_{k=1}^{K} ∑_{i=0}^{N0'} ∑_{j=0, j≠i}^{N0'} cij xijk' − ∑_{k=1}^{K} ∑_{i=0}^{N0} ∑_{j=0, j≠i}^{N0} cij xijk) + ∑_{k=1}^{K} Ck(vk' − vk)    (19)

where ∑_{k=1}^{K} ∑_{i=0}^{N0'} ∑_{j=0, j≠i}^{N0'} cij xijk' represents the total delivery distance of the recovery plan; ∑_{k=1}^{K} ∑_{i=0}^{N0} ∑_{j=0, j≠i}^{N0} cij xijk represents the total delivery distance of the original plan; Ck is the fixed cost of a vehicle and ∑_{k=1}^{K} Ck(vk' − vk) represents the change of vehicle fixed costs, where vk, vk' ∈ {0,1}: if vehicle k is assigned in the original plan or in the recovery plan, vk = 1 or vk' = 1 respectively, otherwise vk = 0 or vk' = 0.

To sum up, a combinational disruption recovery model is developed:
min θ ∑_{i=1}^{N'} (λ1(stai' − arri') + λ2(arri' − endi'))    (20)

min σ ∑_{i=0}^{N0'} ∑_{j=0}^{N0'} cij(xij' − xij) + μ ∑_{i=0}^{N0'} ∑_{j=0}^{N0'} (xij' − xij)    (21)

min σ(∑_{k=1}^{K} ∑_{i=0}^{N0'} ∑_{j=0, j≠i}^{N0'} cij xijk' − ∑_{k=1}^{K} ∑_{i=0}^{N0} ∑_{j=0, j≠i}^{N0} cij xijk) + ∑_{k=1}^{K} Ck(vk' − vk)    (22)

s.t.

xijk' ∈ {0,1},  i, j ∈ N0', k ∈ {1, …, K}    (23)

ujk' ∈ {0,1},  j ∈ N', k ∈ {1, …, K}    (24)

∑_{k=1}^{K} uik' = 1 if di' = di;  ∑_{k=1}^{K} uik' ≤ 2 if di' > di,  i ∈ N'    (25)

∑_{k=1}^{K} u0k' = ∑_{k=1}^{K} uk0' ≤ K    (26)

∑_{l=0, l≠i}^{N0'} xlik' = ∑_{j=0, j≠i}^{N0'} xijk' = uik',  i ∈ N0', k ∈ {1, …, K}    (27)

∑_{i=1}^{N0'} di' × uik' − ∑_{j=1}^{N0'} q0jk' = 0,  k ∈ {1, …, K}    (28)

qijk' ≤ Q × xijk',  i, j ∈ N0', k ∈ {1, …, K}    (29)

Rstaj' ≥ Kstak + Rstai' + seri' + tij' − (1 − xijk') × M,  i, j ∈ N', k ∈ {1, …, K}    (30)

Kstak = BSTk,  k ∈ {1, …, K}    (31)

λ1, λ2 ∈ {0,1},  xij, xij' ∈ {0,1},  vk', vk ∈ {0,1}    (32)
Objectives (20), (21) and (22) minimize the disruption on customers' service time, on driving paths and on delivery costs, respectively. Constraints (30) and (31) ensure that vehicles leave the central depot at their optimal starting times, where Kstak is the actual starting time of vehicle k and BSTk is the optimal starting time determined in Section 3.
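As an illustration, the three disruption measures can be computed as follows. This is our own sketch, not the paper's implementation: plans are encoded as sets of directed arcs, the differences in (21) are read as absolute changes (which the symmetric difference of arc sets captures), and a uniform per-vehicle fixed cost stands in for the vehicle-specific Ck.

```python
def service_time_disruption(theta, arr_new, windows_new):
    # objective (20): penalize early or late arrival per eq. (16)
    dev = 0.0
    for i, a in arr_new.items():
        sta, end = windows_new[i]
        if a < sta:
            dev += sta - a        # lambda_1 = 1 case
        elif a > end:
            dev += a - end        # lambda_2 = 1 case
    return theta * dev

def path_disruption(sigma, mu, c, arcs_old, arcs_new):
    # objective (21): arcs added or removed between the two plans
    changed = arcs_old ^ arcs_new
    return sigma * sum(c[a] for a in changed) + mu * len(changed)

def cost_disruption(sigma, c, arcs_old, arcs_new, fixed_cost, used_old, used_new):
    # objective (22): distance change plus change in vehicle fixed costs
    dist = lambda arcs: sum(c[a] for a in arcs)
    return sigma * (dist(arcs_new) - dist(arcs_old)) + fixed_cost * (len(used_new) - len(used_old))
```

In a lexicographic or weighted-sum treatment of (20)-(22), these three values would be combined into a single recovery-plan score.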
5 Computational Experiments

The paper applies the Nested Partitions Method (NPM) to solve the proposed model. NPM, proposed by Shi (2000), is a global optimization heuristic algorithm. The NPM algorithm designed for the combinational recovery model integrates three rescue strategies: (1) Adding-vehicles policy: new vehicles that have no delivery tasks in the original plan are added to meet the requests of new customers. (2) Starting-later policy: since vehicles do not leave the depot until their optimal starting times, some vehicles may still be at the depot when disruption events occur. (3) Neighboring-rescue policy: in-transit vehicles that are near the new customers are used to deal with the disruptions.
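The three rescue strategies determine which vehicles are candidates for serving a new customer. A minimal classification sketch follows; the data layout and the nearest-first ordering of in-transit vehicles are our assumptions, since the paper embeds these policies inside its NPM search rather than spelling them out.

```python
import math

# Hedged sketch: partition the fleet by the applicable rescue strategy.
def classify_rescue_options(vehicles, new_pos):
    """vehicles: dicts with 'id', 'pos', 'departed' (bool), 'has_tasks' (bool)."""
    spare = [v['id'] for v in vehicles if not v['has_tasks']]              # (1) adding vehicles
    at_depot = [v['id'] for v in vehicles
                if v['has_tasks'] and not v['departed']]                   # (2) starting later
    in_transit = sorted((v for v in vehicles if v['has_tasks'] and v['departed']),
                        key=lambda v: math.dist(v['pos'], new_pos))        # (3) neighboring rescue
    return spare, at_depot, [v['id'] for v in in_transit]
```

The NPM search can then sample recovery plans from these candidate sets instead of the whole fleet.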
5.1 Original VRPTW and Combinational Disruption Data

The original VRPTW studied in the paper is from [7]: one depot owns 8 vehicles with a limited capacity of 5 t; the distance between two nodes is calculated from their coordinates; the speed of each vehicle is 1 km/h; the coefficients θ, σ and μ are set to 1, 1 and 10 respectively; the detailed original data can be found in [7]. Using an improved genetic algorithm, [7] attained the optimal initial routing: vehicle 1: 0-8-2-11-1-4-0; vehicle 2: 0-10-5-13-0; vehicle 3: 0-9-7-6-0; vehicle 4: 0-3-14-12-15-0. The total driving distance is 585.186. After the initial scheduling, four spare vehicles remain at the depot. At time 32.65, change requests are received from customers 4, 8, 11 and 14, and a new customer 16 appears. The detailed change data are shown in Table 3.

Table 3 Data of changes

Customer  Original coordinates  Original time window  Original demand  New coordinates  New time window  New demand
4         (53,19)               [96,166]              0.6              (53,29)          [10,54]          1.6
8         (56,4)                [9,79]                0.2              Unchanged        Unchanged        1.4
11        (41,10)               [74,174]              0.9              Unchanged        Unchanged        1.9
14        (73,29)               [56,156]              1.8              Unchanged        [20,70]          Unchanged
16        -                     -                     -                (55,60)          [30,75]          2.0
5.2 Results and Findings

According to Section 2, the disrupted nodes above can be transformed into new customer nodes: customers 4, 14, 8 and 11 become 16, 17, 18 and 19, and a new customer 20 is added. There are three in-transit vehicles, which are transformed into virtual customers 21, 22 and 23 (one vehicle had not left when the disruption occurred). The data after transformation are shown in Table 4. The NPM algorithm with the neighboring-rescue policy produced the new routing: 0-17-18-8-2-0, 0-10-5-13-0, 0-9-7-6-0, 0-3-1-19-11-16-12-0, 0-20-15-0. The NPM algorithm with the starting-later policy produced the new routing: 0-8-2-11-1-0, 0-10-5-13-0, 0-9-7-6-0, 0-16-3-18-12-15-0, 0-20-17-19-0.
Table 4 Data after transformation

Customer  X (km)  Y (km)  di (t)  stai (h)  endi (h)
0         50      50      0       0         +∞
1         19      0       1.0     74        144
2         33      3       1.8     58        128
3         35      21      1.1     15        85
5         70      94      1.9     47        177
6         27      44      1.4     85        155
7         10      69      1.2     21        91
8         56      4       0.2     9         79
9         16      81      1.7     37        107
10        68      76      0.8     21        121
11        41      10      0.9     74        174
12        83      43      0.8     58        158
13        25      91      1.9     15        125
15        70      18      0.9     87        187
16        53      29      1.6     10        54
17        73      29      1.8     20        70
18        56      4       1.2     9         79
19        41      10      1.0     74        174
20        55      60      2.0     30        75
21        54      18      0       0         400
22        57      61      0       0         400
23        43      57      0       0         400
Table 5 Comparison of results

Method                                                       Total distance  Paths deviation  Total time deviation  Number of vehicles
Rescheduling                                                 840.76          16               210.75                7
Disruption Management by GA                                  841.69          9                37.03                 6
Disruption Management by NPM with neighboring-rescue policy  679.79          19               66.76                 5
Disruption Management by NPM with starting-later policy      737.11          7                33.30                 5
The comparison of results with [7] is shown in Table 5. From the comparison, the paper finds: (1) Disruption Management by GA is superior to Rescheduling in paths deviation, total time deviation and number of vehicles; Disruption Management by NPM with the neighboring-rescue policy produces better results in total distance, total time deviation and number of vehicles than Rescheduling; and Disruption Management by NPM with the starting-later policy outdoes Rescheduling in all aspects. This verifies the advantage of disruption management in dealing with disruption events for VRPTW. (2) In total distance and number of vehicles, Disruption Management by NPM with the neighboring-rescue policy is superior to both Rescheduling and Disruption Management by GA; Disruption Management by NPM with the starting-later policy also produces better results than Rescheduling and Disruption Management by GA. This gives some evidence that the transformation of disruption events and the designed NPM algorithm are effective for the combinational disruption recovery model. (3) Disruption Management by NPM with the neighboring-rescue policy is better than the starting-later policy in total distance but worse in paths deviation and total time deviation, which shows that considering the optimal starting time of vehicles can provide a new rescue strategy for disrupted VRP.
6 Conclusions

Disruption management provides a good way to minimize the negative effects of disruption events on the whole delivery system. Since in practice a variety of delivery disruptions often occur successively or simultaneously, the paper proposed a method of transforming various disruption events into a new-customer-adding disruption, which facilitates developing a combinational VRPTW disruption recovery model. The paper also considered the vehicles' optimal starting times from the central depot, which not only reduces the waiting costs of vehicles in transit but also provides a new rescue strategy for the disrupted VRP. The paper focused on customer disruption events but did not give enough consideration to vehicle disruption events and cargo disruption events, which need further efforts.

Acknowledgments. This work is supported by the National Natural Science Foundation of China (No. 90924006, 70890080, 70890083) and the National Natural Science Funds for Distinguished Young Scholars (No. 70725004).
References

1. Burak, E., Arif, V.V., Arnold, R.: The vehicle routing problem: A taxonomic review. Computers & Industrial Engineering 57(4), 1472–1483 (2009)
2. Dantzig, G.B., Ramser, J.H.: The truck dispatching problem. Management Science 6(1), 80–91 (1959)
3. Ding, Q.-l., Hu, X.-p., Wang, Y.-z.: A model of disruption management for solving delivery delay. In: Advances in Intelligent Decision Technologies: Proceedings of the Second KES International Symposium IDT, Baltimore, USA, July 28–30. SIST, vol. 4, pp. 227–237 (2010)
4. Jeans, C., Jesper, L., Allan, L., Jesper, H.: Disruption management: operations research between planning and execution. OR/MS Today 28(5), 40–51 (2001)
5. Li, J.-q., Pitu, B.M., Denis, B.: A Lagrangian heuristic for the real-time vehicle rescheduling problem. Transportation Research Part E: Logistics and Transportation Review 45(3), 419–433 (2009)
6. Li, J.-q., Pitu, B.M., Denis, B.: Real-time vehicle rerouting problems with time windows. European Journal of Operational Research 194(3), 711–727 (2009)
7. Mu, Q., Fu, Z., Lysgaard, J., Eglese, R.: Disruption management of the vehicle routing problem with vehicle breakdown. Journal of the Operational Research Society, 1–8 (2010)
8. Shi, L.-y.: Nested Partitions Method for Global Optimization. Operations Research 48(3), 390–407 (2000)
9. Wang, X.-p., Niu, J., Hu, X.-p., Xu, C.-l.: Rescue strategies of the VRPTW disruption. Systems Engineering-Theory & Practice 27(12), 104–111 (2007)
10. Wang, X.-p., Xu, C.-l., Yang, D.-l.: Disruption management for vehicle routing problem with the request changes of customers. International Journal of Innovative Computing, Information and Control 5(8), 2427–2438 (2009a)
11. Wang, X.-p., Zhang, K., Yang, D.-l.: Disruption management of urgency vehicle routing problem with fuzzy time window. ICIC Express Letters 3(4), 883–890 (2009b)
12. Wang, X.-p., Wu, X., Hu, X.-p.: A study of urgency vehicle routing disruption management problem. Journal of Networks 5(12), 1426–1433 (2010)
A Decision Method for Disruption Management Problems in Intermodal Freight Transport Minfang Huang, Xiangpei Hu, and Lihua Zhang
Abstract. In this paper, we propose a new decision method for dealing with disruption events in intermodal freight transport. First of all, the forecasting decision for the duration of disruption events is presented, which decides whether a rearrangement is needed. Secondly, a network-based optimization model for intermodal freight transport disruption management is built. Then an improved depth-first search strategy is developed, which is beneficial to automatically generating the routes and achieving the recovery strategies quickly. Finally, a numerical example is applied to verify the decision method. The new decision method supports the real-time decision making for disruption management problems. Keywords: Decision method, Disruption management, Intermodal freight transport.
Minfang Huang, School of Economics and Management, North China Electric Power University, No. 2 Beinong Rd., Huilongguan, Beijing 102206, China; e-mail: [email protected]

Xiangpei Hu · Lihua Zhang, Dalian University of Technology, No. 2 Linggong Rd., Ganjingzi Dist., Dalian, Liaoning 116023, China; e-mail: [email protected], [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 13–21. springerlink.com © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

Many power facilities are delivered by multiple modes of transport. Uncertainties and randomness always exist in freight transportation systems, especially in intermodal freight transportation. Intermodal freight transportation is the term used to describe the movement of goods in one and the same loading unit or vehicle which uses successive, various modes of transport (road, rail, air and water) without any handling of the goods themselves during transfers between modes (European Conference of Ministers of Transport, 1993) [14]. It is a multimodal chain of transportation services. This chain usually links the initial shipper to the final consignee of the container and takes place over long distances. The whole
transportation is often provided and finished by several carriers. Almost all types of freight carriers and terminal operators may thus be involved in intermodal transportation, either by providing service for part of the transportation chain or by operating an intermodal transportation system (network) [5]. Therefore, ensuring flow continuity and transit-node compatibility along the multimodal chain of transportation services is significant when making modal choice decisions, since multiple transport modes, multiple decision makers and multiple types of load units are involved. Unexpected events (e.g. hurricanes, snow disasters, traffic accidents) happening in one link of the multimodal chain can disturb the pre-decided transportation activities. A new strategy used to handle such disruptions is disruption management. Its objective is for the adjusted scheme to cause the smallest disturbance to the entire transportation system, rather than to achieve the lowest cost. How to deal with disruption events in real time and obtain the coping strategies quickly and automatically is an important problem. It is necessary to present a new solution approach to improve the rationality and efficiency of disruption management in intermodal freight transportation. The remainder of this paper is organized as follows: Section 2 briefly reviews the related solution approaches and applications. In Section 3, the forecasting decision method for the duration of disruption events is presented, and an optimization algorithm for intermodal freight transport disruption management is constructed. A numerical example is given in Section 4. Finally, concluding remarks and future research directions are summarized in Section 5.
2 A Brief Review of Related Literature

Research on planning issues in intermodal freight transport began in the 1990s. Macharis and Bontekoning [15] conducted a comprehensive review of OR problems and applications of drayage operators, terminal operators, network operators and intermodal operators. Related decision problems for intermodal freight concern combinations of rail, road, air and water transport. Following this approach, Caris et al. [3] provide an update on the review of Macharis and Bontekoning, with a stronger orientation towards the planning decisions in intermodal freight transport and solution methods. Among the current results on intermodal transportation systems, we find most are related to planning transportation activities. We divide them into 4 categories from the perspectives of intermodal carrier selection, transportation mode selection, transportation routes, and terminal location. For intermodal carrier selection, Liu et al. [12] establish an improved intermodal network and formulate a multiobjective model considering 5 important characteristics: multiple objectives, in-time transportation, combined cost, transportation risks and collaboration efficiency. Ma [16] proposes a method for optimizing carrier selection in a network environment through tendering based on multi-agent techniques. With respect to transportation mode selection, Liu and Yu [11] use the
graph-theory technique to select the best combination of transportation modes for a shipment, considering 4 characteristics: multiple objectives, in-time transportation, combined cost and transportation risks. Shinghal and Fowkes [19] present empirical results on the determinants of mode choice for freight services in India, which show that frequency of service is an important attribute determining mode choice. For transportation route selection, Huang and Wang [7] analyze the evaluation indicators of intermodal transportation routes (transportation cost, transportation time, transportation risk, service quality, facility level), establish the set of alternatives, and combine quantitative and qualitative analysis to compare and select the route alternatives. Chang [4] formulates an international intermodal routing problem as a multi-objective multimodal multi-commodity flow problem with time windows and concave costs. Yang et al. [23] present a goal programming model for intermodal network optimization to examine the competitiveness of 36 alternative routings for freight moving from China to and beyond the Indian Ocean. For terminal location, in the area of hub location problems, Campbell et al. [2] review research on new formulations and better solution methods to solve larger problems. Arnold et al. [1] investigate the problem of optimally locating rail/road terminals for freight transport; a linear 0-1 program is formulated and solved by a heuristic approach. Sirikijpanichkul et al. [20] develop an integral model for the evaluation of road-rail intermodal freight hub location decisions, which comprises four dominant agents: hub owners or operators, transport network infrastructure providers, hub users, and communities. Meng and Wang [17] develop a mathematical program with equilibrium constraints for the intermodal hub-and-spoke network design problem with multiple stakeholders and multi-type containers.

The existing results on intermodal transportation systems are related to planning transportation activities. They put emphasis on planning in advance, but lack research on the disruptions that may occur in each transport mode and cannot quickly achieve an operational scheme with the overall smallest disturbance. It is all the more important to ensure flow continuity and transit-node compatibility. A number of results exist on disruption management in urban distribution systems, where only one mode of transport is used. Most of them focus on disruptions caused by customer requests or by dynamic travel times. Potvin et al. [18] describe a dynamic vehicle routing and scheduling problem with time windows where both real-time customer requests and dynamic travel times are considered. In terms of disruptions caused by dynamic travel times, Huisman et al. [8] present a solution approach consisting of solving a sequence of optimization problems. Taniguchi et al. [21] use dynamic traffic simulation to solve a vehicle routing and scheduling model that incorporates real-time information and variable travel times. Du et al. [6] design a solution process composed of initial-route formation, inter-route improvement, and intra-route improvement. Besides the above, there are some other results. Zeimpekis et al. [24] present the architecture of a fleet management system. Li et al. [10]
develop a prototype decision support system (DSS) [11]. Wang et al. [22] propose a transformation method for the recovery of vehicle routing. The results above investigate distribution systems with only one transport mode; for example, in urban areas, only road transport is used to accomplish the delivery. Disruption management in urban distribution systems therefore puts emphasis on the satisfaction of the customers, whereas in intermodal freight transport it focuses on the choice of transportation modes and carriers. It is thus necessary to provide a solution approach with both qualitative and quantitative processing ability for disruption management problems in intermodal freight transport systems.
3 A Decision Method for Intermodal Freight Disruption Problems and a Solution Algorithm

3.1 The Forecasting Decision for the Duration of Disruption Events

The disruption management problems in an intermodal freight transport system can be described as follows: after the cargos leave the origins according to the plan, unexpected events (e.g. hurricanes, snow disasters, traffic accidents) happen in one link of the multimodal chain, which might interrupt one transport mode. The pre-decided transportation activities might then need to be rearranged. If necessary, a rescue scheme with the smallest deviation, measured by the cost and time of the routes, should be obtained in an efficient way. The duration of a disruption event, which also represents the delay of the current transport activity, is used to decide whether a new arrangement should be made. If the delay is within the next carrier's tolerance, then no rearrangement is needed; otherwise, rerouting with the smallest deviation should be made. It is assumed that the effect of the delay on the carrier after the next has already been accounted for in the next carrier's tolerance. The durations of disruption events are obtained by collecting historic statistical data on typical disruption events in the specific transport mode. We take accidents on freeways as an example to illustrate the study of disruption event duration. According to their causes, accidents on freeways can be divided into 7 types [13]: vehicle crash, bumping into objects, vehicle breakdown, injury-causing accident, vehicle burning accident, death-causing accident, and dangerous-goods accident. The distribution of each kind of disruption event's duration can be obtained by statistical analysis of historic data. The forecasting decision tree for the duration of a disruption event (vehicle crash) is shown in Figure 1. We use 3 parameters to describe an event's duration: the average duration (Ai), and the lower and upper quartiles (Qmin_i, Qmax_i) as the lower and upper bounds of a confidence interval. They are calculated from historic statistical data.
Fig. 1 The forecasting decision tree for the duration of a disruption event (vehicle crash). The root node gives the overall average duration A1 with quartile interval (Qmin1, Qmax1); branching on whether the crash causes injury or death yields A2 (Qmin2, Qmax2) for yes and A3 (Qmin3, Qmax3) for no; the no-branch further splits on whether a truck is involved, giving A4 (Qmin4, Qmax4) for yes and A5 (Qmin5, Qmax5) for no.
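The tree lookup and the rearrangement test can be sketched as follows. The duration values below are hypothetical placeholders (the real Ai, Qmin_i and Qmax_i come from historic accident statistics), and the table layout is our own assumption.

```python
# Hypothetical duration estimates in minutes: (average A_i, Qmin_i, Qmax_i).
DURATIONS = {
    (None, None):   (60, 40, 90),    # A1: any vehicle crash
    (True, None):   (95, 70, 130),   # A2: injury or death involved
    (False, True):  (75, 55, 100),   # A4: no injury/death, truck involved
    (False, False): (45, 30, 60),    # A5: no injury/death, no truck
}

def forecast_duration(injury=None, truck=None):
    # once the injury/death branch is 'yes' (or unknown), the truck question is not asked
    key = (injury, truck if injury is False else None)
    return DURATIONS[key]

def needs_rearrangement(next_carrier_tolerance, injury=None, truck=None):
    # Sect. 3.1 rule: rearrange only if the forecast delay exceeds the tolerance
    avg, _, _ = forecast_duration(injury, truck)
    return avg > next_carrier_tolerance
```

A dispatcher would call `needs_rearrangement` as soon as the accident attributes are known, and trigger the rerouting of Sect. 3.2 only when it returns True.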
3.2 An Optimization Algorithm for Intermodal Freight Transport Disruption Management According to the forecasting duration or delay in Sect. 3.1, if the delay is out of the range of the next carrier’s tolerance, then rerouting with smallest deviation should be made. The deviation of sth route, denoted by Ds , can be calculated as follows.
Ds = α1
cs − c0 t −t + α2 s 0 c0 t0
(1)
The variables and parameters in Eq. (1) are explained below. The coefficients α1 and α 2 denote the decision maker’s preference. They are equal or greater than 0, and α1 + α 2 = 1 . cs denotes the total cost of sth route; t s the total transport time of sth route; c0 the total cost of the initial plan; t0 the total transport time of the initial plan. Therefore, Eq. (1) is equivalent to Eq. (2).
D′_s = α1 c_s/c_0 + α2 t_s/t_0    (2)
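Because α1 + α2 = 1, expanding Eq. (1) gives D_s = D′_s − 1, so the two measures rank routes identically. A quick numerical sketch of this equivalence; the route cost/time values 180 and 70 are made up, while 163 and 64.5 are the initial plan's cost and time used later in the numerical example:

```python
def deviation(c_s, t_s, c0, t0, a1=0.5, a2=0.5):
    """D_s of Eq. (1): weighted relative deviation from the initial plan."""
    return a1 * (c_s - c0) / c0 + a2 * (t_s - t0) / t0

def deviation_simplified(c_s, t_s, c0, t0, a1=0.5, a2=0.5):
    """D'_s of Eq. (2): drops the constant term, so it ranks routes the same."""
    return a1 * c_s / c0 + a2 * t_s / t0

# For any route, D'_s - D_s = a1 + a2 = 1, so minimizing either is equivalent.
d = deviation(180.0, 70.0, 163.0, 64.5)
d_prime = deviation_simplified(180.0, 70.0, 163.0, 64.5)
```

Note that the initial plan itself scores D′_s = α1 + α2 = 1, so recovery schemes with D′_s close to 1 deviate little from the plan.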
Fig. 2 shows a network of intermodal freight transport. The nodes represent the cities (A, B, …, H) which the cargo needs to pass through. In each city, several carriers provide different modes of transport. For example, from the origin node there are k modes of transport (A1, A2, …, Ak) for the cargo to choose from in order to arrive at City A.
M. Huang, X. Hu, and L. Zhang
Fig. 2 The virtual network of disruption event
The decision for the routing problem of intermodal freight transport can be turned into path searching through the graph shown in Fig. 2. Considering the characteristics of search strategies, we adopt an improved depth-first search strategy to generate the routing schemes.
3.3 An Improved Depth-First Search Strategy

From Fig. 2, we observe that the network of intermodal freight transport corresponds to a state space for decision-making, in which each transshipment location is a search node. An initial route is a path through the state space from the initial node (origin) to the goal node (destination), and an operational disruption recovery scheme with its deviation is a path from the disrupted location to the goal node. Considering the characteristics of search strategies and of the problem, we apply the principle of depth-first search and improve it to generate the routing schemes. The improved depth-first search algorithm involves three factors: state sets, operators, and the goal state. The details of the state-space search based on the improved depth-first search algorithm for disruption management problems in an intermodal freight system are given below.
• State sets: described by three elements, namely P_i (the cargo's current location, where the disruption happens), i = A, B, C, …; c_sij (accumulative cost of the sth route when the cargo arrives at the ith city by the jth transport mode); and t_sij (accumulative transport time of the sth route when the cargo arrives at the ith city by the jth transport mode).
• Operators: the cargo moves from the current location (a search node) to the next location (a search node).
• Goal state: as the nodes are searched, the goal state (destination) is eventually reached. At this state, c_sij is the total cost of the sth route (c_s), and t_sij is the total transport time of the sth route (t_s).
After the search process finishes, the feasible recovery schemes have been generated, and D′_s can be calculated by Eq. (2).
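The search described above can be sketched as a recursive depth-first enumeration over the layered network, accumulating cost and time along each path; the tiny network and all its numbers below are illustrative, not the paper's data:

```python
def dfs_routes(graph, start, goal, cost=0.0, time=0.0, path=None):
    """Depth-first search over the intermodal state space.

    A state is a city or a (city, mode) pair; graph[state] maps each
    successor state to (arc cost, arc time), with transshipment cost and
    time assumed to be folded into the arcs. Yields (path, total cost,
    total time) for every feasible route reaching the goal.
    """
    path = (path or []) + [start]
    if start == goal:
        yield path, cost, time
        return
    for nxt, (c, t) in graph.get(start, {}).items():
        yield from dfs_routes(graph, nxt, goal, cost + c, time + t, path)

# Illustrative network: origin O, a transshipment city B served by two
# carriers (rail, road), and destination D. All numbers are made up.
net = {
    "O": {("B", "rail"): (9.0, 20.0), ("B", "road"): (6.0, 15.0)},
    ("B", "rail"): {"D": (7.0, 16.0)},
    ("B", "road"): {"D": (6.2, 18.5)},
}
# Rank routes by an unnormalized variant of Eq. (2) with alpha1 = alpha2 = 0.5;
# in the actual method the cost and time are normalized by c_0 and t_0.
routes = sorted(dfs_routes(net, "O", "D"), key=lambda r: 0.5 * r[1] + 0.5 * r[2])
```

For recovery, the same search is simply restarted from the disrupted location after deleting the unavailable arcs.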
4 A Numerical Example

A numerical example is constructed to illustrate and verify the above decision method. Suppose there are 10 tons of cargo to be transported from City A to City D, passing through City B and City C. The example intermodal transportation network is described in Fig. 3, which shows the available transport modes between each pair of cities; its corresponding operational intermodal transportation network is presented in Fig. 4. Transport cost per unit (1000 RMB/ton) and required transport time (h) are listed in Table 1. Transshipment cost (1000 RMB/ton) and time (h) between different transport modes are listed in Table 2.
Fig. 3 An example intermodal transportation network
Fig. 4 The operational intermodal transportation network (each arc is labelled with its link and transport mode, e.g. (A-B3, road))

Table 1 Transport cost per unit / required transport time

        A-B     B-C     C-D
Rail    9/20    --      6/18
Road    6/15    10/18   7/16
Sea     --      3/28    4/30

Table 2 Transshipment cost/time between different transport modes

        Rail      Road      Sea
Rail    0/0       0.1/0.5   0.2/0.5
Road    0.1/0.5   0/0       0.1/1
Sea     0.2/0.5   0.1/1     0/0
We assume the decision maker has the same preference for cost and transport time, that is, α1 = 0.5, α2 = 0.5. Using the improved depth-first search strategy of Sect. 3.3, the initial feasible routes can be generated, and the optimal one is A-B3-C1-D2, with a total cost of 163 and a total transport time of 64.5. Suppose a disruption event happens on link A-B3 and causes a delay of 8 hours, so that the sea mode on link B3-C1 in Fig. 4 is no longer available for the cargo. The recovery schemes then have to be generated by the improved depth-first search strategy, and the scheme with the smallest deviation is chosen to deliver the cargo. Here the optimal recovery scheme is B3-C2-D2, with a deviation of 0.98.
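The final selection step can be sketched as: rank the feasible recovery schemes by Eq. (2) and take the minimum. A minimal sketch of that step; the candidate schemes' costs and times below are hypothetical, while 163 and 64.5 are the initial plan's values from the example:

```python
def best_recovery(schemes, c0, t0, a1=0.5, a2=0.5):
    """Rank candidate recovery schemes by D'_s (Eq. (2)) and return the best.

    `schemes` maps a scheme name to (total cost, total time); c0 and t0 are
    the cost and time of the initial plan.
    """
    deviations = {s: a1 * c / c0 + a2 * t / t0 for s, (c, t) in schemes.items()}
    best = min(deviations, key=deviations.get)
    return best, deviations[best]

# Hypothetical candidates produced by re-searching from the disrupted node;
# the real scheme costs would come from Tables 1 and 2.
candidates = {"B3-C2-D2": (165.0, 63.0), "B3-C3-D1": (172.0, 68.0)}
name, dev = best_recovery(candidates, 163.0, 64.5)
```

With these made-up numbers the chosen scheme is B3-C2-D2 with a deviation just under 1, i.e. close to the initial plan.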
5 Concluding Remarks

We present a new decision method for disruption problems in intermodal freight transport, which comprises a forecasting decision for the duration of disruption events, a network-based optimization model, and an improved depth-first search strategy. The duration forecast helps decide whether a rearrangement is needed, and the improved depth-first search strategy automatically generates the initial routes and the recovery routes. The method can cope with the rapid decision-making required in disruption management problems, and it suggests a solution approach for other disruption management problems. Some work remains to be done; for example, a large-scale case study network should be used to further verify the decision method.

Acknowledgments. This work is partially supported by the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20100036120010), the Fundamental Research Funds for the Central Universities in China (No. 09QR56), and grants from the National Natural Science Funds for Distinguished Young Scholars (No. 70725004). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.
References

[1] Arnold, P., Peeters, D., Thomas, I.: Modelling a rail/road intermodal transportation system. Transportation Research Part E 40(3), 255–270 (2004)
[2] Campbell, J., Ernst, A., Krishnamoorthy, M.: Hub location problems. In: Drezner, Z., Hamacher, H. (eds.) Facility Location: Applications and Theory. Springer, Heidelberg (2002)
[3] Caris, A., Macharis, C., Janssens, G.K.: Planning problems in intermodal freight transport: accomplishments and prospects. Transportation Planning and Technology 31(3), 277–302 (2008)
[4] Chang, T.S.: Best routes selection in international intermodal networks. Computers & Operations Research 35(9), 2877–2891 (2008)
[5] Crainic, T.G., Kim, K.H.: Intermodal transportation. In: Laporte, G., Barnhart, C. (eds.) Handbooks in Operations Research & Management Science: Transportation. Elsevier, Amsterdam (2007)
[6] Du, T.C., Li, E.Y., Chou, D.: Dynamic vehicle routing for online B2C delivery. Omega 33(1), 33–45 (2005)
[7] Huang, L.F., Wang, L.L.: An analysis of selecting the intermodal transportation routes. Logistics Engineering and Management 32(187), 4–6 (2010)
[8] Huisman, D., Freling, R., Wagelmans, A.P.M.: A robust solution approach to the dynamic vehicle scheduling problem. Transportation Science 38(4), 447–458 (2004)
[9] Li, J.Q., Borenstein, D., Mirchandani, P.B.: A decision support system for the single-depot vehicle rescheduling problem. Computers & Operations Research 34(4), 1008–1032 (2007)
[10] Li, J.Q., Mirchandani, P.B., Borenstein, D.: A Lagrangian heuristic for the real-time vehicle rescheduling problem. Transportation Research Part E 45(3), 419–433 (2009)
[11] Liu, J., Yu, J.N.: Optimization model and algorithm on transportation mode selection in intermodal networks. Journal of Lanzhou Jiaotong University 29(1), 56–61 (2010)
[12] Liu, J., Yu, J.N., Dong, P.: Optimization model and algorithm for intermodal carrier selection in various sections. Operations Research and Management Science 19(5), 160–166 (2010)
[13] Liu, W.M., Guan, L.P., Yin, X.Y.: Prediction of freeway incident duration based on decision tree. China Journal of Highway and Transport 18(1), 99–103 (2005)
[14] Macharis, C., Bontekoning, Y.M.: Opportunities for OR in intermodal freight transport research: A review. European Journal of Operational Research 153(2), 400–416 (2004)
[15] Macharis, C., Bontekoning, Y.M.: Opportunities for OR in intermodal freight transport research: A review. European Journal of Operational Research 153(2), 400–416 (2004)
[16] Ma, C.W.: Carrier selection in various sections of multi-modal transport based on multi-agent.
Journal of Harbin Institute of Technology 39(12), 1989–1992 (2007)
[17] Meng, Q., Wang, X.C.: Intermodal hub-and-spoke network design: Incorporating multiple stakeholders and multi-type containers. Transportation Research Part B (2010), doi:10.1016/j.trb.2010.11.002
[18] Potvin, J.Y., Ying, X., Benyahia, I.: Vehicle routing and scheduling with dynamic travel times. Computers & Operations Research 33(4), 1129–1137 (2006)
[19] Shinghal, N., Fowkes, T.: Freight mode choice and adaptive stated preferences. Transportation Research Part E 38(5), 367–378 (2002)
[20] Sirikijpanichkul, A., Van Dam, H., Ferreira, L., Lukszo, Z.: Optimizing the location of intermodal freight hubs: An overview of the agent based modelling approach. Journal of Transportation Systems Engineering and Information Technology 7(4), 71–81 (2007)
[21] Taniguchi, E., Shimamoto, H.: Intelligent transportation system based dynamic vehicle routing and scheduling with variable travel times. Transportation Research Part C 12(3-4), 235–250 (2004)
[22] Wang, X.P., Xu, C.L., Yang, D.L.: Disruption management for vehicle routing problem with the request changes of customers. International Journal of Innovative Computing, Information and Control 5(8), 2427–2438 (2009)
[23] Yang, X.J., Low, J.M.W., Tang, L.C.: Analysis of intermodal freight from China to Indian Ocean: A goal programming approach. Journal of Transport Geography (2010), doi:10.1016/j.jtrangeo.2010.05.007
[24] Zeimpekis, V., Giaglis, G.M., Minis, I.: A dynamic real-time fleet management system for incident handling in city logistics. In: 2005 IEEE 61st Vehicular Technology Conference, pp. 2900–2904 (2005)
A Dominance-Based Rough Set Approach of Mathematical Programming for Inducing National Competitiveness Yu-Chien Ko and Gwo-Hshiung Tzeng
Abstract. The dominance-based rough set approach (DRSA) is a powerful technology for approximating ranking classes. Analysis of large real-life data sets shows, however, that decision rules induced from lower approximations are weak, that is, they are supported by few entities only. To enhance DRSA, mathematical programming is applied so that the lower approximations are supported by as many entities as possible. The mathematical coding, covering unions of decision classes, dominance sets, rough approximations, and the quality of approximation, is implemented in Lingo 12 and applied to the 2010 World Competitiveness Yearbook of the International Institute for Management Development (WCY-IMD). The results show that business finance and attitudes & values matter for achieving the top 10 positions in world competitiveness.

Keywords: dominance-based rough set approach (DRSA), mathematical programming (MP), national competitiveness.
1 Introduction

DRSA is a powerful technology for processing relational structure, and it has been successfully applied in many fields [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. However, it has rarely been used in the analysis of national competitiveness, owing to skewed dimensions and unique characteristics among nations. To date the annual reports, the World Competitiveness Yearbook (WCY) and the Global Competitiveness Report (GCR), publish competitiveness ranks with statistical descriptions instead of a relational structure, so the inference of competitiveness structure still cannot be elaborated for policy makers and national leaders. This research adopts mathematical programming to design

Yu-Chien Ko, Department of Information Management, Chung Hua University, 707, Sec. 2 Wufu Road, Hsinchu City 30012, Taiwan, e-mail: [email protected]
Gwo-Hshiung Tzeng, Graduate Institute of Project Management, Kainan University, No. 1 Kainan Road, Luchu, Taoyuan County 338, Taiwan, e-mail: [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 23–36. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
Y.-C. Ko and G.-H. Tzeng
and develop unions of decision classes, dominance sets, rough approximations, and the quality of approximation [11, 12, 13]. Finally, induction rules based on WCY 2010 are generated to help stakeholders with policy making and verification. The WCY collects figures and expert opinions into 4 consolidated factors, i.e. Economic Performance, Government Efficiency, Business Efficiency, and Infrastructure, which are further divided into 20 sub-factors. In total, more than 300 criteria are collected across the 20 sub-factors. The report provides the weaknesses, strengths, and trends of nations from the viewpoint of an individual nation rather than across nations [14, 15]. This research partitions nations into 2 parts, i.e. at least 10th and at most 11th, as shown in Figure 1, and then induces rules. It discovers how the top nations outperformed the others.
Fig. 1 Competitiveness model (the WCY dataset is split into an "at least 10th" and an "at most 11th" group, and rules are induced for each group)
The remainder of this paper is organized as follows: Section 2 reviews rough sets and DRSA; Section 3 presents the propositions of mathematical programming for DRSA; Section 4 applies the proposed DRSA to national competitiveness; Section 5 discusses the competitiveness criteria and future work; the last section concludes the paper.
2 Review on DRSA

Measures or evaluations of competitiveness across nations have not been deeply explored in the field of dominance-based rough sets. This section starts from the concept of the rough set and then extends to the dominance-based rough set.

A. Rough Set
The rough set can discover important facts hidden in data and can express decision rules in the form of logic statements "if conditions, then decision". The conditions and decision in such a statement specify equivalence relations based on the respective criteria. The degree to which a rule is satisfied is measured by the entities contained in the relations, e.g. coverage rate, certainty, and strength [16, 17, 18, 19]. Generally, it performs well at classification in many
application fields, but it cannot handle preference inconsistency between condition and decision in choice, ranking, and sorting problems. Therefore, the rough set is extended by applying the dominance principle, yielding the preference-ordered rough set [3, 4]. The extension substitutes dominance-based relations for indiscernibility relations. The approximation of dominance-based relations involves the preference function, dominating and dominated entities, rough approximation, and unions of decision classes [1~10], as described below.

B. Information System of DRSA
A data table with preference information IS = (U, Q, V, f) is given by U = {x_1, x_2, ..., x_n}, Q = C ∪ D, f : X × Q → V, and V = {V_1, V_2, ..., V_q}, where n is the number of entities, C is the set of condition criteria, D is the set of decision criteria, X is a subset of U on which decision makers are willing to state their preference on the criteria, and f is a total function such that f(x, q) ∈ V_q for all q ∈ Q. The information function f(x, q) can be regarded as a preference function when its domain is a scale of preference for criterion q (Greco et al., 2000). Thus the pairwise comparison of x, y ∈ U,

f(x, q) ≥ f(y, q) ⇔ x ≽_q y ⇔ f_q(x) ≥ f_q(y),

means "x is at least as good as y with respect to criterion q". The outranking relations make sense not only in the data structure but also as mathematical functions. The rough approximation related to this mathematical structure is described below.

C. Rough Approximation by Dominance Sets
The dominance-based rough approximations serve to induce the entities assigned to Cl_t^≥ (an upward union of classes) or Cl_t^≤ (a downward union of classes), where Cl is a cluster set containing ordered classes Cl_t, t ∈ T, T = {1, 2, ..., n}. For all s, t ∈ T with s ≥ t, each element of Cl_s is preferred at least as much as each element of Cl_t. The unions are constructed as

Cl_t^≥ = ∪_{s≥t} Cl_s,  Cl_t^≤ = ∪_{s≤t} Cl_s

P-dominating and P-dominated sets are the rough approximations taking object x as a reference with respect to P, P ⊆ C:

P-dominating set: D_P^+(x) = {y ∈ X : y D_P x}
P-dominated set: D_P^−(x) = {y ∈ X : x D_P y}
where y ≽_q x for D_P^+(x), x ≽_q y for D_P^−(x), for all q ∈ P. Approximating the decision classes by P-dominance sets is the key idea of rough sets, namely inferring one piece of knowledge from another, and has been implemented in DRSA as

P(Cl_t^≥) = {x′ ∈ Cl_t^≥ : D_P^+(x′) ⊆ Cl_t^≥}
P̄(Cl_t^≥) = U − P(Cl_{t−1}^≤)
Bn_P(Cl_t^≥) = P̄(Cl_t^≥) − P(Cl_t^≥)

P(Cl_t^≤) = {x′ ∈ Cl_t^≤ : D_P^−(x′) ⊆ Cl_t^≤}
P̄(Cl_t^≤) = U − P(Cl_{t+1}^≥)
Bn_P(Cl_t^≤) = P̄(Cl_t^≤) − P(Cl_t^≤),  t = 1, ..., n

where P(·) denotes the lower approximation, P̄(·) the upper approximation, and Bn_P(Cl_t^≥) and Bn_P(Cl_t^≤) are the P-doubtful regions. Analysis of large real-life data sets shows, however, that for some multi-criteria classification problems the application of DRSA identifies large differences between the lower and upper approximations of decision classes. In consequence, the decision rules induced from the lower approximations are weak, that is, supported by few entities only [7]. For this reason a DRSA method is designed and developed in mathematical programming to search entities for the lower approximation.
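The dominance sets and the lower approximation of an upward union translate directly into set comprehensions; the sketch below also shows how a single inconsistent entity shrinks the lower approximation, which is exactly the "weak rules" problem. All data are made up:

```python
def dominates(x, y, P):
    """True if x is at least as good as y on every criterion in P."""
    return all(x[q] >= y[q] for q in P)

def dominating_set(x, universe, P):
    """D_P^+(x): the entities that dominate the reference object x."""
    return [y for y in universe if dominates(y, x, P)]

def lower_upward(union, universe, P):
    """P-lower approximation of an upward union Cl_t^>=.

    Keeps exactly those members x whose P-dominating set D_P^+(x) stays
    inside the union, i.e. no outside entity dominates x on all of P.
    """
    return [x for x in union
            if all(y in union for y in dominating_set(x, universe, P))]

U = [
    {"name": "a", "q1": 5, "q2": 4, "cls": 2},
    {"name": "b", "q1": 3, "q2": 5, "cls": 2},
    {"name": "c", "q1": 3, "q2": 5, "cls": 1},  # ties b but sits in a worse class
    {"name": "d", "q1": 1, "q2": 1, "cls": 1},
]
cl2_up = [x for x in U if x["cls"] >= 2]        # upward union Cl_2^>=
lower = lower_upward(cl2_up, U, ("q1", "q2"))   # b drops out: c dominates it
```

Here the inconsistency between b and c leaves only one entity in the lower approximation, so any rule induced from it is supported by a single entity.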
3 Mathematical Programming for DRSA

This section covers unions of decision classes, dominance sets, rough approximations, and the quality of approximation; it starts from propositions and then uses them to build formulae.

A. Propositions
Reduction of criteria is expressed in Proposition (A.1) by setting a coefficient w_j to 0; otherwise w_j equals 1.

(A.1)  L = Σ_{j=1}^{m} w_j,  w_j ∈ {0, 1},  j ∈ {q_1, q_2, ..., q_m}

where m is the number of criteria in C and L is the number of criteria in the subset P after reduction. For the entities assigned to Cl_t^≥, Proposition (A.2) sets u_i = 1 when x_i ∈ Cl_t^≥ and x_i ∈ D_t^≥; otherwise u_i = 0.

(A.2)  u_i × L ≤ Σ_{j=1}^{m} w_j · u_ij,  u_i ∈ {0, 1},  i ∈ Cl_t^≥

where u_ij = 1 when f(x_i, j) ≥ r_ij, ∀j ∈ P; otherwise u_ij = 0; r_ij is the lower boundary of criterion j for D_t^≥. For the entities assigned to Cl_t^<, Proposition (A.3) sets u_i = 1 when x_i ∈ Cl_t^< and x_i ∈ D_t^<; otherwise u_i = 0.

(A.3)  u_i × L ≤ Σ_{j=1}^{m} w_j · u_ij,  u_i ∈ {0, 1},  i ∈ Cl_t^<

where u_ij = 1 when f(x_i, j) ≤ r_ij; otherwise u_ij = 0; r_ij is the upper boundary of criterion j for D_t^<.
B. Mathematical Formulae for Rough Approximation
Following the propositions above, the dominance sets, rough approximations, and quality of approximation are written as mathematical formulae below.

• P-dominating set: D_P^+(x_i) = { x_i ∈ Cl_t^≥ : L = Σ_{j=1}^{m} w_j · u_ij }
• P-dominated set: D_P^−(x_i) = { x_i ∈ Cl_t^< : L = Σ_{j=1}^{m} w_j · u_ij }
• Rough approximation:

|P(Cl_t^≥)| = Σ_{i=1}^{|Cl_t^≥|} u_i,  |P̄(Cl_t^≥)| = |U| − Σ_{i=1}^{|Cl_t^≥|} u_i
|P(Cl_t^<)| = Σ_{i=1}^{|Cl_t^<|} u_i,  |P̄(Cl_t^<)| = |U| − Σ_{i=1}^{|Cl_t^<|} u_i

For sustaining the quality of the rough approximations, consistency constraints are designed to restrict the scope of the dominance sets, i.e. D_P^+(x) may not contain entities of D_P^−(x) and vice versa.

• Consistency constraints:

L − 1 ≥ Σ_{j=1}^{m} w_j · u_ij,  i ∈ Cl_t^≥
L − 1 ≥ Σ_{j=1}^{m} w_j · u_ij,  i ∈ Cl_t^<
C. Evaluation of Rough Approximation
According to Pawlak (2002) and Greco et al. (2000), there are 2 frequently used measures of approximation.

• Coverage rate for the Cl_t^≥ and Cl_t^< approximations:

CR_P(Cl_t^≥) = |P(Cl_t^≥)| / |Cl_t^≥|,  CR_P(Cl_t^<) = |P(Cl_t^<)| / |Cl_t^<|
It expresses "the probability that entities in a union of decision classes belong to the lower approximation".

Table 1 4 factors and 20 sub-factors of WCY

Economic Performance:  q1 Domestic Economy, q2 International Trade, q3 International Investment, q4 Employment, q5 Prices
Government Efficiency: q6 Public Finance, q7 Fiscal Policy, q8 Institutional Framework, q9 Business Legislation, q10 Societal Framework
Business Efficiency:   q11 Productivity & Efficiency, q12 Labor Market, q13 Finance, q14 Management Practices, q15 Attitudes and Values
Infrastructure:        q16 Basic Infrastructure, q17 Technological Infrastructure, q18 Scientific Infrastructure, q19 Health and Environment, q20 Education
• Quality of approximation for Cl:

γ_P(Cl) = ( |P(Cl_t^≥)| + |P(Cl_t^<)| ) / |U|

It expresses classification performance as "the probability that entities are covered by all the lower approximations (partitions)".

D. Decision Rules Based on Rough Approximation
The unions of decision classes and rough approximations can serve to induce rules expressed as "if..., then...". Two exemplary rule types are given below.

(D_≥)  if f(x, j) ≥ r_j^≥ ... and f(x, j′) ≥ r_{j′}^≥, then x ∈ Cl_t^≥
(D_<)  if f(x, j) ≤ r_j^< ... and f(x, j′) ≤ r_{j′}^<, then x ∈ Cl_t^<,  j, j′ ∈ P

Based on these propositions, mathematical formulae, and exemplary rule types, national competitiveness is induced next.
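The two evaluation measures above reduce to simple entity counts. A sketch on made-up approximation results; the counts are chosen only to mirror the coverage rates reported later in the paper (roughly 0.80, 0.77 and 0.78), not taken from its data:

```python
def coverage_rate(lower, union):
    """CR_P = |lower approximation| / |union of decision classes|."""
    return len(lower) / len(union)

def quality(lowers, universe):
    """gamma_P = entities covered by all lower approximations / |U|."""
    return sum(len(l) for l in lowers) / len(universe)

# Made-up partition of 58 nations: 10 in the upward union (8 covered by its
# lower approximation), 48 in the downward union (37 covered).
U = list(range(58))
up_union, up_lower = U[:10], U[:8]
dn_union, dn_lower = U[10:], U[10:47]
cr_up = coverage_rate(up_lower, up_union)
cr_dn = coverage_rate(dn_lower, dn_union)
g = quality([up_lower, dn_lower], U)
```

This yields cr_up = 0.8, cr_dn ≈ 0.77 and g ≈ 0.78 for these hypothetical counts.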
4 Applying the Proposed DRSA to National Competitiveness

According to the 2010 WCY, 58 nations are listed in Appendix I. This section starts with a description of the data set and then presents the induction algorithm.

A. WCY Data Set
Table 1 presents the 4 factors and 20 sub-factors; the left column gives the symbols of the sub-factors. The ranks of the nations in 2009 and 2010 are presented in Appendix I. Australia, Canada, Hong Kong, Malaysia, Norway, Singapore, Sweden, Switzerland, Taiwan, and the USA are the top 10 nations. In what follows, the dominant sub-factors induced for the nations are called dominance criteria, and their boundaries are called dominance boundaries.

B. An Algorithm of Competitiveness Induction
An algorithm is designed to solve for the optimal coverage rates of both the Cl_t^≥ and Cl_t^< approximations simultaneously. This prevents an imbalance between the approximations, i.e. it avoids one support being high while the other is low. Before executing the algorithm, users might remove criteria that have low coverage rates CR_j(Cl_10th^≥) or CR_j(Cl_10th^<), which reduces computing tasks and time.
Objective:

Max CR_P(Cl_10th^≥) + CR_P(Cl_10th^<) − 0.01 × L

Constraints:

CR_P(Cl_10th^≥) = |P(Cl_10th^≥)| / |Cl_10th^≥| = ( Σ_{i=1}^{m} u_i ) / |Cl_10th^≥|
CR_P(Cl_10th^<) = |P(Cl_10th^<)| / |Cl_10th^<| = ( Σ_{i=1}^{m} u_i ) / |Cl_10th^<|
L = Σ_{j=1}^{m} w_j,  w_j ∈ {0, 1},  1 ≤ L ≤ 3
u_i × L ≤ Σ_{j=1}^{m} w_j · u_ij,  u_i ∈ {0, 1},  i ∈ Cl_t^≥
u_i × L ≤ Σ_{j=1}^{m} w_j · u_ij,  u_i ∈ {0, 1},  i ∈ Cl_t^<
L − 1 ≥ Σ_{j=1}^{m} w_j · u_ij,  i ∈ Cl_t^≥
L − 1 ≥ Σ_{j=1}^{m} w_j · u_ij,  i ∈ Cl_t^<
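Because L is capped at 3, the model can, for illustration, be brute-forced: enumerate the criterion subsets, try boundaries drawn from the observed values, and keep the combination maximizing CR_P(Cl^≥) + CR_P(Cl^<) − 0.01 × L. A simplified sketch on made-up scores (the paper solves the full 0-1 model in Lingo 12; this toy version also simplifies the downward rule to use the same boundary as the upward rule):

```python
from itertools import combinations, product

def induce(entities, criteria, target, max_L=2):
    """Brute-force analogue of the induction model.

    Picks up to max_L criteria and boundaries (restricted to observed
    values) maximizing CR(Cl>=) + CR(Cl<) - 0.01 * L. The upward rule
    requires value >= boundary on every chosen criterion; as a
    simplification the downward rule requires value < the same boundary.
    """
    top = [e for e in entities if e[target]]
    rest = [e for e in entities if not e[target]]
    best = None
    for L in range(1, max_L + 1):
        for P in combinations(criteria, L):
            candidate_bounds = [sorted({e[q] for e in entities}) for q in P]
            for bounds in product(*candidate_bounds):
                up = sum(all(e[q] >= b for q, b in zip(P, bounds)) for e in top)
                dn = sum(all(e[q] < b for q, b in zip(P, bounds)) for e in rest)
                score = up / len(top) + dn / len(rest) - 0.01 * L
                if best is None or score > best[0]:
                    best = (score, P, bounds)
    return best

# Toy data: 'f' mimics a strong sub-factor such as q13 (Finance); 'g' is
# an uninformative criterion that the 0.01 * L penalty prunes away.
nations = [
    {"f": 70, "g": 1, "top": True},
    {"f": 66, "g": 1, "top": True},
    {"f": 67, "g": 1, "top": False},
    {"f": 55, "g": 1, "top": False},
    {"f": 50, "g": 1, "top": False},
]
score, P, bounds = induce(nations, ("f", "g"), "top")
```

On these toy scores the search settles on the single criterion f with boundary 66, mirroring how the penalty term keeps the induced rules short.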
This algorithm intends to find the dominance criteria and boundaries for both lower approximations at one time. The results are presented next.

C. Induced Rules of D_10th^≥ and D_10th^<
The induced rules are presented as "if..., then..." statements comprising the dominance criteria and boundaries, as follows.

Rule I: if (f(x, q13) ≥ 64.91) ∧ (f(x, q15) ≥ 62.30), then x ∈ Cl_10th^≥
with r_{i∈Cl_10th^≥, 13} = 64.91 and r_{i∈Cl_10th^≥, 15} = 62.30.
Dominance criteria and boundaries of Rule I: (64.91/Finance, 62.30/Attitudes & Values)

Rule II: if (f(x, q13) ≤ 60.08) ∧ (f(x, q15) ≤ 62.30), then x ∈ Cl_10th^<
with r_{i∈Cl_10th^<, 13} = 60.08 and r_{i∈Cl_10th^<, 15} = 62.30.
Dominance criteria and boundaries of Rule II: (60.08/Finance, 62.30/Attitudes & Values)
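Applied to a nation's sub-factor scores, the two rules act as a simple three-way classifier. A minimal sketch using the boundaries of Rules I and II (the sample scores in the usage are invented):

```python
def classify(q13, q15):
    """Apply Rules I and II with the induced boundaries.

    q13 and q15 are a nation's Finance and Attitudes & Values scores.
    """
    if q13 >= 64.91 and q15 >= 62.30:
        return "at least 10th"   # Rule I: the nation lies in P(Cl_10th^>=)
    if q13 <= 60.08 and q15 <= 62.30:
        return "at most 11th"    # Rule II: the nation lies in P(Cl_10th^<)
    return "uncovered"           # neither lower approximation covers it

label = classify(70.0, 65.0)
```

Nations falling in the "uncovered" gap correspond to the exceptions discussed in Sect. 5.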
Table 2 Evaluation parameters

          CR_P(Cl_10th^≥)   CR_P(Cl_10th^<)   γ_P(Cl)
Rule I    0.80              --                0.78
Rule II   --                0.77              0.78

This algorithm generates Rules I and II, which separate the nations into D_10th^≥ and D_10th^< as shown in Figure 2.
D. Dominance Criteria (Sub-factors)
The dominance criteria of D_10th^≥ and D_10th^< are Finance and Attitudes & Values in business. They are listed in Table 3.

Fig. 2 Induced model (the WCY dataset is separated into D_10th^≥, at least 10th, and D_10th^<, at most 11th)

Table 3 Dominance criteria (sub-factors)

sub-factor  description in WCY 2010
q13         banking and financial services, financial institutions' transparency, finance and banking regulation, stock markets, stock market capitalization, value traded on stock markets, shareholders' rights, credit is easily available for businesses, and venture capital
q15         attitudes toward globalization are generally positive in the society, the image abroad of the country encourages business development, the national culture is open to foreign ideas, flexibility and adaptability of people are high when faced with new challenges, the need for economic and social reforms is generally well understood, and the value system in the society supports competitiveness
5 Discussion and the Future Work

This section discusses the proposed method in 3 parts and then outlines future work.

A. Rough Approximation
The proposed DRSA covers 80% of the top 10 nations, 77% of the rest, and 78% of all nations in the lower approximations. In the analysis of rank changes during 2009 ~ 2010, 8 of the top 10 nations in P(Cl_10th^≥) are covered by Rule I; 6 of them made progress and 2 sustained the same positions (please refer to Appendix I). The induced rules and the rank changes of Appendix I deliver the message that 100% of the nations in P(Cl_10th^≥) performed dominance over the others. The USA, a member of the top 10 but outside P(Cl_10th^≥), conversely degenerated. In this way the rough approximation is applied to obtain inside knowledge of national competitiveness. Rule generation is discussed next.

B. Rule Generation
The two rules composed of dominance criteria and boundaries are solved backward and optimally at one time by the proposed method; they separate the nations by the dominance boundaries (64.91/Finance, 62.30/Attitudes & Values) for P(Cl_10th^≥) and (60.08/Finance, 62.30/Attitudes & Values) for P(Cl_10th^<). A shorter set of dominance boundaries is easier for users to understand as knowledge of the approximations. To approach the shortest length, the term 0.01 × L is subtracted in the objective function; as a result, only two criteria compose the induced rules.

C. Rule Validation
The binary variables w_j control whether criterion j is a dominance criterion. The solved w_j and dominance boundaries are substituted into the constraints

u_i × L ≤ Σ_{j=1}^{m} w_j · u_ij, i ∈ Cl_t^≥;  u_i × L ≤ Σ_{j=1}^{m} w_j · u_ij, i ∈ Cl_t^<
L − 1 ≥ Σ_{j=1}^{m} w_j · u_ij, i ∈ Cl_t^≥;  L − 1 ≥ Σ_{j=1}^{m} w_j · u_ij, i ∈ Cl_t^<

to validate the supports and exceptions of the rough approximations: if a nation satisfies them it counts as a support, otherwise it counts as an exception. The validation results are as follows. Among the top 10 nations, Australia, Canada, Hong Kong, Malaysia, Singapore, Sweden, Switzerland, and Taiwan support Rule I, with Norway and the USA as exceptions. Among the nations beyond the top 10, Argentina, Austria, Belgium, Brazil, Bulgaria, China Mainland, Colombia, Croatia, Czech Republic, Estonia, France, Germany, Greece, Hungary, Iceland, Indonesia, Italy, Jordan, Kazakhstan, Korea, Lithuania, Mexico, New Zealand, Peru, Philippines, Poland, Portugal, Qatar, Romania, Russia, Slovak Republic, Slovenia, Spain, Turkey, Ukraine, United Kingdom, and Venezuela support Rule II, with Chile, Denmark, Finland, India, Ireland, Israel, Japan, Luxembourg, Netherlands, South Africa, and Thailand as exceptions.
D. The Future Work
DRSA provides a good technology for processing relational structure, and the proposed method implements DRSA as a mathematical tool. In future work the mathematical programming approach can be applied to equivalence relations, flow networks, and so forth. Hopefully more insights will be discovered to help decision making.
6 Concluding Remarks

The rule generation for the lower approximations is designed and implemented in this research. The proposed DRSA, inferring the relational structure by mathematical programming, makes optimal solutions available for the objectives. The results provide not only the rough approximations but also the dominance criteria and boundaries. Based on WCY 2010, business finance and attitudes & values are inferred as the dominance criteria for the top 10 nations. Nations at or above the dominance boundaries at least sustained their positions or made progress during 2009 ~ 2010.
References

[1] Greco, S., Matarazzo, B., Slowinski, R.: Rough set approach to multi-attribute choice and ranking problems. In: ICS Research Report 38/95, Proceedings of the Twelfth International Conference, Hagen, Germany, pp. 318–329. Springer, Berlin (1997)
[2] Greco, S., Matarazzo, B., Slowinski, R.: Rough approximation of a preference relation by dominance relations. European Journal of Operational Research 117(1), 63–83 (1999)
[3] Greco, S., Matarazzo, B., Slowinski, R.: Extension of the rough set approach to multicriteria decision support. INFOR 38(3), 161–193 (2000)
[4] Greco, S., Matarazzo, B., Słowiński, R., Stefanowski, J.: Variable consistency model of dominance-based rough sets approach. In: Ziarko, W.P., Yao, Y. (eds.) RSCTC 2000. LNCS (LNAI), vol. 2005, pp. 170–181. Springer, Heidelberg (2001)
[5] Greco, S., Matarazzo, B., Slowinski, R.: Rough set theory for multicriteria decision analysis. European Journal of Operational Research 129(1), 1–47 (2001)
[6] Greco, S., Matarazzo, B., Slowinski, R.: Rough approximation by dominance relations. International Journal of Intelligent Systems 17(2), 153–171 (2002)
[7] Blaszczynski, J., Greco, S., Matarazzo, B., Slowinski, R.: Multi-criteria classification - A new scheme for application of dominance-based decision rules. European Journal of Operational Research 181(3), 1030–1044 (2007)
[8] Liou, J.H., Yen, L., Tzeng, G.H.: Using decision rules to achieve mass customization of airline services. European Journal of Operational Research 205(3), 680–686 (2010)
[9] Fan, T.F., Liu, D.R., Tzeng, G.H.: Rough set-based logics for multicriteria decision analysis. European Journal of Operational Research 182(1), 340–355 (2007)
[10] Shyng, J.-Y., Shieh, H.-M., et al.: Using FSBT technique with rough set theory for personal investment portfolio analysis. European Journal of Operational Research 201(2), 601–607 (2010)
[11] Błaszczyński, J., Dembczyński, K., Słowiński, R.: Interactive analysis of preference-ordered data using dominance-based rough set approach. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M. (eds.) ICAISC 2006. LNCS (LNAI), vol. 4029, pp. 489–498. Springer, Heidelberg (2006)
[12] Li, H.L., Chen, M.H.: Induction of multiple criteria optimal classification rules for biological and medical data. Computers in Biology and Medicine 38(1), 42–52 (2008)
[13] Kotlowski, W., Dembczynski, K., Greco, S., Slowinski, R.: Measures of monotone relationships using dominance-based rough set approach (DRSA), http://www.iiasa.ac.at/~marek/ftppub/Pubs/csm06/kotlowski_pres.pdf
[14] IMD: World Competitiveness Yearbook. Institute for Management Development, Lausanne, Switzerland (2009)
[15] IMD: World Competitiveness Yearbook. Institute for Management Development, Lausanne, Switzerland (2010)
[16] Pawlak, Z., Slowinski, R.: Rough set approach to multi-attribute decision analysis. European Journal of Operational Research 72(3), 443–459 (1994)
[17] Pawlak, Z.: Rough set approach to knowledge-based decision support. European Journal of Operational Research 99(1), 48–57 (1997)
[18] Pawlak, Z.: Rough sets, decision algorithms and Bayes' theorem. European Journal of Operational Research 136(1), 181–189 (2002)
[19] Zhu, D., Premkumar, G., Zhang, X., Chu, C.H.: Data mining for network intrusion detection: a comparison of alternative methods. Decision Science Journal 32(4), 635–660 (2001)
[20] Kyeong-Won, K., Hwa-Nyeon, K.: Global financial crisis overview. SERI Quarterly 2(2), 12–21 (2009)
[21] Stiglitz, J.E.: Lessons from the Global Financial Crisis of 2008. Seoul Journal of Economics 23(3), 321–339 (2010)
A Dominance-Based Rough Set Approach of Mathematical Programming
Appendix (I)

Table 4 Competitiveness ranks of nations during 2009–2010 (Change: ↑ progressing, ↓ degenerating, = unchanged, derived from the rank movement; the original table additionally flags the top-10 nations)

Nation             2009  2010  Change
Argentina           55    55     =
Australia            7     5     ↑
Austria             16    14     ↑
Belgium             22    25     ↓
Brazil              40    38     ↑
Bulgaria            38    53     ↓
Canada               8     7     ↑
Chile               25    28     ↓
China Mainland      20    18     ↑
Colombia            51    45     ↑
Croatia             53    56     ↓
Czech Republic      29    29     =
Denmark              5    13     ↓
Estonia             35    34     ↑
Finland              9    19     ↓
France              28    24     ↑
Germany             13    16     ↓
Greece              52    46     ↑
Hong Kong            2     2     =
Hungary             45    42     ↑
Iceland              -    30     -
India               30    31     ↓
Indonesia           42    35     ↑
Ireland             19    21     ↓
Israel              24    17     ↑
Italy               50    40     ↑
Japan               17    27     ↓
Jordan              41    50     ↓
Kazakhstan          36    33     ↑
Korea               27    23     ↑
Lithuania           31    43     ↓
Luxembourg          12    11     ↑
Malaysia            18    10     ↑
Mexico              46    47     ↓
Netherlands         10    12     ↓
New Zealand         15    20     ↓
Norway              11     9     ↑
Peru                37    41     ↓
Philippines         43    39     ↑
Poland              44    32     ↑
Portugal            34    37     ↓
Qatar               14    15     ↓
Romania             54    54     =
Russia              49    51     ↓
Singapore            3     1     ↑
Slovak Republic     33    49     ↓
Slovenia            32    52     ↓
South Africa        48    44     ↑
Spain               39    36     ↑
Sweden               6     6     =
Switzerland          4     4     =
Taiwan              23     8     ↑
Thailand            26    26     =
Turkey              47    48     ↓
Ukraine             56    57     ↓
United Kingdom      21    22     ↓
USA                  1     3     ↓
Venezuela           57    58     ↓
A GPU-Based Parallel Algorithm for Large Scale Linear Programming Problem Jianming Li, Renping Lv, Xiangpei Hu, and Zhongqiang Jiang
Abstract. A GPU-based parallel algorithm for solving large scale linear programming problems is proposed in this research. It aims to improve computing efficiency when the linear programming problem becomes sufficiently large or complicated. The parallel algorithm, based on Gaussian elimination, uses the GPU (Graphics Processing Unit) for computationally intensive tasks such as basis matrix operations, canonical form transformation and entering variable selection, while the CPU is used to control the iteration. Experimental results show that the algorithm is competitive with the CPU algorithm and can greatly reduce the computing time, so the GPU-based parallel algorithm is an effective way to solve large scale linear programming problems. Keywords: Linear Programming, Parallel Algorithm, GPU, CUDA (Compute Unified Device Architecture).
Jianming Li · Renping Lv · Zhongqiang Jiang
School of Electronic & Information Engineering, Dalian University of Technology
No. 2 Linggong Rd., Ganjingzi Dist., Dalian, Liaoning, 116023, China
e-mail: [email protected]

Xiangpei Hu
Institute of Systems Engineering, Dalian University of Technology
No. 2 Linggong Rd., Ganjingzi Dist., Dalian, Liaoning, 116023, China
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 37–46. springerlink.com © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

Linear programming is a basic branch of operational research. The linear programming algorithm was first presented by Dantzig in 1947. It plays a very important role in many kinds of economic activities such as allocation of funds and task scheduling [1-2]. It is mainly used to determine how to utilize existing resources so that the prospective goal is achieved optimally. Although the existing linear programming algorithms are effective in solving many practical problems,
they have to run a long time to find solutions to large scale problems. To overcome this limitation, researchers have proposed several improvements, such as the infeasible exterior point simplex algorithm for assignment problems [3] and an efficient search direction for linear programming problems [4]. These methods successfully reduce the number of iterations, but fail to improve the efficiency of the iterative process itself. Recently, a more promising approach has attracted a lot of attention: parallelizing these algorithms on parallel or distributed computers [5-7]. It has the advantages of reducing the number of iterations and speeding up the solving process. However, the existing parallel algorithms bring users considerable inconvenience due to the following drawbacks: (1) For most researchers, parallel machines are too expensive, and their usage and management is relatively complex. (2) The frequent communication between CPUs is hardly acceptable in most parallel machines. In recent years, the increasing demand of the multimedia and games industries for accelerated 3D rendering has driven the development of the graphics processing unit (GPU) [8-10]. These consumer-level GPUs are cost-effective not only for game playing but also for scientific computing. Many researchers and developers have shown interest in harnessing the power of commodity graphics hardware for general-purpose parallel computation [11-13]. JIN implemented a parallel linear programming algorithm on GPU shaders and achieved good speedup. Compared with GPU shaders, CUDA requires fewer passes and is more efficient and flexible. Based on the above analysis, this paper proposes a linear programming algorithm based on GPU programming with CUDA. In Section 2, we present the principle of our parallel linear programming algorithm. A numerical experiment is given in Section 3. We give conclusions and a description of future work in the last section.
2 GPU-Based Parallel Linear Programming Algorithm

2.1 Linear Programming Problem

Linear programming is the problem of optimizing a linear objective function subject to a set of linear constraints, either equalities or inequalities. The standard form is as follows:

\[
\begin{aligned}
\max\ z = {} & c_1 x_1 + c_2 x_2 + \cdots + c_n x_n \\
\text{s.t.}\quad & a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1 \\
& a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2 \\
& \qquad \vdots \\
& a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m \\
& x_j \ge 0, \quad j = 1, \ldots, n
\end{aligned}
\]
The simplex algorithm is a classical method for solving the linear programming problem. After putting the problem into standard form, we can solve it with the simplex algorithm. Over the past few years, scholars have combined the simplex algorithm with genetic algorithms and other integrated algorithms to improve the efficiency of solving linear programs. The algorithm procedure is as follows:

Step 1. Initialization. Find an initial feasible basis, determine the basic feasible solution, calculate the value of the objective function, and build the initial linear programming tableau.

Step 2. Optimality criterion. Calculate the check-number vector; test the check numbers of the non-basic variables $x_j$:
\[
\sigma_j = c_j - \sum_{i=1}^{m} c_i a_{ij}
\]
Stop if $\sigma_j \le 0$ for all $j$, in which case the optimum solution has been found; otherwise go to the next step.

Step 3. Unboundedness check. Find $\max(\sigma_j) = \sigma_k$ among $\sigma_j > 0$, $j = m+1, \ldots, n$, and identify $x_k$ as the entering variable. The problem is unbounded if the corresponding coefficient column vector satisfies $A_k \le 0$; in that case, stop. Otherwise calculate
\[
\theta = \min\left\{ \frac{b_i}{a_{ik}} \,\middle|\, a_{ik} > 0,\ i = 1, \ldots, m \right\} = \frac{b_l}{a_{lk}}
\]
and identify $x_l$ as the leaving variable; turn to the next step.

Step 4. Pivot on the leaving variable $x_l$ and entering variable $x_k$. Use Gaussian elimination to perform the elementary row transformations with pivot element $a_{lk}$, transforming the column vector of $x_k$, $A_k = (a_{1k}, a_{2k}, \ldots, a_{lk}, \ldots, a_{mk})^T$, into $A_k' = (0, 0, \ldots, 1, \ldots, 0)^T$. Replace $x_l$ in the feasible basis $x_B$ with $x_k$, obtain the new linear programming tableau, and turn to Step 2.

While processing a large-scale linear programming problem, each iteration of the simplex algorithm takes a lot of computing time; the time complexity on the CPU is related to m * m.
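The four steps above can be sketched as a dense-tableau routine. The following is a minimal CPU reference sketch (not the authors' implementation); the names mirror the notation in the text, and the initial basis is assumed to consist of identity columns of A.

```python
import numpy as np

def simplex(A, b, c, basis):
    """Minimal dense-tableau simplex for max c'x s.t. Ax = b, x >= 0.

    `basis` holds the indices of an initial feasible basis whose columns
    of A form the identity (Step 1). Returns (x, z) at the optimum, or
    raises ValueError if the problem is unbounded (Step 3).
    """
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    m, n = A.shape
    while True:
        # Step 2: check numbers sigma_j = c_j - sum_i c_{B_i} a_{ij}
        sigma = c - c[basis] @ A
        if np.all(sigma <= 1e-9):            # optimality criterion
            x = np.zeros(n)
            x[basis] = b
            return x, c @ x
        k = int(np.argmax(sigma))            # entering variable x_k
        if np.all(A[:, k] <= 0):             # Step 3: unbounded problem
            raise ValueError("problem is unbounded")
        # minimum-ratio test: theta = min_i b_i / a_ik over a_ik > 0
        ratios = np.full(m, np.inf)
        pos = A[:, k] > 1e-12
        ratios[pos] = b[pos] / A[pos, k]
        l = int(np.argmin(ratios))           # leaving row l
        # Step 4: Gaussian elimination with pivot element a_lk
        b[l] /= A[l, k]
        A[l] /= A[l, k]
        for i in range(m):
            if i != l:
                f = A[i, k]
                b[i] -= f * b[l]
                A[i] -= f * A[l]
        basis[l] = k                         # x_k replaces x_l in the basis

# max 3x1 + 2x2  s.t.  x1 + x2 + s1 = 4,  x1 + s2 = 3  (slack form)
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])
x, z = simplex(A, np.array([4.0, 3.0]), np.array([3.0, 2.0, 0.0, 0.0]), basis=[2, 3])
# optimum: x1 = 3, x2 = 1, z = 11
```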
2.2 Principle of the GPU-Based Parallel Algorithm

The simplex algorithm can be transformed into a SIMD process on the GPU by adequately exploiting the GPU's high-speed floating-point and parallel computing capabilities. The parallel algorithm improves the efficiency of solving large-scale linear programming problems. The simplex algorithm as a whole is an iterative process in which each iteration depends on the previous one. The GPU-based parallel simplex algorithm is divided into four parts: (1) the parallel simplex algorithm of
linear programming, (2) parallel calculation of the canonical form and the leaving and entering variables, (3) implementation of the GPU-based parallel algorithm, and (4) optimizing the use of GPU memory.

2.2.1 Parallel Simplex Algorithm of Linear Programming
In each iteration, the computationally intensive work is concentrated in the elementary row transformations. Processing the coefficient matrix needs m * n time-steps on the CPU. On closer consideration, the operation on each element of the coefficient matrix is independent, so mapping the operations on the elements of each row of the coefficient matrix to a thread on the GPU is feasible. This paper uses the CPU to control the iteration while using the GPU for the computationally intensive tasks. The canonical form and the check vector σ can thus be calculated by n threads simultaneously, which greatly reduces the computation time of each iteration.

2.2.2 Parallel Calculation of the Canonical Form and the Leaving and Entering Variables
The computationally intensive tasks assigned to the GPU (entering variable selection, leaving variable selection and calculation of the canonical form) are concentrated in Step 4 of Section 2.1. Calculation of the canonical form is divided into two steps: changing the pivot element to one, and rotating the variables. We change the pivot element to one by dividing the elements a_i (i = 0, 1, …, n − 1) of the pivot row by the pivot element. This needs n time-steps on the CPU but one time-step on the GPU, by mapping element i to thread block T_i (0 ≤ i ≤ n − 1).
Fig. 1 The comparison of changing pivotal element to one between CPU and GPU
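The claim of a single time-step reflects that dividing the pivot row has no dependencies between elements, so each element a_i can be handled by its own thread T_i. A NumPy sketch of the same data-parallel update (a GPU kernel would perform one division per thread):

```python
import numpy as np

def normalize_pivot_row(A, b, l, k):
    """Make the pivot element a_lk equal to 1 by dividing row l by it.

    Sequentially this costs n divisions; in the GPU version every
    element i of the row maps to a thread T_i (0 <= i <= n-1) and all
    divisions happen in one parallel step."""
    pivot = A[l, k]
    A[l, :] /= pivot          # one elementwise, data-parallel step
    b[l] /= pivot
    return A, b

A = np.array([[2.0, 4.0, 6.0],
              [1.0, 3.0, 5.0]])
b = np.array([8.0, 7.0])
normalize_pivot_row(A, b, l=0, k=0)
# row 0 becomes [1, 2, 3] and b[0] becomes 4
```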
The entering variable corresponds to the largest element of the check vector. We use a parallel reduction algorithm to find it, as shown in Fig. 2. The process needs n − 1 time-steps on the CPU but only O(log n) time-steps on the GPU. The leaving variable is found with the same parallel reduction algorithm.
Fig. 2 Parallel reducing algorithm
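Tree-style reduction is what cuts the n − 1 sequential comparisons down to about log2(n) steps: each step compares disjoint pairs, all of which a GPU would handle simultaneously. An index-tracking sketch (not the authors' kernel) for picking the entering variable, i.e. the largest σ_j:

```python
def parallel_argmax(sigma):
    """Simulate the tree reduction of Fig. 2: each while-iteration is one
    'step' whose pair comparisons a GPU would run in parallel; after
    ceil(log2 n) steps, position 0 holds the maximum and its index."""
    vals = list(sigma)
    idx = list(range(len(vals)))
    stride = 1
    while stride < len(vals):
        # all pairs (i, i + stride) below would be separate threads
        for i in range(0, len(vals) - stride, 2 * stride):
            if vals[i + stride] > vals[i]:
                vals[i], idx[i] = vals[i + stride], idx[i + stride]
        stride *= 2
    return idx[0], vals[0]

k, best = parallel_argmax([0.3, 2.1, -0.5, 1.7, 0.9])
# k == 1, best == 2.1
```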
2.3 Implementation of the GPU-Based Parallel Algorithm

In this part, we realize the parallel linear programming algorithm on the GPU using CUDA.

(1) Variable definition: n: the number of variables in standard form; m: the number of constraint conditions; A[m][n]: coefficient matrix; c: cost vector; b: right-hand-side vector.

(2) The execution steps of the program:

Step 1: Initialize parameters. Convert the linear programming problem to standard form by adding slack and surplus variables on the CPU; initialize m, n and the matrix A[m][n]; transmit the data to the GPU.

Step 2: Calculate the basic feasible solution on the GPU.

Step 3: Calculate max(σ_j) = σ_k on the GPU using the parallel reduction algorithm, determine the entering variable x_k, and judge on the CPU whether the optimal solution has been obtained.

Step 4: Calculate θ in parallel on the GPU, determine the leaving variable x_l, and judge on the CPU whether the problem is unbounded.

Step 5: Perform the Gauss transformation using a_lk as the pivot element, and go to Step 2.

The flow chart of the algorithm follows:
Fig. 3 Flow chart of the parallel simplex algorithm based on GPU-acceleration
2.4 Optimizing the Use of GPU Memory

To improve the GPU memory usage of the parallel linear programming algorithm, we propose the following improvements:

(1) Optimizing the use of on-chip GPU memory. Each thread needs to read column k of the coefficient matrix during the Gauss transformation. On the GPU, the access speed of shared memory is much higher than that of global memory, so column k is copied from global memory to shared memory; each thread of a block then reads data only from shared memory during computation, which reduces data-fetch time. To make full use of this feature, the number of threads in each block is set to the maximum value.
(2) The back-and-forth transmission of data between CPU and GPU has a negative effect on the parallel simplex algorithm, so the frequency and volume of data transmission should be reduced as far as possible. In this paper, the data transmitted in each cycle comprise just two integers, which are used to judge whether the optimal solution has been obtained and whether the problem is unbounded.
3 Experimental Results and Analysis

We apply our algorithm to a set of linear programming problems. To analyze the relation between the acceleration effect and the row number m and column number n of the coefficient matrix, we performed two experiments on a computer with a 1.6 GHz Intel Core 2 2140 processor, 2048 MB RAM, an NVIDIA GeForce 9800GT graphics card and the Windows 7 operating system. We ran the GPU and CPU algorithms on each data group 100 times and compared the average execution times. Tables 1 and 2 give the experimental results for each group; the logarithmic diagrams in Figs. 4 and 5 show the difference in execution time between CPU and GPU.

Table 1 Coefficient matrix, m = n

m     n     CPU time (ms)  GPU time (ms)  Speed-up
100   100   57             50             1.1
200   200   162            72             2.3
500   500   1004           104            9.7
1000  1000  3912           180            21.7
2000  2000  15123          295            51.3
5000  5000  96413          1166           82.7
7000  7000  192499         1604           120.0

Table 2 Coefficient matrix, m >> n

m     n    CPU time (ms)  GPU time (ms)  Speed-up
100   10   41             42             -
200   10   95             67             1.4
500   10   478            81             5.9
1000  10   1945           181            10.7
2000  10   7410           249            29.8
5000  10   39021          530            73.6
7000  10   64715          754            85.8
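The speed-up column is simply the ratio of mean CPU time to mean GPU time; for example, at m = n = 7000, 192499/1604 ≈ 120. A quick arithmetic check against the Table 1 rows:

```python
cpu = [57, 162, 1004, 3912, 15123, 96413, 192499]   # CPU times (ms), Table 1
gpu = [50, 72, 104, 180, 295, 1166, 1604]           # GPU times (ms), Table 1
reported = [1.1, 2.3, 9.7, 21.7, 51.3, 82.7, 120.0] # reported speed-ups

ratios = [c / g for c, g in zip(cpu, gpu)]
# each reported speed-up matches its time ratio to rounding precision
assert all(abs(r, ) if False else abs(r - t) < 0.06 for r, t in zip(ratios, reported))
```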
Fig. 4 Execution time (log scale) of the CPU and GPU algorithms vs. problem scale (100–7000), m = n

Fig. 5 Execution time (log scale) of the CPU and GPU algorithms vs. problem scale (100–7000), m >> n
The results indicate that our parallel algorithm has the following advantages: (1) Good speed-up. The data in Table 1 show that the execution time of the parallel linear programming algorithm is shorter than that of the CPU method. The speed-up ratio reaches 120 when the coefficient matrix grows to 7000 * 7000; the larger the linear programming problem, the more obvious the acceleration of the parallel simplex algorithm. (2) The speed-up ratio correlates linearly with m. When m is equal to n, Figure 4 indicates that the execution time of the CPU is related to m * m, with theoretical time complexity O(m * (m + n)), while the execution time of the GPU correlates linearly with m, with theoretical time complexity O(m + n). When m is far larger or smaller than n, our algorithm does not perform as well; see Figures 4 and 5. To sum up, our algorithm gives the most obvious acceleration when m and n are both large.
4 Conclusions

In this paper, we propose a GPU-based parallel algorithm for solving large scale linear programming problems. From the analysis of the experimental results, we conclude that our algorithm is competitive with the CPU algorithm and can greatly reduce the computing time; the GPU-based parallel algorithm is therefore an effective way to solve large scale linear programming problems. The advantages of the results are summarized as follows. From a theoretical perspective, our algorithm is a parallel rather than a serial algorithm, so the computing time can be greatly reduced; this research contributes to the literature on large scale linear programming. From a practical perspective, the proposed algorithm provides a helpful decision-making tool for ordinary users who have no access to parallel machines, for solving large scale problems such as production planning and inventory control.
References [1] Papadimitriou, C.H., Steiglitz, K.: Combinatorial optimization: algorithms and complexity. Prentice-Hall Inc., Englewood Cliffs (1992) [2] Marriott, K., Stuckey, P.J.: Programming with Constraints: An Introduction. The MIT Press, Cambridge (1998) [3] Paparrizos, K.: An infeasible exterior point simplex algorithm for assignment problems. Mathematical Programming 51(1-3), 45–54 (1991) [4] Luh, H., Tsaih, R.: An efficient search direction for linear programming problems. Computers and Operations Research 29(2), 195–203 (2002) [5] Nwana, V., Darby-Dowman, K., Mitra, G.: A co-operative parallel heuristic for mixed zero-one linear programming: Combining simulated annealing with branch and bound. European Journal of Operational Research 164, 12–23 (2005) [6] Lyu, J., Luh, H., Lee, M.-c.: Performance analysis of a parallel Dantzig-Wolfe decomposition algorithm for linear programming. Computers and Mathematics with Applications 44, 1431–1437 (2002) [7] Maros, I., Mitra, G.: Investigating the sparse simplex algorithm on a distributed memory multiprocessor. Parallel Computing 26, 151–170 (2000) [8] Li, J.M., Wan, D.L., Chi, Z.X., Hu, X.P.: An Efficient Fine-Grained Parallel Particle Swarm Optimization Method based on GPU-acceleration. International Journal of Innovative Computing, Information and Control 3(6), 1707–1714 (2007) [9] Harris, M.J., Coombe, G.: Physically-based Visual Simulation on Graphics Hardware. In: Proceedings of Graphics Hardware, pp. 109–118 (2002) [10] Owens, J.D., Luebke, D., Govindaraju, N.: A survey of general purpose computation on graphics hardware. Eurographics 2005, 21–51 (2005)
[11] Li, J.-m., Chi, Z.-x., Wan, D.-l.: A Parallel Genetic Algorithm Based on Fine-grained Model With GPU-Accelerated. Journal of Harbin Institute of Technology 23(6), 697–704 (2008) [12] O’Leary, D.P., Jun, J.H.: Implementing an Interior Point Method for Linear Programs on a CPU-GPU System. Electronic Transactions on Numerical Analysis 28, 174–189 (2008) [13] Wang, G.M., Wan, Z.P., Wang, X.J.: Genetic algorithm based on simplex method for solving linear-quadratic bi-level programming problem. Computers & Mathematics with Applications 56(10), 2550–2555 (2008)
A Hybrid MCDM Model on Technology Assessment to Business Strategy Mei-Chen Lo, Min-Hsien Yang, Chien-Tzu Tsai, Aleksey V. Pugovkin, and Gwo-Hshiung Tzeng *
Abstract. With the wave of globalization and government policies toward joining boundary-less regional markets, the entire market becomes a fair and fully competitive environment, and firms need to develop feasible technology strategies. Technology Assessment (TA) is increasingly viewed as an important tool to aid the shift towards technology development. This paper aims to reflect different aspects of technology assessment and their relative importance for future business strategy. We adopt a hierarchical model with multiple criteria to evaluate the alternative concepts in approaching the technology assessment process for business strategy (BS). The paper uses the DANP method, which combines DEMATEL and ANP, to establish the investment model; the preference among strategies is determined by VIKOR to select appropriate alternatives. The interdependence and feedback relationships through which the criteria influence the priority setting of strategies are discussed. The presented model appears to be comprehensive, flexible and easy to implement in managerial practice. A numerical example is illustrated. Keywords: Technology Assessment, Business Strategy (BS), Multiple Criteria Decision Making (MCDM), DEMATEL (Decision Making Trial and Evaluation Laboratory), Analytic Network Process (ANP), VIKOR (VlseKriterijumska Optimizacija I Kompromisno Resenje).
[email protected] *
Mei-Chen Lo · Gwo-Hshiung Tzeng Institute of Project Management, Kainan University No. 1, Kainan Road, Luchu, Taoyuan County 338, Taiwan e-mail:
[email protected] Min-Hsien Yang · Chien-Tzu Tsai Management School, Feng Chia University, Taichung, Taiwan No. 100, Wenhwa Rd., Seatwen, Taichung, 40724 Taiwan Aleksey V. Pugovkin Department of Control Systems and Radioelectronics, Tomsk State University No. 40, Lenina Prospect, Tomsk, Russia 634050 e-mail:
[email protected] J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 47–56. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
1 Introduction

Technology Assessment (TA) is an important tool to aid the shift towards technology development. TA is a broad concept evolving at different levels: national, industrial and corporate. This article aims to provide some clarification by reflecting on the different aspects described in the literature as forms of TA, and by evaluating them in terms of their potential contributions to business performance. A growing international business faces increasingly complex operations in many aspects, including finance, general administration, logistics, sales and marketing, and manufacturing. Although multinational corporations have ample resources and are well experienced in cross-nation management, they have to integrate their global information technology and establish a unified strategy for universal implementation, so as to monitor and assist the operations of all subsidiaries around the globe in an effective and timely manner. As such, TA starts to play an increasingly important role in developing responsive technology compatibility and innovative strategies under intense local market competition. Both Russian and Taiwanese multinational corporations face the issues of business application deployment and of system integration and maintenance architecture, regardless of the management style of the regional cultures, whether the organization (business units) and information architecture is centralized, decentralized, or both. This study discusses multinational corporations' business modes (functions, organization, and systems for integration and maintenance) and related theories, using hybrid Multiple Criteria Decision Making (MCDM) for performance evaluation to guide business strategies. A numerical example is presented to show that the method is feasible and useful.
2 Technology Assessments

There are many empirical studies on TA (Coates, 1980; Schot and Rip, 1997; Van Den Ende et al., 1998; Van Eijndhoven, 1997). Pope et al. (2004) present a conceptualization of sustainability assessment and use the concept of "integrated assessment" to achieve the impact of strategic assessment. Smits and Leijten (1991) focus on TA as a process consisting of analyses of technological development and its consequences, and of debate in relation to these consequences; it provides information that can help a company develop its strategies. Coates (1980) presents TA as a class of policy studies which systematically examine the effects on society that may occur when a technology is introduced, extended or modified, emphasizing those consequences that are unintended, indirect or delayed. Cetron and Connor (1972) attempt to establish an early warning system to detect, control, and direct technological changes and developments so as to maximize the public good while minimizing the public risks. Pretorius and de Wet (2000) define a framework based on the hierarchical structure of the enterprise to assess the impact of manufacturing technology on the productivity and competitiveness of the enterprise.
Following these previous findings, Lo (2010) and Lo et al. (2007) present the attributes of TA and the hierarchy structure of the evaluation model (Fig. 1) at the corporate level that are important to users and investors, covering capability (R&D), competition (rivalry), competence (manufacturing), and customer (market). In this approach we incorporate the four aspects of TA into a multiple-criteria quantitative model. The main goal, TA, is divided into four aspects at the second level of the hierarchy, and every aspect is divided into three or four criteria at the next level.
Technology Assessment to Business Strategy:
A. Research and Development (RD): a1 Technology Product, a2 Related Technologies, a3 Compatibility, a4 Reusable
B. Competition Environment (CE): b1 Rivalry, b2 Market, b3 Supplier
C. Competence of Manufacturing (CM): c1 Capability, c2 Utilization, c3 Productivity, c4 Flexibility
D. Customer Service (CS): d1 Quality, d2 Timing, d3 Delivery, d4 Adaptability

Fig. 1 Framework to Hierarchy Structure for Technology Assessment
3 Assessment Model

This paper combines DEMATEL, ANP and VIKOR to handle the dependence and feedback problems (relationship map, weighting, and ranking) so as to reflect the real world. Through this method we identify the gaps and guide the direction of business strategies for potential partners from Taiwan and Russia. This study aims to determine the sub-factors that affect the mutual influence of the four perspectives and sub-factors, and to establish a more complete business strategy evaluation framework for TA.
3.1 A Hybrid MCDM Model for Business Opportunity Evaluation

The evaluation procedure of this study consists of several steps. First, we identify the aspects (dimensions) and criteria that managers (people concerned with the business opportunities between Taiwan and Russia) consider most important. After constructing the evaluation criteria hierarchy, we apply the DEMATEL technique to build a network-relationship map (NRM), and the ANP method is then used to obtain the relative importance weighting of each criterion. The performance corresponding to each criterion is measured by surveying domain experts. Finally, we apply the VIKOR method to index the criteria, identifying a way to achieve the aspired outcomes as well as the ranking results. The ANP method normally deals with normalization in the supermatrix by assuming each cluster has equal weight. Although this way of normalizing the supermatrix is easy, using the assumption of equal weight for each cluster to obtain
the weighted supermatrix seems irrational, because there are different degrees of influence among the criteria in the real world (Ou Yang et al., 2008).
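The remedy proposed in the DANP literature is to scale each block of the unweighted supermatrix by cluster weights taken from the DEMATEL total-influence matrix, instead of assuming equal cluster weights. A toy NumPy sketch with two clusters of two criteria each (the matrices below are illustrative, not data from this paper):

```python
import numpy as np

def weight_supermatrix(W, T_D, sizes):
    """Weight the unweighted supermatrix W with cluster weights derived
    from the DEMATEL total-influence matrix T_D (row-normalized, then
    transposed), rather than equal cluster weights. If every block
    column of W sums to 1, every column of the result still sums to 1."""
    Tn = T_D / T_D.sum(axis=1, keepdims=True)   # row-normalize influences
    Wt = np.zeros_like(W, dtype=float)
    edges = np.cumsum([0] + list(sizes))
    for i in range(len(sizes)):                  # block row (cluster i)
        for j in range(len(sizes)):              # block column (cluster j)
            r0, r1 = edges[i], edges[i + 1]
            c0, c1 = edges[j], edges[j + 1]
            Wt[r0:r1, c0:c1] = Tn[j, i] * W[r0:r1, c0:c1]
    return Wt

# toy unweighted supermatrix: each 2x2 block column sums to 1
W = np.array([[0.6, 0.7, 0.5, 0.4],
              [0.4, 0.3, 0.5, 0.6],
              [0.2, 0.5, 0.8, 0.3],
              [0.8, 0.5, 0.2, 0.7]])
T_D = np.array([[0.4, 0.6],      # toy cluster-level total influences
                [0.7, 0.3]])
Wt = weight_supermatrix(W, T_D, [2, 2])
# every column of the weighted supermatrix sums to 1
```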
3.2 About the Methodologies

The DEMATEL technique was developed by the Battelle Geneva Institute (1) to analyze complex real-world problems, dealing mainly with interactive map-model techniques, and (2) to build qualitative and factor-linked aspects of societal problems (Gabus & Fontela, 1972). The technique is used to investigate and solve complicated groups of problems. It was developed in the belief that the pioneering and proper use of scientific research methods could help to illuminate specific and intertwined phenomena and contribute to the recognition of practical solutions through a hierarchical structure. According to the concrete characteristics of objective affairs, the methodology can verify the interdependence among variables/attributes and confine the relationships that reflect the characteristics within an essential system and evolutionary trend. DEMATEL has been successfully applied in many situations such as marketing strategies, e-learning evaluation, control systems, safety problems, and environmental watershed plans (Liou et al., 2007; Tzeng et al., 2007).

The ANP is the general form of the analytic hierarchy process (AHP) (Saaty, 1980), and it has been used in MCDM to release the restriction of hierarchical structure. The AHP is expressed by a unidirectional hierarchical relationship among decision levels: the top element of the hierarchy is the overall goal of the decision model, and the hierarchy decomposes into more specific criteria until a level of manageable decision criteria is met; under each criterion, sub-criteria elements relative to the criterion can be constructed. The ANP separates complex decision problems into elements within a simplified hierarchical system. This study adopts the concept of ANP, combining the DEMATEL and ANP methods to obtain the relationships between the dimensions/criteria and the relative weights of the criteria.

The VIKOR method was developed for the multi-criteria optimization of complex systems. It determines the compromise ranking list, the compromise solution, and the weight stability intervals for preference stability of the compromise solution obtained with the initial given weights. The method focuses on ranking and selecting from a set of alternatives in the presence of conflicting criteria. It introduces a multi-criteria ranking index based on a particular measure of "closeness" to the "ideal" solution (Opricovic & Tzeng, 2004). Assuming that each alternative is evaluated according to each criterion function, the compromise ranking can be performed by comparing the measure of closeness to the ideal alternative. The multi-criteria measure for compromise ranking is developed and used as an aggregating function in a compromise programming method.
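For concreteness, the standard VIKOR indices from Opricovic & Tzeng (2004) can be sketched as follows; the data here are made up for illustration, not taken from this study (F holds benefit-criterion scores per alternative, w the criteria weights, v the usual strategy weight of 0.5):

```python
def vikor(F, w, v=0.5):
    """Standard VIKOR: S_j is the weighted group utility (closeness to
    the ideal), R_j the worst individual regret, and Q_j the compromise
    index combining both (lower Q = better compromise)."""
    n = len(F[0])
    f_best = [max(row[i] for row in F) for i in range(n)]   # ideal point
    f_worst = [min(row[i] for row in F) for i in range(n)]  # negative-ideal
    S, R = [], []
    for row in F:
        d = [w[i] * (f_best[i] - row[i]) / (f_best[i] - f_worst[i])
             for i in range(n)]
        S.append(sum(d))
        R.append(max(d))
    S_s, S_m, R_s, R_m = min(S), max(S), min(R), max(R)
    return [v * (s - S_s) / (S_m - S_s) + (1 - v) * (r - R_s) / (R_m - R_s)
            for s, r in zip(S, R)]

F = [[1.0, 0.5],    # alternative A: best on criterion 1, worst on 2
     [0.5, 1.0],    # alternative B: the mirror image
     [0.8, 0.9]]    # alternative C: good on both
Q = vikor(F, w=[0.5, 0.5])
# C has the lowest Q, i.e. it is the best compromise alternative
```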
4 Numerical Example

This research involves several respondents from both Taiwan and Russia: leaders in high-tech manufacturing and telecom experts from different functions
which include top management, senior R&D, senior administrative personnel and marketing. The questionnaire for the TA evaluation was composed of two parts: questions evaluating the relative importance of the criteria, and the company's performance corresponding to each criterion.

Table 1 The total-influence dimensions matrix T_D

Dimensions   RD      CE      CM      CS      r_i
RD           0.292   0.387   0.342   0.596   1.616
CE           0.512   0.252   0.343   0.589   1.695
CM           0.355   0.291   0.213   0.550   1.409
CS           0.728   0.620   0.644   0.924   2.915
s_i          1.887   1.549   1.541   2.659

Table 2 The sums of influences given (r_i) and received (s_i) for dimensions and criteria

Dimensions/Criteria                    r_i     s_i     r_i+s_i   r_i-s_i
A. Research and Development (RD)       1.616   1.887   3.503     -0.270
  a1 Technology Product                6.992   6.472   13.464    0.520
  a2 Related Technologies              6.316   6.887   13.203    -0.571
  a3 Compatibility                     6.599   6.808   13.407    -0.209
  a4 Reusable                          6.675   6.577   13.252    0.098
B. Competition Environment (CE)        1.695   1.549   3.244     0.146
  b1 Rivalry                           7.012   5.407   12.419    1.605
  b2 Market                            6.780   6.566   13.346    0.214
  b3 Supplier                          6.313   6.199   12.512    0.114
C. Competence of Manufacturing (CM)    1.409   1.541   2.950     -0.132
  c1 Capability                        6.741   6.744   13.485    -0.003
  c2 Utilization                       6.416   6.737   13.153    -0.321
  c3 Productivity                      6.810   7.543   14.353    -0.733
  c4 Flexibility                       7.251   6.564   13.815    0.687
D. Customer Service (CS)               2.915   2.659   5.574     0.256
  d1 Quality                           6.877   6.828   13.705    0.049
  d2 Timing                            5.995   6.980   12.975    -0.985
  d3 Delivery                          5.626   6.611   12.237    -0.985
  d4 Adaptability                      6.741   6.221   12.962    0.520
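The r_i and s_i columns are simply the row and column sums of T_D, and prominence (r+s) and relation (r−s) follow directly; recomputing them from the matrix in Table 1 reproduces the dimension rows of Table 2 to within rounding error:

```python
import numpy as np

T_D = np.array([[0.292, 0.387, 0.342, 0.596],   # RD
                [0.512, 0.252, 0.343, 0.589],   # CE
                [0.355, 0.291, 0.213, 0.550],   # CM
                [0.728, 0.620, 0.644, 0.924]])  # CS

r = T_D.sum(axis=1)            # influence given (row sums)
s = T_D.sum(axis=0)            # influence received (column sums)
prominence, relation = r + s, r - s

# matches Table 2 (e.g. RD: r+s = 3.503, r-s = -0.270) up to rounding
assert np.allclose(prominence, [3.503, 3.244, 2.950, 5.574], atol=0.005)
assert np.allclose(relation, [-0.270, 0.146, -0.132, 0.256], atol=0.004)
```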
To gain more valuable information for decision making, we use the four dimensions, Research and Development (RD), Competence of Manufacturing (CM), Competition Environment (CE) and Customer Service (CS), to draw a
relationship diagram of business opportunities from the TA evaluation, and apply ANP to determine the evaluation criteria weights and rank the priorities. Tables 1 and 2 present the total-influence dimensions matrix and the sums of influences given and received for the dimensions and criteria, and Table 3 presents the weights. With the improvement gained from considering the interrelationships and the cause-effect influences, Fig. 2 demonstrates the directions for strategic moves by priority.
Fig. 2 The impact NRM of relations
The overall relative weights of the four aspects of TA are obtained by applying the ANP. Because of differences in the TA environment, the construction of the technology platform, business function, industry position and so forth, different units approach the analysis differently by job function. We therefore divided the respondents into several groups and calculated the relative weights for each group, as shown in Table 3. Notably, except for the administration group, all groups rank the dimensions consistently in the order RD, CM, CE and CS.
A Hybrid MCDM Model on Technology Assessment to Business Strategy
53
Table 3 The weights of TA responses by job function

Item | RD [w1] | CE [w2] | CM [w3] | CS [w4]
Management | 0.432 (1) | 0.152 (3) | 0.228 (2) | 0.138 (4)
R&D | 0.414 (1) | 0.181 (3) | 0.235 (2) | 0.122 (4)
Administration | 0.279 (2) | 0.204 (3) | 0.182 (4) | 0.284 (1)
Operation | 0.348 (1) | 0.228 (3) | 0.238 (2) | 0.136 (4)
Marketing | 0.350 (1) | 0.200 (3) | 0.217 (2) | 0.184 (4)
Table 4, which gives the synthesized performance values for the different functions, shows that Management has the highest satisfaction with TA, followed by Marketing, Administration and Operation, with R&D last.

Table 4 Overall performance measure of different functions

Evaluation Criteria | Management | R&D | Administration | Operation | Marketing
a1 Technology Product | 7.55 (12) | 5.52 (15) | 7.17 (9) | 6.47 (14) | 7.28 (10)
a2 Related Technologies | 8.02 (3) | 6.10 (13) | 7.26 (7) | 6.83 (10) | 7.99 (5)
a3 Compatibility | 7.04 (7) | 8.39 (1) | 7.05 (7) | 7.16 (12) | 7.26 (11)
a4 Reusable | 7.97 (5) | 7.82 (7) | 6.90 (10) | 7.30 (6) | 7.11 (6)
b1 Rivalry | 7.39 (14) | 6.98 (9) | 7.75 (4) | 6.32 (15) | 8.25 (2)
b2 Market | 7.06 (15) | 7.09 (6) | 6.49 (15) | 7.20 (4) | 7.80 (8)
b3 Supplier | 7.80 (8) | 7.96 (1) | 6.94 (11) | 6.93 (8) | 6.90 (13)
c1 Capability | 7.61 (11) | 7.30 (4) | 7.22 (8) | 6.56 (12) | 8.07 (3)
c2 Utilization | 7.67 (10) | 7.15 (5) | 6.67 (14) | 7.20 (4) | 7.85 (7)
c3 Productivity | 7.44 (13) | 6.06 (14) | 6.94 (11) | 6.63 (11) | 6.76 (14)
c4 Flexibility | 8.03 (2) | 6.79 (11) | 7.13 (10) | 6.56 (12) | 6.64 (15)
d1 Quality | 7.94 (6) | 6.14 (12) | 7.49 (5) | 7.29 (3) | 8.07 (3)
d2 Timing | 8.16 (1) | 7.04 (7) | 6.90 (13) | 8.04 (2) | 7.35 (9)
d3 Delivery | 7.76 (9) | 7.63 (2) | 8.39 (1) | 6.87 (9) | 7.99 (5)
d4 Adaptability | 8.00 (4) | 7.48 (3) | 7.95 (3) | 8.29 (1) | 8.99 (1)
Synthesis value (Rank) | 77.38 (1) | 68.46 (5) | 73.05 (3) | 70.58 (4) | 76.48 (2)

Remark: synthesis value = performance value × 10 × weight.
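The synthesis rule in the remark can be sketched numerically; the weights and performance values below are made-up placeholders, not the study's ANP weights:

```python
# Illustrative check of the synthesis-value rule from Table 4:
# synthesis value = sum over criteria of (performance value * 10 * weight).
# The weights and scores here are hypothetical, for illustration only.
weights = {"a1": 0.30, "b1": 0.25, "c1": 0.25, "d1": 0.20}
performance = {"a1": 7.5, "b1": 8.0, "c1": 6.0, "d1": 7.0}

synthesis = sum(performance[c] * 10 * weights[c] for c in weights)
```

Summing performance × 10 × weight over all criteria of a function yields that function's synthesis value, which is then ranked across functions.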
The strategies/alternatives (Lo et al., 2007), which emphasize the business goal of satisfying customers' needs, are:
– Innovation/Intelligent Property (IIP): build an innovative environment for continuous technology development achievement;
– Knowledge Platform (KPF): knowledge accumulation, problem solving, lessons learned and information sharing;
– Response System (RPS): adjust market/operation strategies in response to external environmental change;
– Communication System (CMS): a wide channel for customer service, with visibility into the progress of on-line production and 24-hour on-line service;
– Efficiency Evaluation System (EES): internal and external factors to keep operations working effectively;
– Competitiveness Evaluation System (CES): evaluate the approach to technology adoption and shorten the time from technology development to the production phase.
Besides, no single business model or strategy applies to every real-world situation. When adopting feasible strategies in a changing business environment, it is therefore necessary to consider, as our method does, customers' feelings and needs, using their tendencies to find and close the gaps and to move toward the ideal solution or aspired level. For the RD dimension, the priorities are related technologies, compatibility, reusability and technology product. For the CE dimension, the priorities are market, supplier and rivalry. For the CM dimension, the priorities are productivity, capability, utilization and flexibility. For the CS dimension, the priorities are timing, quality, delivery and adaptability. Combining Fig. 2 and Table 5 clarifies that improving the priorities in dimensions/criteria should consider the NRM from a whole-system perspective in order to reduce the gaps to customers' needs; not only the weightings matter, but also the direct and indirect influences among criteria. Table 5 presents the performance evaluation combined with the relative importance of the criteria obtained by the ANP (global weights). We use the global weights from the ANP to compare the performance of each alternative, since the ANP provides significant feedback. Respondents were asked to evaluate their level of satisfaction on each criterion. The performance score and gap (relative to the aspired level) for the possible TA alternatives are shown in Table 5. Using the performance values, relative results can be obtained. The identified gaps relate to needs recognition and the evaluation of alternatives, consistent with the DEMATEL impact-direction map shown in Fig. 2. By integrating and calculating the investigated data, we verify the overall performance and obtain the ranking RPS > EES > CES > KPF > IIP > CMS.
The overall performance values of EES (30.814) and CES (30.763) are close, whereas CMS lags well behind them: its evaluation scores 28.659, the lowest point. The gap analysis, in turn, yields the priority order for problem solving CMS > IIP > KPF > CES > EES > RPS, which suggests the next strategic moves for the business.
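Using the overall performance values and gaps reported in Table 5, sorting by each column reproduces the two orderings above; a sketch:

```python
# Overall performance value and overall gap per alternative, as read
# from the totals row of Table 5.
alternatives = {
    "RPS": (32.362, 0.107),
    "EES": (30.814, 0.139),
    "CES": (30.763, 0.150),
    "KPF": (30.623, 0.151),
    "IIP": (30.043, 0.162),
    "CMS": (28.659, 0.181),
}

# Rank by performance (descending) and by gap (descending, for problem solving).
by_performance = sorted(alternatives, key=lambda a: alternatives[a][0], reverse=True)
by_gap = sorted(alternatives, key=lambda a: alternatives[a][1], reverse=True)
```

The performance order starts with RPS and the gap order starts with CMS, matching the two priority sequences in the text.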
Table 5 Performance Evaluation and Gaps Calculation

Criteria \ Alternatives | Local weight | Global weight (ANP) | Aspired level
A. Research and Development (RD) | 0.270 | |
a1 Technology Product | 0.242 | 0.065 | 10.0
a2 Related Technologies | 0.257 | 0.069 | 10.0
a3 Compatibility | 0.254 | 0.069 | 10.0
a4 Reusable | 0.246 | 0.066 | 10.0
B. Competitive Environment (CE) | 0.251 | |
b1 Rivalry | 0.298 | 0.075 | 10.0
b2 Market | 0.361 | 0.091 | 10.0
b3 Supplier | 0.341 | 0.086 | 10.0
C. Competence of Manufacturing (CM) | 0.210 | |
c1 Capability | 0.245 | 0.051 | 10.0
c2 Utilization | 0.244 | 0.051 | 10.0
c3 Productivity | 0.273 | 0.057 | 10.0
c4 Flexibility | 0.237 | 0.050 | 10.0
D. Customer Service (CS) | 0.269 | |
d1 Quality | 0.257 | 0.069 | 10.0
d2 Timing | 0.262 | 0.070 | 10.0
d3 Delivery | 0.247 | 0.066 | 10.0
d4 Adaptability | 0.233 | 0.063 | 10.0
TOTAL | 5.000 | 1.000 | 10.0

Alternative | Overall performance | Overall gap | Performance ranking (priority) | Priority for problem solving
KPF | 30.623 | 0.151 | 4 | 3
IIP | 30.043 | 0.162 | 5 | 2
RPS | 32.362 | 0.107 | 1 | 6
CMS | 28.659 | 0.181 | 6 | 1
EES | 30.814 | 0.139 | 2 | 5
CES | 30.763 | 0.150 | 3 | 4

[The per-criterion performance scores and gaps (by aspired level) for each alternative are illegible in the source scan and are omitted here.]
Based on the findings, CMS should focus on improving communication channels (e.g., email, internet contact, satellite systems), and more interactive human contact would be helpful; IIP should improve the regulation of intellectual property protection, the innovative environment, idea sharing and the practice platform (the innovation mechanism) so as to meet expectations; and EES, CES and KPF should improve product availability, market penetration, channel maintenance, the feedback system, the evaluation mechanism, the monitoring/auditing system and the knowledge-sharing platform. These alternatives interact and trade off in shaping business strategies for foreign investment; firms should improve their business models to meet consumers' needs, generate more repurchases, and devise marketing strategies that provide the most effective and efficient ways to meet their stage goals.
5 Conclusions
The analysis results show that the technology development activities of each studied object in the Taiwanese industry still have considerable room to promote their business across countries. Meanwhile, the analysis of the potential high-tech market force and the technology development advantages in Russia reveals that collaboration could be taken into business consideration by both Taiwanese and Russian parties. Technology, marketing and production are the basic elements of TA, and timely action brings the opportunities closer to reality. Using DEMATEL in conjunction with the ANP, we determine the relative weights of the specific criteria. The proposed model is suitable for dealing with complicated and complex decision-making issues whose criteria are interdependent. The analysis presents several aspects of the opportunities for TA between Taiwan and Russia. The following technology-related issues from this study may also be considered.
1. The telecommunication market and equipment analysis shows that LAN equipment production is the most attractive area for investors in Russia.
2. Flexible and fast tools for efficient and forward-looking investment management across a large set of production tasks are required.
3. Middle and small innovation enterprises with government support will be more attractive from the point of view of guaranteed supply.
4. As for market opportunities, special economic zones with economic advantages attract great interest (for example, Tomsk and Novosibirsk in West Siberia).
References
[1] Cetron, M.J., Connor, L.W.: A Method for Planning and Assessing Technology against Relevant National Goals in Developing Countries. In: Cetron, M.J., Bartocha, B. (eds.) The Methodology of Technology Assessment. Gordon and Breach, New York (1972)
[2] Coates, J.J.: In: Porter, A.L., Rossini, F.A., Carpenter, S.R., Roper, A.T. (eds.) A Guidebook for Technology Assessment and Impact Analysis. North Holland, New York (1980)
[3] Gabus, A., Fontela, E.: World problems, an invitation to further thought within the framework of DEMATEL. Battelle Institute, paper presented at the Geneva Research Centre (1972)
[4] Liou, J.H., Tzeng, G.H., Chang, H.C.: Airline safety measurement using a hybrid model. Journal of Air Transport Management 13(4), 243–249 (2007)
[5] Lo, M.C., Michnik, J., Cheng, L.P.: Technology Assessment as a Guidance to Business Management of New Technologies. In: 2007 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM 2007), Singapore, December 2-5 (2007)
[6] Lo, M.C.: A Preference Relation Model for Technology Assessment in Business Management. International Journal of Information Systems for Logistics and Management (IJISLM) 6(1), 81–86 (2010)
[7] Ou Yang, Y.P., Shieh, H.M., Leu, J.D., Tzeng, G.H.: A novel hybrid MCDM model combined with DEMATEL and ANP with applications. International Journal of Operations Research 5(3), 160–168 (2008)
[8] Pope, J., Annandale, D., Morrison-Saunders, A.: Conceptualizing sustainability assessment. Environmental Impact Assessment Review 24(6), 595–616 (2004)
[9] Pretorius, M.W., de Wet, G.: A model for the assessment of new technology for the manufacturing enterprise. Technovation 20(1), 3–10 (2000)
[10] Saaty, T.L.: The Analytic Hierarchy Process. McGraw-Hill, New York (1980)
[11] Schot, J., Rip, A.: The Past and Future of Constructive Technology Assessment. Technological Forecasting and Social Change 54(2-3), 251–268 (1997)
[12] Smits, R., Leijten, J.: Technology Assessment, Waakhond of Speurhond? Kerckebosch, Zeist (1991)
[13] Tzeng, G.H., Chiang, C.H., Li, C.W.: Evaluating intertwined effects in e-learning programs: A novel hybrid MCDM model based on factor analysis and DEMATEL. Expert Systems with Applications 32(4), 1028–1044 (2007)
[14] Van Den Ende, J., Mulder, K., Knot, M., Moors, E., Vergragt, P.: Traditional and Modern Technology Assessment: Toward a Toolkit. Technological Forecasting and Social Change 58(1-2), 5–21 (1998)
[15] Van Eijndhoven, J.C.M.: Technology Assessment: Product or Process? Technological Forecasting and Social Change 54(2), 269–286 (1997)
A Quantitative Model for Budget Allocation for Investment in Safety Measures
Yuji Sato
Abstract. The objective of this study was to develop a quantitative model for budget allocation for investment in safety measures in a chemical plant, for determining the sustainability of the company. Developing such a model in conjunction with decision-making on strategic investments for safety is complicated because of the subjective factors that enter into the inspection of chemical plants and the choice of appropriate safety measures. This study addressed the problem by applying the Analytic Hierarchy Process (AHP), showing how to quantify the inherent risks within a chemical plant in order to optimize the budget allocation for investment in safety measures. A case study was carried out, which clarified the correlation between safety measures and the degree of risk reduction and provided guidance on how to allocate the budget for safety measures.
Keywords: budget allocation, investment in safety measures, percent complete.
1 Introduction
With growing interest in global environmental issues, chemical companies need to take responsibility for reducing plant-based risks such as fires, explosions or leakages, given their potentially devastating human and environmental consequences. In addition to taking responsibility, companies need to adopt strategic investment in safety measures, which are inseparable from profit generation, due to the increasing focus on accountability to stakeholders. Developing a quantitative model for budget allocation for investment in safety measures is complicated, particularly in chemical companies, where the introduction of plant safety measures is critical to the success of risk management. In designing such a quantitative model for chemical plants, plant inspections by safety supervisors should form the core, taking into account intangible factors as well as objective plant data. In addition, safety managers must evaluate and choose safety measures in making decisions about strategic investments, which are usually costly and surrounded by uncertainty. Managers in chemical companies continually face such difficulties.
Yuji Sato, Graduate School of Policy Science, Mie Chukyo University, 1846 Kubo, Matsusaka, Mie, 515-8511 Japan
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 57–64. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
58
Y. Sato
The difficulties arise mainly from the subjective factors that enter into the inspection of chemical plants and the evaluation and choice of appropriate safety measures. First, the intangible factors include features such as smell, color and sound, which should be reflected in the output of the quantitative model but are difficult to quantify, often depending on the intuition safety supervisors have built up over many years of experience. Supervisors can refer to quantitative data, such as the trend in the number of accidents; those data, however, are not enough to capture the depth and richness of the risk in the plant. Second, the evaluation and choice of appropriate safety measures is often beset by uncertainty over the likely effects of the different measures available, particularly given the rapidly changing technological environment. Thus, because the design of the model and the decision-making for budget allocation rely heavily on experience and knowledge as well as intuition, the evaluation and choice of appropriate safety measures often lacks transparency and traceability.

This paper aims to address this problem by dealing with intangible factors in risk assessment for chemical plants. Unlike previous studies, the quantifying scheme proposed for the intangible risk factors in chemical plants integrates subjective or qualitative information obtained from safety managers into a quantitative model for budget allocation. The quantification of risks is accomplished by an innovative decision support method, the Analytic Hierarchy Process (AHP), which is able to incorporate quantitative and qualitative judgments into evaluations.
2 Literature Review
Given the need for necessary and sufficient risk assessment within chemical plants, a variety of approaches have been proposed to date. Six Sigma, a business management strategy first implemented by Motorola in the 1980s and one of the most popular methods for product quality control (Tennant, 2001), can be applied to risk assessment for chemical plants. This method aims to improve the quality of the production process by specifying and eliminating defects in the manufacturing process. Six Sigma employs a set of quality management methods that suit risk management (Snee, 1999), including a statistical analysis method and a quality control method. Even with the wide dissemination of Six Sigma, there have been increasing concerns about its implementation failures. One of the main reasons for the failures is the lack of an appropriate model to effectively guide the implementation of the method (Satya, 2009). In applying this method to risk management, therefore, how to quantify intangible factors in assessing risks is a critical issue.

Failure Modes and Effects Analysis (FMEA), a procedure in operations management first introduced by the U.S. military in the late 1940s, is also applicable to risk management. Although FMEA was initially developed by the military, the methodology is now extensively employed in a wide variety of industries to examine potential failures in products, processes, designs and services. It is integrated into Advanced Product Quality Planning to provide primary risk
mitigation tools and timing in the prevention strategy, both in design and in process formats (Zigmund and Pavel, 2009). While FMEA is a powerful tool for the elimination or reduction of failures, how to determine the risk priorities of failure modes has been an important issue. Traditional FMEA determines the risk priorities of failure modes by assessing risk factors in terms of the occurrence, severity and detection of each failure mode. This approach, however, is sometimes not feasible in actual application due to the subjectivity in rating severity, occurrence and detection. Ying-Ming et al. (2007) have proposed treating the risk factors as fuzzy variables and evaluating them by using fuzzy linguistic terms and fuzzy ratings. Besides the subjectivity in judgment, FMEA does not determine the necessity for response, which may blind safety supervisors in risk assessment.

All of these approaches build on the use of models and focus on how frequently failures (the causes of risks) occur, how serious their consequences are, and how easily they can be detected. Any model, however, is simplified and generalized, which means that it has only a limited region of validity. It is the use of deficient models that actually poses the most serious threat to the validity of risk assessment (Björn, 2003). Even worse is the fact that the assumptions of modelling are often hard coded, which harms not only the validity of the model but also modelling flexibility and modularity (Zee, 2004). Furthermore, the number of studies on budget allocation for safety measures within chemical companies is limited because of the inherent plant-based risks, which results in a lack of a "standard" scheme of optimization for the budget plan.
Chemical companies, surrounded by uncertainties such as unexpected losses from disasters and the envisaged economic effects of safety measures, have therefore tried to develop "haute couture" budget optimization by, for instance, consulting professional safety analysts. From the perspective of the uncertainty of investment results, such as the potential benefits or unexpected costs of early-stage manufacturing technology, a budget plan for safety measures resembles one for new manufacturing technologies, which serves as a useful reference for this issue. As Beaumont (1999) ascertained, the criteria that firms use to make investment decisions in manufacturing technology are: how firms manage the introduction of new technology; whether firms experience unanticipated effects from new technology; and what factors impede or assist its implementation. Some of these factors cannot be clarified before the investments are implemented. Although decision makers on investments are not completely ignorant of what the future might hold, investment decisions are made under conditions of uncertainty. Frank (1998) considered the nature and acceptable level of risk, together with management's personal attitude to risk. O'Brien and Smith (1993) also noted that investments in advanced manufacturing systems must be made while taking into account factors that are difficult to predict. They discussed how a decision process might be designed and managed, and proposed the application of the AHP. Most previous studies, including Frank (1998) and O'Brien and Smith (1993), however, focused solely on the efficacy and efficiency of each safety measure in conjunction with cost minimization; the sustainability of the company was not taken into account.
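For concreteness, the traditional FMEA prioritization discussed above is commonly computed as a Risk Priority Number, RPN = severity × occurrence × detection; the failure modes and ratings below are invented for illustration:

```python
# Traditional FMEA risk prioritization: each failure mode gets a Risk
# Priority Number RPN = severity * occurrence * detection (each rated 1-10),
# and modes are addressed in decreasing RPN order. Ratings are invented.
failure_modes = {
    "leakage":   {"severity": 9,  "occurrence": 3, "detection": 4},
    "explosion": {"severity": 10, "occurrence": 2, "detection": 3},
    "fire":      {"severity": 8,  "occurrence": 4, "detection": 2},
}

def rpn(mode: str) -> int:
    m = failure_modes[mode]
    return m["severity"] * m["occurrence"] * m["detection"]

priority = sorted(failure_modes, key=rpn, reverse=True)
```

The subjectivity critique above applies exactly to the three ratings multiplied here, which is what the fuzzy-variable variant of Ying-Ming et al. (2007) seeks to soften.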
3 Model
In this section, a quantitative model for budget allocation for investment in safety measures in a chemical company is developed. The model reflects intangible risk factors, while the plan considers global environmental protection issues as well as cost minimization. Although many risk assessments for chemical plants and approaches to budget allocation in risk management have been proposed, most have neglected intangible yet critical factors in risk assessment or focused solely on cost minimization in risk reduction. Seen in this light, the approach proposed in this research is critical to the sustainability of the company. A quantitative model of risk assessment must include the intangible risk factors, and the outcome of the model must clarify how to allocate limited resources (e.g., people, goods and capital) to precautionary safety measures. The advantage of the AHP is that it is able to incorporate quantitative and qualitative judgments into assessment. It also makes the evaluation and choice of appropriate safety measures transparent and traceable, even though the design of the model relies heavily not only on experience and knowledge but also on intuition. The quantification of the risks within chemical plants is undertaken in the following two steps, combining the breadth and depth of the risks within the plants and incorporating quantitative and qualitative judgments in risk assessment.

1. Identify and classify risks according to past experience. Building on the chemical company's existing accident response manual, an additional review of past accidents within the plants is undertaken. To support the process, interviews are conducted with organizations maintaining statistics in this field, such as the Ministry of Economy, Trade and Industry in Japan.

2. Develop a quantitative model of risk assessment reflecting safety supervisors' perceptions. A focus group with safety supervisors in the plants is held to capture the depth and richness of risk assessment within the plants. In this process, the risks within the plants are discussed with a professional safety analyst. In capturing the safety supervisors' perceptions, questionnaires including AHP-formatted questions are used. This process facilitates clarification of the priority orders of different risks.

Let wD = {wdi} and wM = {wmij} denote, respectively, the weights of the dimensions and of the measures of a risk reduction plan elicited from safety supervisors by using the AHP, where di ∈ D and mij ∈ M denote the sets of dimensions and measures, respectively. Further, let P = {pij} denote the percent complete of each measure, which represents the achievement record of each safety measure in comparison with the initial estimate each year. Then the degree of risk reduction can be defined as an index representing how high the risk level of the plants would be one year later:

degree of risk reduction = Σi Σj wdi wmij pij.    (1)
As shown in equation (1), the degree of risk reduction is calculated as a ratio of the risk level; it is not based on the number of accidents within the plants. The rationale for this definition is that the model studied in this paper considers the predicted risk level based on the evaluation of each safety measure and its expected effect on risk reduction. In equation (1), the weights of each dimension and measure, wdi and wmij, were quantified through discussion in the focus group within the plants, reflecting the consensus of the safety supervisors. The percent complete pij, on the other hand, was estimated by the highly experienced safety supervisors in charge of each section. As a result, the risk level of the plants one year later can be predicted as:

1 - (degree of risk reduction) = 1 - Σi Σj wdi wmij pij.    (2)
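Equations (1) and (2) can be sketched directly; the weights and percent-complete values below are hypothetical, not the case-study data:

```python
# Sketch of equations (1) and (2) with made-up numbers: w_d[i] weights
# dimension i, w_m[i][j] weights measure m_ij within dimension i, and
# p[i][j] is the percent complete of measure m_ij.
w_d = [0.40, 0.35, 0.25]
w_m = [[0.6, 0.4], [0.5, 0.5], [1.0]]
p   = [[0.8, 0.5], [0.9, 0.2], [0.6]]

degree_of_risk_reduction = sum(
    w_d[i] * w_m[i][j] * p[i][j]
    for i in range(len(w_d))
    for j in range(len(w_m[i]))
)                                                        # equation (1)
risk_level_one_year_later = 1 - degree_of_risk_reduction  # equation (2)
```

Since the measure weights within each dimension and the dimension weights each sum to one, a fully completed plan (all p[i][j] = 1) drives the predicted risk level to zero.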
4 A Case Study in a Japanese Chemical Company
This section demonstrates an application of the quantitative model of risk assessment proposed in this paper, as used in a Japanese chemical company (whose name cannot be revealed due to a confidentiality agreement). Although the chemical company has tried to reduce the number of accidents, the current safety status of the plants is not satisfactory. Each division causes some type of accident, and facility error has become a major cause of accidents. What is worse, the number of accidents increased 80% from 2007 to 2008, after the company had succeeded in reducing the number of accidents in 2007. This situation requires that the company take preventive action immediately. To reduce risks, the company undertook case studies for shareholders and consulted a professional analyst on the safety of chemical plants. A focus group with safety supervisors in the plants classified the risks within the plants into three dimensions: Equipment (hazardous object facilities, poisonous object facilities, utility facilities, construction); Human (education and training, unsafe actions, security); and Regulation and others (compliance, design review, inspection system). Based on this classification, concrete measures for risk reduction were proposed as shown in Table 1, consisting of company-wide projects and activities within the plants, and the risk reduction scheme was developed based on the safety measures listed in Table 1.

Table 1 Safety measures for risk reduction.

Dimensions | Projects | Activities
Equipment | Emergency inspection; Static electricity measure; Tank preservation; Failure Mode and Effect Analysis | Electricity intentional preservation; Environmental risk hedge activity; Incinerator abolition
Human | Superintendent arousal; Security intensification; Natural calamity measure; License institution; OHSMS (Occupational Health and Safety Management System) |
Regulation | Equipment measurement system; Fire code observance; Inspection system establishment | Zero-emission activity; 5S activity
In developing the quantitative model for budget allocation for investment in safety measures for chemical plants, plant inspections by safety supervisors should form the core, taking into account intangible factors as well as objective plant data. Intangible factors included in the inspection, such as smell, color and sound, should be reflected in the output of the model; they are, however, often difficult to quantify and often dependent on safety supervisors' intuition based on many years of experience. In order to quantify these factors for the risk assessment, the AHP was applied in this study. Safety supervisors answered a series of AHP-formatted questions to derive the weights for the dimensions and measures. Omitting the details of the procedure here, they conducted pairwise comparisons of all possible combinations of dimensions, such as "Which dimension do you think is more important for the risk reduction of our plants, Equipment or Regulation?"

Within the chemical plants, the safety supervisors first evaluated the degree of importance of the dimensions, wD, and of the measures, wM. Then, the safety supervisors in each section measured the percent complete of each measure from the year 2007 to 2008. Lastly, the total degree of risk reduction from the year 2007 (set to 1) to 2008 was calculated. Table 2 summarizes the results obtained from the questionnaire formatted by the AHP. The numbers in the wD and wM columns represent the weights normalized by the l1-norm within each dimension. In terms of the importance of dimensions, some supervisors emphasized Equipment, and others Human. In the aggregate, the safety supervisors weighted Human (wd2) most, at 0.390.
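Deriving weights from such pairwise comparisons can be sketched with the principal-eigenvector (power-iteration) method of the AHP; the comparison matrix below is a hypothetical example, not the supervisors' actual judgments:

```python
import numpy as np

# Hypothetical reciprocal pairwise-comparison matrix for the three
# dimensions (Equipment, Human, Regulation): entry A[i, j] is how much
# more important dimension i is judged than dimension j.
A = np.array([
    [1.0, 1/2, 2.0],   # Equipment vs (Equipment, Human, Regulation)
    [2.0, 1.0, 2.0],   # Human judged more important than the others
    [1/2, 1/2, 1.0],
])

w = np.ones(3)
for _ in range(50):     # power iteration toward the principal eigenvector
    w = A @ w
    w = w / w.sum()     # normalize by the l1-norm, as in Table 2

dimensions = ["Equipment", "Human", "Regulation"]
ranking = sorted(zip(dimensions, w), key=lambda t: t[1], reverse=True)
```

With this illustrative matrix, Human receives the largest weight, mirroring the aggregate judgment reported for wd2.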
For the measures, tank preservation (wm13) ranked highest, 0.201, among the measures for Equipment; OHSMS (Occupational Health and Safety Management System) (wm25) ranked highest, 0.303, among those for Human; and fire code observance (wm32) ranked highest, 0.329, among those for Regulation. The degree of risk reduction was thus estimated to be 0.392 as of the end of 2008, according to the percent complete of each measure from 2007 to 2008. Based on these results obtained from the quantitative model, safety supervisors can rationally allocate the budget for safety measures. With the correlation between safety measures and the degree of risk reduction clarified, safety supervisors may relate the investment amount for each safety measure to its percent complete pij. The budget allocation could then be optimized by linear programming (LP) or other optimization methods. If LP were applied, a cost-minimization problem could be formulated with an objective function minimizing investment amounts, subject to budget constraints. In the optimization, a budget allocation for investment entails not only minimizing costs for risk reduction but also integrating economic, legal and social engineering perspectives, such as "Zero-emission activity" or "5S activity," within the framework shown in Table 1.
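As one possible reading of such an optimization, if each measure's percent complete is assumed to rise linearly with spending up to its full cost, then maximizing the degree of risk reduction under a fixed budget is a fractional knapsack solvable greedily by risk reduction per unit cost. The costs below are invented, and the global weights are merely patterned after Table 2:

```python
# Greedy budget allocation under a linearity assumption: spending x on a
# measure with full cost c yields percent complete x / c, so the best
# marginal buy is the highest (global weight / cost) ratio.
measures = {                # measure: (global weight w_d * w_m, full cost)
    "tank_preservation": (0.073, 40.0),
    "OHSMS":             (0.118, 30.0),
    "fire_code":         (0.081, 50.0),
}
budget = 60.0

allocation = {}
remaining = budget
for name, (w, cost) in sorted(measures.items(),
                              key=lambda kv: kv[1][0] / kv[1][1],
                              reverse=True):
    spend = min(cost, remaining)   # fund the best ratio first
    allocation[name] = spend
    remaining -= spend

risk_reduction = sum(
    measures[n][0] * (allocation[n] / measures[n][1]) for n in allocation
)
```

If, as the concluding remarks note, the investment-to-percent-complete relationship is not linear, the measures would instead be segmented into stages and a full LP solved.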
Table 2 Risk assessment: a case within the chemical company.

Dimensions   wD           Measures                                     wM       P (pij)   wD·wM·P
Equipment    wd1: 0.365   wm11: Emergency inspection                   0.161    0.95      0.0560
                          wm12: Static electricity measure             0.135    0.69      0.0340
                          wm13: Tank preservation                      0.201    0.50      0.0366
                          wm14: FMEA                                   0.114    0.30      0.0125
                          wm15: Electricity intentional preservation   0.190    0.20      0.0139
                          wm16: Environmental risk hedge activity      0.124    0.40      0.0181
                          wm17: Incinerator abolition                  0.0745   0.10      0.00272
                          sub total                                                       0.174
Human        wd2: 0.390   wm21: Superintendent arousal                 0.190    0.50      0.0371
                          wm22: Security intensification               0.186    0.00      0.000
                          wm23: Natural calamity measure               0.161    0.30      0.0188
                          wm24: License institution                    0.161    0.30      0.0188
                          wm25: OHSMS                                  0.303    0.90      0.106
                          sub total                                                       0.181
Regulation   wd3: 0.245   wm31: Equipment management system            0.196    0.20      0.00960
                          wm32: Fire code observance                   0.329    0.00      0.000
                          wm33: Inspection system establishment        0.128    0.20      0.00629
                          wm34: Zero emission activity                 0.130    0.00      0.000
                          wm35: "5S" activity                          0.217    0.40      0.0213
                          sub total                                                       0.0371
Degree of risk reduction: 0.392        Estimated risk level 1 year later: 0.608
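The totals in Table 2 can be reproduced directly from its weight and percent-complete columns. The values below are transcribed from the table; the sketch simply carries out the aggregation sum of wD·wM·P.

```python
# Weights and percent-complete values transcribed from Table 2.
wD = {"Equipment": 0.365, "Human": 0.390, "Regulation": 0.245}
wM = {
    "Equipment":  [0.161, 0.135, 0.201, 0.114, 0.190, 0.124, 0.0745],
    "Human":      [0.190, 0.186, 0.161, 0.161, 0.303],
    "Regulation": [0.196, 0.329, 0.128, 0.130, 0.217],
}
P = {
    "Equipment":  [0.95, 0.69, 0.50, 0.30, 0.20, 0.40, 0.10],
    "Human":      [0.50, 0.00, 0.30, 0.30, 0.90],
    "Regulation": [0.20, 0.00, 0.20, 0.00, 0.40],
}

# Degree of risk reduction: sum of wD * wM * P over all measures.
reduction = sum(wD[d] * w * p for d in wD for w, p in zip(wM[d], P[d]))
risk_level = 1.0 - reduction  # estimated risk level one year later
```

Rounded to three digits this reproduces the reported figures: a risk reduction of 0.392 and a remaining risk level of 0.608.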
A Quantitative Model for Budget Allocation for Investment in Safety Measures 63
Y. Sato
5 Concluding Remarks
This paper focuses on a quantitative model of risk assessment for a chemical company, and the correlation between safety measures and the degree of risk reduction is clarified by applying the AHP. Inherent risks of the chemical plants are quantified based on both concrete measures for risk reduction and the consensus of safety supervisors in the plants. In addition, the degree of risk reduction is evaluated based on the importance and the percent complete of safety measures for risk reduction, which guides the allocation of budget for investment in safety measures. Two open-ended questions remain. First, how to approximate the percent complete of each safety measure for actual cases within chemical companies. The relationship between investment amounts and the percent complete of each safety measure may not lend itself to linear approximation; in such cases, the process of a safety measure needs to be segmented so that an LP problem can be formulated. Second, how to establish a feedback system to refine the safety supervisors’ subjective judgments in risk assessment. Since the quantitative model of risk assessment developed in this paper is based on plant inspections by safety supervisors, improving its predictive accuracy and incorporating a broad range of intangible factors in addition to the objective plant data is crucial.
References
Beaumont, N.B.: Investment decisions in Australian manufacturing. Technovation 18(11), 689–695 (1999)
Björn, W.: Models, modelling and modellers: an application to risk analysis. European Journal of Operational Research 75(3), 77–487 (2003)
Frank, L.: Approaches to risk and uncertainty in the appraisal of new technology capital projects. International Journal of Production Economics 53(1), 21–33 (1998)
O’Brien, C., Smith, S.J.E.: Design of the decision process for strategic investment in advanced manufacturing systems. International Journal of Production Economics 30-31, 309–322 (1993)
Satya, S.C.: Six Sigma programs: An implementation model. International Journal of Production Economics 119(1), 1–16 (2009)
Snee, R.D.: Why should statisticians pay attention to Six Sigma? Quality Progress, 100–103 (September 1999)
Tennant, G.: SIX SIGMA: SPC and TQM in Manufacturing and Services. Gower Publishing Co. Ltd, UK (2001)
Vargas, L.G.: An overview of the analytic hierarchy process and its applications. European Journal of Operational Research 48, 2–8 (1990)
Ying-Ming, W., Kwai-Sang, C., Gary, K.K.P., Jian-Bo, Y.: Risk evaluation in failure mode and effects analysis using fuzzy weighted geometric mean. Expert Systems with Applications 36(2-1), 1195–1207 (2007)
van der Zee, D.J.: Modelling decision making and control in manufacturing simulation. International Journal of Production Economics 100(1), 155–167 (2004)
Zigmund, B., Pavel, G.: Failure Analysis of FMEA. Advanced Logistics Development, Tel-Aviv, Israel (2009)
Adapted Queueing Algorithms for Process Chains
Ágnes Bogárdi-Mészöly, András Rövid, and Péter Földesi
Abstract. Process chains are a common modeling paradigm for the analysis and optimization of logistic processes, and are intensively used in many practical applications. The ProC/B toolset is a collection of software tools for modeling, analysis, validation and optimization of process chains. ProC/B models can be translated into queueing networks or Petri nets, which can be solved by effective techniques and algorithms to evaluate performance metrics. The base queueing model with the Mean-Value Analysis evaluation algorithm, and its adaptations for modeling a thread pool and a queue limit, have been verified and validated for multi-tier software systems. The goal of our work is to adapt these models and algorithms to process chains in order to model parallel processes and queue limits.
1 Introduction
Process chains are a common modeling paradigm for analysis and optimization of logistic processes [1], and are intensively used in many practical applications. Different useful approaches have been developed to model the behavior and performance of systems composed from many interacting components with known

Ágnes Bogárdi-Mészöly
Department of Automation and Applied Informatics, Budapest University of Technology and Economics, address: Magyar Tudósok körútja 2. Q ép. 2. em., 1111 Budapest, Hungary
e-mail: [email protected]

András Rövid
Institute of Intelligent Engineering Systems, Óbuda University, address: Bécsi út 96/B, 1034 Budapest, Hungary
e-mail: [email protected]

Péter Földesi
Department of Logistics and Forwarding, Széchenyi István University, address: Egyetem tér 1, 9026 Győr, Hungary
e-mail: [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 65–73. c Springer-Verlag Berlin Heidelberg 2011 springerlink.com
characteristics. In the case of logistic systems the components stand for processes related to the collection of steps in the supply chain. As another example, multi-tier software systems can also be mentioned, where components are related to requests corresponding to certain tiers of the system. There is a demand for research into how performance models of such systems can be made more efficient as well as validated. In most cases these systems can advantageously be represented by queueing networks or Petri nets, which support effective techniques and algorithms to determine their performance metrics. However, in logistic systems, instead of the mentioned models, mainly process-chain based specifications are applied, which cannot directly be utilized by techniques developed for queueing networks or Petri nets. For process chains, the ProC/B formalism [2] is a modeling language designed for the needs of logistic networks, accompanied by the so-called ProC/B tool, aimed at the modeling, analysis, validation and optimization of process oriented systems [3, 4]. Furthermore, with the help of the ProC/B tool the mapping of ProC/B models onto queueing networks or Petri nets can also be performed [5, 6], thus indirectly allowing the application of performance models and algorithms designed for them. The general scheme of mapping ProC/B models onto queueing networks is fairly natural: a standard queueing network is characterized by a set of queues and a set of routing chains, capturing system structure and behavior, respectively. The paper is organized as follows. Section 2 covers the background and related work. Section 3 provides and analyzes a novel algorithm to model parallel processes. Section 4 proposes and investigates an adapted model and algorithm with queue limit. Finally, Section 5 reports conclusions and future work.
2 Background and Related Work
Queueing theory [7, 8] is one of the key analytical modeling techniques used for information system performance analysis [9]. Queueing networks and their extensions (such as queueing Petri nets [10]) have also been proposed to model web-based software systems [11, 12, 13]. This section discusses the base queueing network model and the Mean-Value Analysis evaluation algorithm for multi-tier software systems, used in this paper as the basis for modeling process chains.
Definition 1. The base queueing model is defined for multi-tier information systems [13, 14], which are modeled as a network of M queues Q1, ..., QM, illustrated in Fig. 1. Each queue represents an application tier. Sm denotes the service time of a request at Qm (1 ≤ m ≤ M). A request can take multiple visits to each queue during its overall execution; thus, there are transitions from each queue to its successor and its predecessor as well. Namely, a request from queue Qm either returns to Qm−1 with a certain probability pm, or proceeds to Qm+1 with the probability 1 − pm. There are only two exceptions: the last queue QM, where all the requests return to the previous
queue (pM = 1) and the first queue Q1, where the transition to the preceding queue denotes the completion of a request. Internet workloads are usually session-based. The model can handle session-based workloads as an infinite server queueing system Q0 that feeds the network of queues and forms the closed queueing network depicted in Fig. 1. Each active session occupies one server in Q0. The time spent at Q0 corresponds to the user think time Z.
Fig. 1 Modeling a multi-tier information system using a queueing network
A product form network should satisfy the conditions of job flow balance, one-step behavior, and device homogeneity [9]. The job flow balance assumption holds only in some observation periods; namely, it is a good approximation for long observation intervals, since the ratio of unfinished jobs to completed jobs is small. The MVA algorithm for closed queueing networks [9, 15] iteratively computes the average response time and the throughput performance metrics. The model can be evaluated for a given number of concurrent sessions N. A session in the model corresponds to a customer in the evaluation algorithm. The algorithm uses visit numbers instead of transition probabilities, and visit numbers can be easily derived from transition probabilities. The algorithm introduces the customers into the queueing network one by one (1 ≤ n ≤ N). The cycle terminates when all the customers have been entered. The pseudo code of the MVA algorithm is as follows.

Algorithm 1. Pseudo code of the MVA algorithm
1:  for all m = 1 to M do
2:    Lm = 0
3:  for all n = 1 to N do
4:    for all m = 1 to M do
5:      Rm = Vm · Sm · (1 + Lm)
6:    R = ∑_{m=1}^{M} Rm
7:    τ = n/(Z + R)
8:    for all m = 1 to M do
9:      Lm = τ · Rm
Definition 2. The Mean-Value Analysis (MVA) is defined by Algorithm 1, where the input parameters of the algorithm are the number of customers (N), the number of tiers (M), the average user think time (Z), the visit number (Vm) and the average service time (Sm) for Qm (1 ≤ m ≤ M). Moreover, the output parameters are the throughput (τ), the response time (R), the response time for Qm (Rm) and the average length of Qm (Lm). The MVA algorithm for closed queueing networks is applicable only if the network is in product form. In addition, the queues are assumed to be either fixed-capacity service centers or infinite servers, and in both cases exponentially distributed service times are assumed.
Remark 1. Since the base queueing model satisfies the conditions above, the MVA algorithm can evaluate the base queueing model.
Remark 2. The computational complexity of the MVA algorithm (Definition 2) is Θ(N · M), where N is the number of customers and M is the number of tiers.
In this paper adapted models and algorithms for process chains are introduced in order to model parallel processes and a queue limit. The corresponding adapted models and algorithms for modeling a thread pool and a queue limit have been verified and validated for multi-tier software systems [16, 17, 18]. The verification and validation for process chains are a subject of future work.
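Algorithm 1 translates almost line for line into code. A minimal sketch (0-based indexing; parameter names follow Definition 2):

```python
def mva(N, M, Z, V, S):
    """Mean-Value Analysis for a closed queueing network (Algorithm 1)."""
    L = [0.0] * M              # average queue lengths, initially empty
    tau = R_total = 0.0
    for n in range(1, N + 1):  # introduce customers one by one
        R = [V[m] * S[m] * (1.0 + L[m]) for m in range(M)]  # per-queue response times
        R_total = sum(R)
        tau = n / (Z + R_total)                             # throughput
        L = [tau * R[m] for m in range(M)]                  # Little's law per queue
    return tau, R_total
```

For example, a single queue with V = [1], S = [1] and no think time (Z = 0) yields a throughput of 1 and a response time of 2 for N = 2 customers.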
3 Adapted Algorithm for Parallel Processes
The base queueing model (Definition 1) may also be applied to modeling process chains. The chain elements should be organized into tiers while maintaining the rule that only elements of neighbouring tiers may communicate. Elements belonging to the same tier should have the same purpose. The MVA evaluation algorithm (Definition 2) can be adapted to model parallel processes as well. Assume that the actual request contains sequential as well as parallel process elements, and that parallel elements can be performed simultaneously with a sequential element.
Definition 3. The adapted MVA for parallel processes (MVA-PP) is defined by Algorithm 2, where the index s denotes a sequential process element and the index p a parallel process element, with 1 ≤ m ≤ M.
Proposition 1. The novel MVA-PP (Definition 3) can evaluate the base queueing model (Definition 1).
Proof. Since the MVA evaluation algorithm (Definition 2) can evaluate the base queueing model (Definition 1), as shown in Remark 1, and the extensions have not modified the original part of the algorithm, only the extensions have to be proven. In Step 9 of Algorithm 2, the queue length Lm has to be modified to model parallel processes. In Steps 10 and 11 of the algorithm, since the queue length cannot be negative, if the obtained queue length Lm would be negative, it is set to zero.
Algorithm 2. Pseudo code of the MVA-PP
1:  for all m = 1 to M do
2:    Lm = 0
3:  for all n = 1 to N do
4:    for all m = 1 to M do
5:      Rm = Vm · Sm · (1 + Lm)
6:    R = ∑_{m=1}^{M} Rm
7:    τ = n/(Z + R)
8:    for all p = 1 to M do
9:      if p is parallel then
10:       Lp = τ · Rp − (Sp/Ss) · Ls
11:       if Lp < 0 then
12:         Lp = 0
13:   for all s = 1 to M do
14:     if s is sequential then
15:       Ls = τ · Rs
Proposition 2. The computational complexity of the novel MVA-PP (Definition 3) is Θ(N · M).
Proof. Assume that each execution of the ith line takes time ci, where ci is a constant. The total running time is the sum of the running times for each statement executed. A statement that takes ci time to execute and is executed n times contributes ci · n to the total running time. The worst-case running time of the novel MVA-PP can be seen below. If N and M are finite, the computational time is finite and the algorithm terminates.

(c4 + c5 + c8 + c9 + c10 + c11 + c12 + c13 + c14 + c15) · N · M    (1)
+ (c3 + c4 + c6 + c7 + c8 + c13) · N                               (2)
+ (c1 + c2) · M + (c1 + c3)                                        (3)
Consider only the leading term of the formula, since the lower-order terms are relatively insignificant for large N and M. The constant coefficient of the leading term can be ignored, since constant factors are less significant than the order of growth in determining computational efficiency for large inputs. Since the order of growth of the best-case and worst-case running times is the same, the asymptotic lower and upper bounds are the same, thus, the computational complexity is Θ (N · M).
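Algorithm 2 can be sketched in code as below. One assumption is made explicit that the pseudo code leaves implicit: each parallel element p needs a reference to the sequential element s whose service it overlaps, passed here as the hypothetical parameter `ref[p]`.

```python
def mva_pp(N, M, Z, V, S, is_parallel, ref):
    """Adapted MVA for parallel processes (Algorithm 2, a sketch).

    is_parallel[m] -- True if element m is a parallel process element.
    ref[p]         -- index of the sequential element overlapping p (assumption).
    """
    L = [0.0] * M
    tau = 0.0
    for n in range(1, N + 1):
        R = [V[m] * S[m] * (1.0 + L[m]) for m in range(M)]
        tau = n / (Z + sum(R))
        for p in range(M):                      # parallel elements (steps 8-12)
            if is_parallel[p]:
                s = ref[p]
                L[p] = max(0.0, tau * R[p] - (S[p] / S[s]) * L[s])
        for s in range(M):                      # sequential elements (steps 13-15)
            if not is_parallel[s]:
                L[s] = tau * R[s]
    return tau, L
```

With no parallel elements the algorithm reduces to plain MVA, as Proposition 1 requires.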
4 Adapted Algorithm for Queue Limit
The base queueing model (Definition 1) and the MVA evaluation algorithm (Definition 2) can be adapted in order to model the queue limit. If the current requests
exceed the queue limit, the next incoming requests will be rejected. In these cases, the queue length does not have to be updated.
Definition 4. The adapted queueing model with queue limit (QM-QL) is defined by Fig. 2, where Qdrop is an infinite server queueing system, Zdrop is the time spent at Qdrop, and QL is the queue limit. If QL is less than the sum of queued requests ∑_{m=1}^{M} Lm, the next requests proceed to Qdrop. Requests from Qdrop proceed back to Q0, namely, these requests are reissued.
Fig. 2 Adapted queueing model with queue limit
Definition 5. The adapted MVA with queue limit (MVA-QL) is defined by Algorithm 3, where Zdrop is the time spent at Qdrop and QL is the queue limit.
Proposition 3. The novel MVA-QL (Definition 5) can be applied as an approximation method to the proposed QM-QL (Definition 4).
Proof. The QM-QL model does not satisfy the condition of job flow balance (see Section 2). Thus, the MVA-QL evaluation algorithm can be applied as an approximation method to the QM-QL model. In Step 8 of Algorithm 3, when the throughput is computed, the Zdrop of the model is taken into consideration similarly to Z. In Steps 11 and 13 of the algorithm, if QL is less than the sum of queued requests ∑_{m=1}^{M} Lm, the next requests proceed to Qdrop in the model, and the queue length is not updated in the algorithm.
Algorithm 3. Pseudo code of the MVA-QL
1:  for all m = 1 to M do
2:    Lm = 0
3:  nql = 1
4:  for all n = 1 to N do
5:    for all m = 1 to M do
6:      Rm = Vm · Sm · (1 + Lm)
7:    R = ∑_{m=1}^{M} Rm
8:    τ = nql/(Z + Zdrop + R)
9:    for all m = 1 to M do
10:     Lm = τ · Rm
11:   if ∑_{m=1}^{M} Lm > QL then
12:     for all m = 1 to M do
13:       Lm = oldLm
14:   else
15:     nql = nql + 1
16:     for all m = 1 to M do
17:       oldLm = Lm
Proposition 4. The computational complexity of the novel MVA-QL (Definition 5) is Θ(N · M).
Proof. Assume that each execution of the ith line takes time ci, where ci is a constant. The total running time is the sum of the running times for each statement executed. A statement that takes ci time to execute and is executed n times contributes ci · n to the total running time. The worst-case running time of the novel algorithm can be seen below. If N and M are finite, the computational time is finite and the algorithm terminates.

(c5 + c6 + c9 + c10 + c12 + c13 + c16 + c17) · N · M    (4)
+ (c4 + c5 + c7 + c8 + c9 + c11 + c12 + c16) · N        (5)
+ (c1 + c2) · M                                         (6)
+ (c1 + c3 + c4)                                        (7)
Consider only the leading term of the formula, since the lower-order terms are relatively insignificant for large N and M. The constant coefficient of the leading term can be ignored, since constant factors are less significant than the order of growth in determining computational efficiency for large inputs. Since the order of growth of the best-case and worst-case running times is the same, the asymptotic lower and upper bounds are the same, thus, the computational complexity is Θ (N · M). These adaptations do not increase the complexity of the evaluation algorithm, because the computational complexity of the original algorithm is the same.
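Algorithm 3 admits a direct transcription; variable names follow the pseudo code, and the tiny parameters in the test of concept are illustrative only.

```python
def mva_ql(N, M, Z, Z_drop, QL, V, S):
    """Adapted MVA with queue limit (Algorithm 3, a sketch)."""
    L = [0.0] * M
    old = [0.0] * M
    nql = 1                    # number of customers actually admitted
    tau = 0.0
    for _ in range(N):
        R = [V[m] * S[m] * (1.0 + L[m]) for m in range(M)]
        tau = nql / (Z + Z_drop + sum(R))
        L = [tau * R[m] for m in range(M)]
        if sum(L) > QL:        # request rejected: roll back the queue lengths
            L = old[:]
        else:                  # request admitted: remember the new lengths
            nql += 1
            old = L[:]
    return tau, L
```

A generous queue limit leaves the MVA iteration untouched, while a tight limit freezes the queue lengths at their last admitted values, mirroring the rejection path of the model.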
5 Conclusions and Future Work
The ProC/B models of process chains can be mapped onto queueing networks or Petri nets, which can be solved by effective techniques and algorithms to evaluate performance metrics. In this paper novel models and algorithms for process chains have been proposed to model parallel processes and a queue limit. It has been shown that the MVA-PP can evaluate the base queueing model, and that the MVA-QL can be applied as an approximation method to the QM-QL. The computational complexity of the adapted algorithms has been provided as well. The adapted models and algorithms have been verified and validated for multi-tier software systems; the verification and validation for process chains are a subject of future work.
Acknowledgements. This paper was supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences. This work is connected to the scientific program of the ”Development of quality-oriented and harmonized R+D+I strategy and functional model at BME” project. This project is supported by the New Hungary Development Plan (Project ID: TÁMOP-4.2.1/B-09/1/KMR-2010-0002).
References
1. Kuhn, A.: Prozessketten in der Logistik, Entwicklungstrends und Umsetzungsstrategien. Verlag Praxiswissen, Dortmund (1995) (in German)
2. Arns, M., Fischer, M., Tatlitürk, H., Tepper, C., Völker, M.: Modeling and Analysis Framework of Logistic Process Chains. In: Proc. of Joint Tool Session at PNPM/MMB/PAPM Conferences, Aachen, Germany, pp. 56–61 (2001)
3. Bause, F., Beilner, H., Fischer, M., Kemper, P., Völker, M.: The ProC/B toolset for the modelling and analysis of process chains. In: Field, T., Harrison, P.G., Bradley, J., Harder, U. (eds.) TOOLS 2002. LNCS, vol. 2324, p. 51. Springer, Heidelberg (2002)
4. Arns, M., Eickhoff, M., Fischer, M., Tepper, C., Völker, M.: New Features in the ProC/B toolset. In: Tools of the 2003 Illinois International Multiconference on Measurement, Modelling, and Evaluation of Computer-Communication Systems, Universität Dortmund, Fachbereich Informatik, Technical Report 781 (2003)
5. Arns, M., Fischer, M., Kemper, P., Tepper, C.: Supply Chain Modelling and Its Analytical Evaluation. Journal of the Operational Research Society 53, 885–894 (2002)
6. Buchholz, P., Tepper, C.: Functional Analysis of Process-Oriented Systems. In: Fleuren, H., den Hertog, D., Kort, P. (eds.) Operations Research Proceedings, pp. 127–135. Springer, Heidelberg (2004)
7. Kleinrock, L.: Queueing Systems, vol. 1: Theory. John Wiley and Sons, Chichester (1975)
8. Kleinrock, L.: Queueing Systems, vol. 2: Computer Applications. John Wiley and Sons, Chichester (1976)
9. Jain, R.: The Art of Computer Systems Performance Analysis. John Wiley and Sons, Chichester (1991)
10. Kounev, S., Buchmann, A.: Performance Modelling of Distributed E-Business Applications using Queuing Petri Nets. In: IEEE International Symposium on Performance Analysis of Systems and Software (2003)
11. Smith, C.U., Williams, L.G.: Building responsive and scalable web applications. In: Computer Measurement Group Conference, pp. 127–138 (2000)
12. Menascé, D.A., Almeida, V.: Capacity Planning for Web Services: Metrics, Models, and Methods. Prentice Hall PTR, Upper Saddle River (2001)
13. Urgaonkar, B.: Dynamic Resource Management in Internet Hosting Platforms, Dissertation, Massachusetts (2005)
14. Urgaonkar, B., Pacifici, G., Shenoy, P., Spreitzer, M., Tantawi, A.: An Analytical Model for Multi-tier Internet Services and its Applications. ACM SIGMETRICS Performance Evaluation Review 33(1), 291–302 (2005)
15. Reiser, M., Lavenberg, S.S.: Mean-Value Analysis of Closed Multichain Queuing Networks. Journal of the ACM 27, 313–322 (1980)
16. Bogárdi-Mészöly, Á.: Improved Performance Models for Web-Based Software Systems, Modeling Thread Pool and Queue Limit Performance Factors, p. 132. LAP LAMBERT Academic Publishing, Saarbrücken (2010)
17. Bogárdi-Mészöly, Á., Rövid, A., Levendovszky, T.: Performance Prediction of Web-Based Software Systems. In: Computational Intelligence in Engineering. SCI, vol. 313, pp. 323–336. Springer, Heidelberg (2010)
18. Bogárdi-Mészöly, Á., Levendovszky, T.: A Novel Algorithm for Performance Prediction of Web-Based Software Systems. Performance Evaluation 68(1), 45–57 (2011)
An Improved EMD Online Learning-Based Model for Gold Market Forecasting
Shifei Zhou and Kin Keung Lai
Abstract. In this paper, an improved EMD (Empirical Mode Decomposition) online learning-based model for gold market forecasting is proposed. First, we adopt the EMD method to divide the time series data into different subsets. Second, a back-propagation neural network model (BPNN) serves as the prediction model in our system; we update the online learning rate of the BPNN instantly, as well as the weight matrix. Finally, a rating method is used to identify the most suitable BPNN model for further prediction. The experimental results show that our system has good forecasting performance.
1 Introduction
1.1 Motivation
Forecasting the gold price is becoming increasingly important. Gold has long been traded actively on the international market, and gold derivatives come in great variety, mainly comprising gold futures, gold options, gold forward contracts, and so on [1]. Remarkably, since the price of gold varies within a limited range, gold is able to reduce the effect of inflation, control price rises and help carry out constrictive monetary policy. Hence, gold has become an important risk-hedging tool as well as an investment tool, and predicting the price of the gold market is very significant and important to investors.
However, current forecasting algorithms have poor precision for non-linear problems. We mainly focus on artificial neural network (ANN) forecasting algorithms. First, traditional ANN prediction algorithms employ a

Shifei Zhou · Kin Keung Lai
Department of Management Sciences, College of Business, City University of Hong Kong, Kowloon, Hong Kong
e-mail: [email protected]

Kin Keung Lai
School of Business and Management, North China Electric Power University, Beijing, China
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 75–84.
springerlink.com © Springer-Verlag Berlin Heidelberg 2011
global fixed learning rate to change the weight matrix. This causes two problems. On the one hand, if the learning rate is too small, the weight matrix changes very slowly, and the network training process takes a long time to converge [2].
Fig. 1(a) Algorithm Convergence of Small Learning Rate.
Fig. 1(b) Algorithm Convergence of Large Learning Rate.
On the other hand, if the learning rate is too large, we may overshoot the optimum and cause the algorithm to diverge, as shown in Fig. 1. Second, unlike polynomial functions, most non-linear problems have complex error surfaces. Hence, there are many local minima, and the training process may be trapped in such minima [3], which affects the prediction precision.
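The two failure modes in Fig. 1 can be reproduced on the simplest error surface, E(w) = w²; the learning-rate values below are illustrative only.

```python
def gradient_descent(lr, steps=50):
    """Run plain gradient descent on E(w) = w**2 starting from w = 1."""
    w = 1.0
    for _ in range(steps):
        w -= lr * 2.0 * w   # dE/dw = 2w
    return w
```

A small rate (e.g. 0.1) converges slowly but surely toward the minimum at 0; a rate above 1 overshoots further on every step and diverges.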
1.2 Contributions
In this paper, we propose an improved empirical mode decomposition model (IEMD) to forecast the trend and price of the gold market. The IEMD model makes efficient use of historical data to predict the future price. During the back-propagation neural network training process, we dynamically change the learning rate by using a global meta-learning rate. Our main contributions are as follows:
(1) High Convergence Speed. When the trend of the historical data is downward, we change the learning rate to increase the weight matrix in the same direction and speed up the decrease of the error function. Similarly, when the trend is upward, the gradient is increased, and the process “climbs the hill” of the error function effectively. Both methods speed up the convergence of the neural network training process.
(2) High Prediction Precision. By calculating the gradient of the error function, we minimize the mean square error through changing the weight matrix of the prediction model. The novel system we propose is able to update the learning rate of the back-propagation neural network instantly, and to capture the trend of the data series. As a result, our system is able to give high prediction precision for the gold market.
(3) Significant Application. Gold plays an important role in the international market and attracts increasing attention from investors. Not only does it serve as a risk hedge, but it can also be stored without depreciation. A precise prediction of prices on the gold market will help investors save money. This shows the potential significance of our system.
2 Related Works
2.1 Empirical Mode Decomposition
What is empirical mode decomposition? The empirical mode decomposition method was first proposed by Huang et al. [4]. This time series decomposition technique applies the Hilbert transform to nonlinear and nonstationary time series data. The main idea of EMD is to decompose a time series into a sum of oscillatory functions. EMD makes full use of the historical data: since EMD divides the original time series into several subsets of data by using intrinsic mode functions (IMFs), the characteristics of the different subsets can be identified by training different BPNNs. We can use these trained BPNNs to predict the future trend of the time series and select one of the network models for future prediction.
2.2 Online Learning Algorithm
What is an online learning algorithm? In an online learning algorithm, the weight vectors of the input data are updated immediately after the presentation of each data point [5]. Therefore, an online learning algorithm is able to adjust the weights of the input data and capture the trend of the time series instantly. Online learning algorithms have a number of advantages [5]. As the weight matrix is updated recursively, the algorithm can be used when there is no fixed training set and new data keep coming in. Besides, as local minima are a problem for gradient descent in nonlinear models, online learning can easily escape from local minima when noisy data are input. In this paper, we use an online learning algorithm to update the weight matrix. At the same time, we also update the learning rate to make sure the network model can be trained at a fast convergence speed.
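Per-sample updating can be illustrated with a scalar least-mean-squares rule; the data stream here is hypothetical, not the paper's gold series.

```python
def lms_online(stream, lr=0.1):
    """Online LMS: update the weight immediately after each (x, target) sample."""
    w = 0.0
    for x, t in stream:
        y = w * x               # predict with the current weight
        w += lr * (t - y) * x   # update right away -- no batch accumulation
    return w
```

Feeding the same sample (x = 1, t = 2) repeatedly drives w toward 2, without any fixed training set being assembled in advance.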
3 Improved Empirical Mode Decomposition Model (IEMD)
3.1 IEMD Model Structure
Yu et al. have proposed an EMD-based neural network ensemble learning model [6, 7]. In this model, denoted EMD-FNN-ALNN for short, a forward neural network (FNN) is used as the prediction model. The time series data is divided into several subsets by using IMFs [4]. Each subset is used as input data for training an FNN model, and the trained FNN models are finally used to predict future outcomes. All
the predicted results are assigned a weight and combined together by using an Adaptive Linear Neural Network (ALNN). This model makes good use of the historical data. However, the prediction results are greatly affected by the weight assigned to each FNN model. Besides, even if one of the FNN models has made a precise prediction, the final prediction may be imprecise, because all the predicted results are summed together. To overcome these problems, we propose an improved EMD online learning-based back-propagation neural network model.
Fig. 2 An Improved EMD Online Learning-based BPNN Model.
The structure of the IEMD-BPNN-PMR model is shown in Fig. 2. To make efficient use of the historical data, we adopt the EMD method to partition the data into several subsets. Each subset of data is used to train a back-propagation neural network. The time series is decomposed according to the sifting procedure proposed by Yu et al. [6]. This procedure is repeated until all the data are divided and each subset of data has only one local minimum or maximum. At the end of the sifting procedure, the data series x(t) can be expressed as follows:

x(t) = ∑_{i=1}^{n} ci(t) + rn(t)    (3.1)
where n is the number of IMFs, namely the number of data subsets, ci(t) is the ith IMF, and rn(t) is the final residue of the procedure. Thus, any time series data can be decomposed by using this EMD method. The frequency components contained in each frequency band are different, and they contribute to the variance of the data series x(t), while rn(t) represents the trend of the data series x(t).
3.2 Back-Propagation Neural Network (BPNN)
The back-propagation neural network is a type of artificial neural network [8], a class of typical intelligent learning paradigms widely used in the field of data forecasting. In this paper, we use a three-layer neural network with the error back-propagation algorithm, and this network model is used as the predicting model of our system. There are many nodes in the hidden layer; hence, there are multiple combinations of the weights and data points.
An Improved EMD Online Learning-Based Model for Gold Market Forecasting
BPNN is able to provide a flexible mapping between inputs and outputs. Hornik et al. [9] proved that a three-layer feed-forward neural network with an identity transfer function in the output unit and logistic functions in the hidden layer can approximate any continuous function arbitrarily well. Therefore, in this paper we use a three-layer BPNN as the forecasting model. Assume that y_i represents the ith hidden node; then

y_i = ∑_{i=1}^{n} w_i x_i + w_0    (3.2)
where x_i is the data input and w_i is the corresponding weight. The quantity y_i is also called the net input, which is used in later calculations. After the hidden layer's net inputs are computed, we apply the activation function y_j = f(y_i) to transform the net input into an output, where j = 1, …, m and m is the number of nodes in the hidden layer. Using this network model, we finally make the prediction for the time series.
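As an illustration, the forward pass of such a three-layer network can be written directly from Eq. (3.2). The layer sizes and the tanh activation below are illustrative choices (the paper motivates tanh over the logistic sigmoid in Section 5.2):

```python
import numpy as np

def forward(x, W_h, w0_h, W_o, w0_o):
    """Three-layer forward pass: net input per Eq. (3.2) for each hidden
    node, activation f, and an identity output unit."""
    net = W_h @ x + w0_h        # y_i = sum_i w_i x_i + w_0, one row per hidden node
    hidden = np.tanh(net)       # y_j = f(y_i)
    return W_o @ hidden + w0_o, hidden, net

# illustrative sizes: 4 lagged inputs, 6 hidden nodes, 1 forecast output
rng = np.random.default_rng(0)
W_h, w0_h = rng.normal(size=(6, 4)), rng.normal(size=6)
W_o, w0_o = rng.normal(size=(1, 6)), rng.normal(size=1)
y_hat, hidden, net = forward(rng.normal(size=4), W_h, w0_h, W_o, w0_o)
```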
3.3 Prediction Model Rating
We need to ensemble the prediction results from the different BPNN models, so the prediction model rating (PMR) method is proposed to handle this problem. In this section, we calculate the mean squared error E_j (MSE) of every output of the BPNN models:

E_j = (1/2)(t_j − y_j)²    (3.3)
where t_j is the target value of the time series, y_j is the predicted outcome of the jth BPNN model, and j ranges from 1 to m. Then we compare all the E_j, find the minimal one, and increase the rating of the corresponding jth model by 1. This procedure is repeated throughout the training process. Finally, the BPNN prediction model with the largest rating is selected for prediction.
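A minimal sketch of the PMR bookkeeping; the toy targets and per-model predictions below are illustrative:

```python
import numpy as np

def rate_models(targets, predictions):
    """Prediction Model Rating: at each training step compute
    E_j = 0.5 * (t_j - y_j)^2 per model (Eq. 3.3), rate the best model,
    and return the index of the model with the largest final rating."""
    predictions = np.asarray(predictions)          # shape (steps, models)
    ratings = np.zeros(predictions.shape[1], dtype=int)
    for t, y in zip(targets, predictions):
        errors = 0.5 * (t - y) ** 2
        ratings[np.argmin(errors)] += 1            # increase rating of the best model
    return int(np.argmax(ratings)), ratings

# toy run: model 0 is consistently closer to the targets than model 1
best, ratings = rate_models([1.0, 2.0, 3.0],
                            [[1.1, 0.5], [2.0, 2.4], [2.9, 1.0]])
```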
4 Improved Online Learning Algorithm
4.1 Online Weight Update
In this section, an improved online learning algorithm is proposed. We start from the mean squared error and derive the relationship between the weights and the MSE. First, we use the chain rule to decompose the gradient into two factors. Then, by using (3.3), we have the following result:
G = ∂E_j/∂w_ij = ∂[(1/2)(t_j − y_j)²]/∂y_j · ∂y_j/∂w_ij = (y_j − t_j) f′(y_i) x_i    (4.1)
S. Zhou and K.K. Lai
where G represents the gradient of the MSE. G is not necessarily positive, since f′(y_i) may be negative; here we assume G is positive. Then, if there is an increment in the weight w_ij, the error function increases accordingly. This relationship therefore lets us modify the weight matrix to control the error. From Eqn. (2.4) of Chapter 2 in Saad [5], the increment of the weight can be calculated as follows.
Δw_ij = −μ_ij ∂E_j/∂w_ij = μ_ij (t_j − y_j) f′(y_i) x_i    (4.2)
where μ_ij represents the online learning rate. The increment of the weight thus depends on the learning rate: if μ_ij is large, a single change to the weight will be similarly large. The online weight update, which uses a local and time-varying learning rate for each weight, can be defined as follows.
w_ij(t + 1) = w_ij(t) + μ_ij(t) Δw_ij(t)    (4.3)

where Δw_ij(t) is the increment of the weight.
4.2 Online Learning Rate Update
As discussed above, the online learning algorithm has the ability to escape from local minima: the noise in the stochastic error surface is likely to bounce the network out of local minima as long as they are not too severe. When the weight matrix is modified, we simultaneously change the local learning rate by the gradient descent method. In order to reduce the error function, we change the learning rate before changing the weights. The gradient can then be defined as follows.
∂E(t + 1)/∂μ_ij(t) = ∂E(t + 1)/∂w_ij(t + 1) · ∂w_ij(t + 1)/∂μ_ij(t) = (y_j − t_j) f′(y_i) x_i Δw_ij(t)    (4.4)
Then, we can define the online learning rate update as follows:

μ_ij(t) = μ_ij(t − 1) + λ (t_j − y_j) f′(y_i) x_i Δw_ij(t)    (4.5)
where λ is a fixed global meta-learning rate. Since the mean squared error and the weight matrix change instantly with each data input, the local learning rate also adapts dynamically according to the change of the weight matrix and the output of the transfer function. As a result, this dynamic learning rate can adjust the step size of learning and improve the convergence of the algorithm.
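The online step of Section 4 (weight increment per Eq. (4.2), weight update, learning-rate update per Eq. (4.5)) can be sketched for a single tanh unit. Since Eq. (4.2) already folds μ_ij into Δw_ij, the sketch applies Δw_ij directly in the weight update; λ, the initial μ, and the toy data are illustrative choices, not the paper's settings:

```python
import numpy as np

def online_step(w, mu, x, t, lam):
    """One online update for a single tanh unit with per-weight learning rates."""
    y = np.tanh(w @ x)
    fprime = 1.0 - y ** 2                               # f'(net) = 1 - tanh(net)^2
    delta_w = mu * (t - y) * fprime * x                 # Eq. (4.2), already includes mu
    mu_new = mu + lam * (t - y) * fprime * x * delta_w  # Eq. (4.5)
    return w + delta_w, mu_new

# toy run: drive the unit's output towards a fixed target
x, target = np.array([1.0, 0.5]), 0.5
w, mu = np.zeros(2), np.full(2, 0.1)
for _ in range(200):
    w, mu = online_step(w, mu, x, target, lam=0.01)
err = abs(target - np.tanh(w @ x))
```

The per-weight rates grow while the error term keeps its sign, illustrating how the step size adapts during training.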
5 Experiments
5.1 Evaluation Criteria
In this section, three classes of measurement criteria, namely level prediction, convergence speed, and directional forecasting, are employed to evaluate performance. First, unlike the work in [10], we use the mean absolute error (MAE) [11] rather than the root mean squared error (RMSE) [10], because the RMSE varies with the variability within the distribution of error magnitudes, whereas the MAE is a more natural measure of the average error. Thus, we use the MAE to compare the prediction precision of other forecasting models with that of our novel model.
MAE = (1/N) ∑_{i=1}^{N} |ŷ_i − y_i|    (5.1)

where N represents the total number of data points, ŷ_i is the prediction for the ith data point, and y_i is the target value of the ith data point. The smaller the value of the MAE, the more precise the prediction model. Second, we compare our model with other prediction models in terms of convergence speed. A fast algorithm is not necessarily a good one: if its predictions have a large MAE, the algorithm is worthless to investors. Therefore, we consider the running time of the models together with their prediction precision. Finally, in order to measure the moving direction of the time series accurately, we use a directional statistic (Dstat) [12], defined by the following equation.
Dstat = (1/N) ∑_{i=1}^{N} a_i × 100%    (5.2)

where a_i = 1 if (ŷ_{i+1} − y_i)(y_{i+1} − y_i) > 0 and a_i = 0 otherwise, and N represents the total number of testing samples. The larger this statistic, the better the forecasting model captures the direction of movement.
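Both criteria are easy to compute. The sketch below uses the N − 1 available transitions for Dstat (the indexing in Eq. (5.2) is otherwise ambiguous at the series end), and the sample series is illustrative:

```python
import numpy as np

def mae(y_hat, y):
    """Mean absolute error, Eq. (5.1)."""
    return np.mean(np.abs(np.asarray(y_hat) - np.asarray(y)))

def dstat(y_hat, y):
    """Directional statistic, Eq. (5.2): percentage of steps where the
    predicted move (y_hat[i+1] - y[i]) has the same sign as the actual
    move (y[i+1] - y[i]), over the N-1 available transitions."""
    y_hat, y = np.asarray(y_hat), np.asarray(y)
    a = (y_hat[1:] - y[:-1]) * (y[1:] - y[:-1]) > 0
    return 100.0 * np.mean(a)

# illustrative series: every predicted move has the correct direction
y_true = [1.0, 2.0, 3.0, 2.0]
y_pred = [1.1, 2.2, 2.9, 2.5]
```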
5.2 Experimental Results
In this section, we use daily gold futures price data from the Global Financial Data (GFD) database. The database provides the most extensive time series data on commodities available anywhere, including historical data on about 100 different commodities. In this paper, we use the data spanning from 2 Jan 2006 to 13 Jul 2009, a total of 820 observations, as the training data set, and the rest of the time series, spanning from 14 Jul 2009 to 14 Jul 2010 with a total of 252 observations, for the evaluation of prediction performance. The training data and testing data are input into our model for performance evaluation. The empirical results are the average performance measurements of the models, because we
compare the results by running the different models on the data set several times and averaging the performance across runs. In much of the literature, researchers use the logistic sigmoid f(u) = 1/(1 + e^{−u}) as the activation function for the units in the hidden layer. This function has a small asymmetric range from 0 to 1, which worsens the conditioning of the network. Accordingly, in our prediction model we choose tanh as the activation function, which has a symmetric range from −1 to 1 and is more suitable as an activation function for hidden units. We then have tanh′(u) = 1 − tanh(u)², so f′(y_i) = 1 − y_i².
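The identity used for f′ can be checked numerically with a central difference (the grid and step size below are arbitrary choices):

```python
import numpy as np

# check tanh'(u) = 1 - tanh(u)^2, hence f'(y_i) = 1 - y_i^2 for y_i = tanh(u)
u = np.linspace(-3.0, 3.0, 61)
h = 1e-6
numeric = (np.tanh(u + h) - np.tanh(u - h)) / (2.0 * h)
analytic = 1.0 - np.tanh(u) ** 2
```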
Table 1 The Average MAE and Running Time Comparisons.

Forecasting Model   MAE    Rank   Time (m)   Rank
FNN                 4.27   5      94         1
BPNN                2.53   4      109        3
ARIMA               2.46   3      136        5
EMD-FNN-ALNN        1.78   2      124        4
IEMD-BPNN-PMR       0.92   1      102        2
From Table 1, the MAE values of the models differ considerably. FNN has the largest MAE, 4.27, and ranks last. Our model, IEMD-BPNN-PMR, ranks first with an MAE of 0.92, followed by the EMD-FNN-ALNN, ARIMA, BPNN, and FNN models; this sequence represents descending prediction precision. In terms of running time, the FNN model converges fastest: it takes 94 minutes on average, only 16.6% of the total time of all models. Our model ranks second and takes 102 minutes to converge. The FNN model is a simple type of neural network: a three-layer FNN consists of a linear combination of the input data and the weight matrix, followed by an activation function to transform the combined result. Hence the FNN model takes the least running time to converge, but its precision is also the worst. Conversely, our model adopts a BPNN as the prediction model; although it takes more time to converge, its predictions are more precise than those of the FNN model. Our model also runs faster than the individual BPNN model, because the EMD method decomposes the input data, which helps the network converge faster to some extent, and the improved online learning algorithm updates the weight matrix instantly at each epoch. As a result, the training process has a high convergence speed.
Table 2 The Dstat (%) Comparisons.

Forecasting Model   Dstat (%)   Rank
FNN                 54.67       4
BPNN                59.41       3
ARIMA               51.86       5
EMD-FNN-ALNN        61.58       2
IEMD-BPNN-PMR       75.24       1
Table 2 compares the models in terms of directional forecasting. Ranking first is the IEMD-BPNN-PMR model, followed by the EMD-FNN-ALNN model. Our model predicts the trend of the data series correctly most of the time because we modify the original BPNN model and use tanh as the activation function, whose symmetric range from −1 to 1 helps the network capture the series trend flexibly and accurately.
6 Conclusions
In this paper, we proposed an improved EMD online learning-based model for gold market forecasting. We first use the EMD method to partition the time series into several subsets, which are used to train the neural networks. A back-propagation neural network then serves as the prediction model in our system, since BPNN is one of the most widely studied and used learning algorithms. By studying the essential process of BPNN mathematically, we derive an update rule showing that online learning can be applied instantly; as the learning rate is updated online, the weights are also updated after every data input. This enables the BPNN model to escape from local minima of the error function and therefore yields fast convergence. The experimental results show that our system achieves good prediction results as well as competitive running time. Although we applied the novel algorithm to the gold market and obtained some impressive results, there is still room for improvement. The global meta-learning rate can be studied further to see whether there is a relationship between the MAE and the data samples. In addition, our model can be applied to the stock and warrant markets, which may yield further interesting and practical results.
References
[1] Shafiee, S., Topal, E.: An overview of global gold market and gold price forecasting. Resources Policy 35(3), 178–189 (2010)
[2] Orr, G.B., Leen, T.K.: Using Curvature Information for Fast Stochastic Search. In: Neural Information Processing Systems, pp. 606–612 (1996)
[3] Orr, G.B., Leen, T.K.: Learning in neural networks with local minima. Physical Review A 46(8), 5221–5231 (1992)
[4] Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, H.H., Zheng, Q., Yen, N.C., Tung, C.C., Liu, H.H.: The empirical mode decomposition and the Hilbert spectrum for nonlinear and nonstationary time series analysis. Proceedings of the Royal Society A: Mathematical, Physical & Engineering Sciences 454, 903–995 (1998)
[5] Saad, D. (ed.): On-Line Learning in Neural Networks. Cambridge University Press, New York (1999)
[6] Yu, L., Wang, S., Lai, K.K.: An EMD-Based Neural Network Ensemble Learning Model for World Crude Oil Spot Price Forecasting. In: Soft Computing Applications in Business 2008, pp. 261–271 (2008)
[7] Yu, L., Lai, K.K., Wang, S., He, K.: Oil Price Forecasting with an EMD-Based Multiscale Neural Network Learning Paradigm. In: International Conference on Computational Science, vol. 3, pp. 925–932 (2007)
[8] White, H.: Connectionist nonparametric regression: multilayer feedforward networks can learn arbitrary mappings. Neural Networks 3, 535–549 (1990)
[9] Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Networks 2, 359–366 (1989)
[10] Bo, W., Wang, S., Lai, K.K.: A Hybrid ARCH-M and BP Neural Network Model for GSCI Futures Price Forecasting. In: International Conference on Computational Science, vol. 3, pp. 917–924 (2007)
[11] Willmott, C.J., Matsuura, K.: Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research 30(3), 79–82 (2005)
[12] Yu, L., Wang, S., Lai, K.K.: A novel nonlinear ensemble forecasting model incorporating GLAR and ANN for foreign exchange rates. Computers & OR 32, 2523–2541 (2005)
Applying Kansei Engineering to Decision Making in Fragrance Form Design Chun-Chun Wei*, Min-Yuan Ma, and Yang-Cheng Lin
Abstract. The decision making process is usually vague and hard to describe clearly, and is often regarded as something of a black box. It is therefore essential for companies and manufacturers to understand the consumer's thinking and feelings. In order to help product designers best meet the consumer's specific feelings and expectations, we conduct an experimental study on fragrances using the Kansei Engineering approach and Quantification Theory Type I analysis. The experimental analysis shows that the resulting quantitative models and design support information can be used to find the optimal combination of product form elements for a given set of product images. This approach provides an effective mechanism for facilitating the new product design process and can be applied to other consumer products with various design elements and product images.
1 Introduction
The consumer plays a vital role in determining whether a product succeeds in the highly competitive market [2]. Various factors and variables influence the consumer's purchase decision [5]. However, the decision making process is usually uncertain and hard to describe clearly, and is regarded as something of a black box [8]. Consequently, it is essential for companies and manufacturers to understand the consumer's thinking and feelings [7]. For example, what does the consumer like? Why does the consumer buy one product and not another? Which factor has the highest priority when the consumer purchases a product? Some studies have shown that the first impression of a product as perceived by the consumer is an important topic for manufacturing companies and product designers

Chun-Chun Wei · Min-Yuan Ma
Department of Industrial Design, National Cheng Kung University, Tainan, 701, Taiwan
e-mail: [email protected], [email protected]

Yang-Cheng Lin
Department of Arts and Design, National Dong Hwa University, Hualien, 970, Taiwan
e-mail: [email protected]

* Corresponding author.

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 85–94. springerlink.com
© Springer-Verlag Berlin Heidelberg 2011
[1]. The first impression and subjective experience of the consumer are strongly influenced by the appearance of the product, called "visual aesthetics" [12]. Apple products (e.g. the iPod or iPhone) are a good example of how visual aesthetics has become a major factor in the consumer's purchase decision [12]. Yamamoto and Lambert [14] also find that aesthetically pleasing properties have a positive influence on the consumer's preference for a product and on the decision process when the consumer purchases it [5]. In the field of product design, visual aesthetics (or visual appearance) is usually concerned with "product form" [8]. The product form is defined as the collection of design features that the consumer will appreciate. Shieh and Yang claim that a good product form design can grab the consumer's attention and evoke pleasurable feelings or perceptions [11]. Furthermore, a product can enhance the consumer's sense of trustworthiness if it is made visually conspicuous [3]. The form design (visual appearance) of fragrance (or perfume) products is a good example of the concepts mentioned above, so we conduct an experimental study on fragrance products. In order to understand the consumer's psychological feelings, Kansei Engineering [10] is used in this study. Kansei Engineering is an ergonomic methodology and a set of design strategies for affective design that aims to satisfy the consumer's psychological feelings [10]. The word "Kansei" indicates the consumer's psychological requirements or emotional feelings about a product. Kansei Engineering has been applied successfully in the field of product design to explore the relationship between the consumer's feelings and product forms [4, 7, 8, 9, 11].
In this study, we present an experiment on fragrances to describe how Kansei Engineering can be used to extract representative samples and form elements as the numerical data sets required for the quantitative analysis (Quantification Theory Type I) [6], and then to build a decision making support model that helps product designers meet the consumer's desired feelings.
2 Quantification Theory Type I
Quantification Theory Type I (QTTI) can be regarded as a method of qualitative and categorical multiple regression analysis [6], which allows the inclusion of independent variables that are categorical and qualitative in nature, such as product form elements, together with quantitative criterion variables within Kansei Engineering. The QTTI consists of the following six steps [13]:
Step 1: Define the Kansei relational model associated with the Kansei measurement scores of the experimental samples with respect to an image word pair.
Step 2: Calculate the standardized regression coefficients and the standardized constant of the model.
Step 3: Determine the matrix CCR of correlation coefficients of all variables.
Step 4: Calculate the multiple correlation coefficient R, which is regarded as the relational degree between the external criterion variable and the explanatory variables.
Step 5: Calculate the partial correlation coefficients (PCC) of the design elements to clarify the relationships between product form elements and a product image.
Step 6: Determine the statistical range of each categorical variable (product form element) as the difference between the maximum value and minimum value of the
category score. The range of the categorical variable indicates its contribution degree to the prediction model with respect to a given product image.
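Steps 1 to 4 amount to a dummy-coded least-squares fit. The sketch below (synthetic data, with level 1 of each element dropped as the baseline) shows the mechanics; it is a simplification that does not reproduce QTTI's conventions for centring categories or computing the standardized constant:

```python
import numpy as np

def qtti_fit(categories, levels, scores):
    """Dummy-code categorical form elements (level 1 of each element is
    the baseline), solve least squares, and return the coefficients,
    fitted values, and the multiple correlation coefficient R."""
    n = len(scores)
    cols = [np.ones(n)]                          # constant term
    for j, L in enumerate(levels):
        for k in range(2, L + 1):                # indicators for levels 2..L
            cols.append((categories[:, j] == k).astype(float))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    fitted = X @ beta
    R = np.corrcoef(fitted, scores)[0, 1]
    return beta, fitted, R

# synthetic check: scores generated exactly from two 2-level elements,
# so the fit should be essentially perfect (R close to 1)
rng = np.random.default_rng(1)
cats = rng.integers(1, 3, size=(40, 2))
scores = 2.0 + 0.5 * (cats[:, 0] == 2) - 1.0 * (cats[:, 1] == 2)
beta, fitted, R = qtti_fit(cats, [2, 2], scores)
```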
3 Experimental Procedures of Kansei Engineering We conduct an experimental study using the concept of Kansei Engineering in order to collect numerical data about the fragrances for the QTTI analysis.
3.1 Experimental Samples and Morphological Analysis
In the experimental study, we investigate and categorize various world-famous fragrances. We first collect 46 fragrances and then classify them by degree of similarity using a focus group [8] formed by several experts with at least two years of product design experience. The focus group eliminates highly similar samples through discussion and selects 36 fragrance samples (shown in Fig. 1) for the morphological analysis [7]. Morphological analysis, which concerns the arrangement of objects and how they combine to create a whole or Gestalt, is used to explore all possible solutions of a complex problem regarding a product form [13]. Here, it is used to extract the product form elements of the fragrance samples. The focus group is asked to decompose the fragrance samples into several dominant form elements and form types according to their knowledge and experience. Table 1 shows the result of the morphological analysis, with seven product form elements and 21 associated product form types identified. The form type indicates the relationship between the outline elements. For example, the "shape of bottle body (X3)" form element has four form types: "Sphere", "Cylinder", "Cuboid", and "Trapezoid". A number of design alternatives can be generated by various combinations of morphological elements. According to the result of the morphological analysis, each fragrance sample is coded with a value of 1, 2, 3, 4, or 5 for each of its seven product form elements, according to its particular design element type. Hierarchical cluster analysis is then used to extract the 27 representative fragrance samples (asterisked in Fig. 1).
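The representative-sample selection can be sketched with SciPy's hierarchical clustering over the coded samples. The codes, distance metric, linkage method, and cluster count below are illustrative, not the study's actual settings:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# each row is one sample's morphological code: the type number of X1..X7
rng = np.random.default_rng(7)
codes = rng.integers(1, 5, size=(12, 7)).astype(float)

# average-linkage hierarchical clustering on Hamming distances between codes
Z = linkage(pdist(codes, metric="hamming"), method="average")
labels = fcluster(Z, t=6, criterion="maxclust")

# pick the first sample of each cluster as its representative
representatives = [int(np.flatnonzero(labels == c)[0]) for c in np.unique(labels)]
```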
3.2 Emotional Feelings of Consumers
In Kansei Engineering, emotion assessment experiments are usually performed to elicit the consumer's psychological feelings or perceptions about a product using the semantic differential method [10]. Image words are often used to describe the consumer's feelings about the product in terms of ergonomic and psychological estimation. Once the form elements of the product are identified, the relationship between the image words and the form elements can be established [8]. The procedure of extracting image words includes the following four steps:
Step 1: Collect a large set of image words from websites, magazines, and product catalogs. In this study, we collect 50 image words that describe fragrances, e.g. sexy, rational, passionate, etc.
Step 2: Evaluate the collected image words using the semantic differential method.
Fig. 1 The 36 experimental fragrance samples (including the 27 representative fragrance samples, asterisked)
Table 1 The morphological analysis of fragrances

Form element                           Type 1                 Type 2                   Type 3          Type 4       Type 5
The transparency of bottle top (X1)    Transparent            Opaque
The shape of bottle top (X2)           Sphere                 Pie                      Cylinder        Cuboid       Irregular
The shape of bottle body (X3)          Sphere                 Cylinder                 Cuboid          Trapezoid
The texture of bottle body (X4)        Smooth                 Textured
The transparency of bottle body (X5)   Transparent            Matte                    Opaque
Width ratio of bottle body (X6)        Narrow                 Wide
Bottleneck (X7)                        Connected the bottle   Independent bottleneck   No bottleneck
Step 3: Apply factor analysis and cluster analysis to the semantic differential results obtained at Step 2.
Step 4: Determine five representative image words, including "Sexy–Pure (S-P)", "Quiet–Energy (Q-E)", "Masculine–Feminine (M-F)", "Rational–Emotional (R-E)", and "Vulgar–Elegant (V-E)", based on the analyses performed at Step 3.
To obtain the assessed values for the emotional feelings evoked by the 27 representative fragrance samples, a 7-point (1-7) semantic differential scale is used. 26 young female subjects (with ages ranging from 25 to 30) are asked to assess the form (look) of the fragrance samples on each image word scale of 1 to 7. The last five columns of Table 2 show the five assessed image values of the 27 samples. For each selected fragrance sample in Table 2, the first column shows the fragrance sample number and Columns 2-8 show the corresponding type number for each of its seven product form elements, as given in Table 1. Table 2 provides a numerical data source for the quantitative analysis (Quantification Theory Type I), which can be used to develop a design support model for the new design and development of fragrances.

Table 2 Product image assessments of 27 representative fragrance samples

No.  X1  X2  X3  X4  X5  X6  X7   S-P   Q-E   M-F   R-E   V-E
1    2   2   2   1   2   2   2    5.04  3.08  4.04  3.19  5.04
2    2   2   2   1   2   2   2    4.19  4.12  4.54  4.15  4.73
3    2   1   2   1   2   2   2    3.92  3.23  5.62  5.35  5.19
4    2   2   2   1   2   2   2    3.42  3.96  2.58  2.77  4.23
5    2   2   2   1   1   2   1    4.00  3.62  4.38  3.65  4.88
6    1   2   2   1   3   2   2    2.12  4.27  2.69  3.19  5.12
7    1   1   1   1   1   1   2    4.31  4.38  5.35  5.77  4.96
8    1   2   3   2   1   1   2    5.00  4.35  5.12  5.04  5.31
9    2   2   3   1   1   1   2    3.23  3.73  3.54  3.27  4.50
10   2   3   3   1   3   1   2    3.42  3.81  1.81  1.77  3.27
11   2   2   3   1   3   2   2    2.65  3.19  1.85  2.00  4.15
12   2   3   2   1   1   2   1    4.88  3.85  4.19  3.50  3.04
13   2   3   2   1   1   2   1    4.54  4.31  3.73  3.15  3.96
14   2   3   3   1   2   2   1    3.31  3.42  3.62  3.50  4.31
15   2   3   2   1   1   2   3    3.69  4.00  4.92  5.04  5.19
16   2   3   2   1   2   2   3    3.81  4.04  4.12  4.27  4.50
17   2   3   2   1   2   2   3    3.58  3.50  2.38  2.54  3.42
18   2   3   2   2   3   2   3    2.85  3.73  5.54  5.65  5.35
19   2   3   2   1   2   2   3    4.38  3.50  4.46  4.08  2.96
20   2   3   2   2   2   2   3    3.81  4.19  3.96  3.85  3.38
21   2   4   3   1   2   1   3    3.35  4.19  3.92  4.38  4.54
22   2   5   3   1   2   2   3    3.88  3.77  3.65  3.12  4.12
23   2   4   4   1   2   2   3    3.23  4.73  4.62  5.12  5.15
24   2   5   4   1   1   2   2    3.73  3.69  4.54  4.42  5.04
25   2   3   4   1   2   2   2    3.77  3.42  4.15  4.46  3.42
26   2   3   4   2   1   2   2    3.62  4.38  5.08  5.12  5.08
27   1   5   4   2   2   1   2    3.88  3.77  3.65  3.12  4.12
4 The QTTI Analysis and Results
We use the QTTI analysis to examine the relationship between the seven product form elements and the five product images. In this paper, seven independent variables (the seven product form elements) and five dependent variables (the "Sexy–Pure", "Quiet–Energy", "Masculine–Feminine", "Rational–Emotional", and "Vulgar–Elegant" product images) are used. The result of the QTTI analysis is given in Table 3. The partial correlation coefficients in Table 3 indicate the relationship between the seven product form elements and each product image. For example, the highest partial correlation coefficient for the R-E image belongs to the "shape of bottle top" form element (X2 = 0.74), meaning that "the shape of bottle top" primarily affects the "Rational–Emotional" image of the product, followed by the "Bottleneck" form element (X7 = 0.55), "the transparency of bottle body" form element (X5 = 0.54), and "the shape of bottle body" form element (X3 = 0.53). This implies that product designers should focus their attention on these most influential elements when the objective of a new fragrance design is to achieve a desirable R-E image.

Table 3 The result of QTTI analysis (category grades per form type; the PCC row of each form element gives its partial correlation coefficient)

Form element   Form type   S-P     Q-E     M-F     R-E     V-E
X1             X11         -0.10    0.53   -0.05    0.44    0.22
               X12          0.02   -0.09    0.01   -0.08   -0.04
               PCC          0.07    0.54    0.03    0.25    0.14
X2             X21         -0.60   -0.71    2.10    2.11    1.03
               X22         -0.38   -0.25   -0.03   -0.09    0.27
               X23          0.21    0.05   -0.29   -0.21   -0.47
               X24          0.58    1.07    0.28    0.51    1.09
               X25          0.20    0.22   -0.33   -0.65   -0.25
               PCC          0.31    0.69    0.67    0.74    0.69
X3             X31          0.75    0.43   -0.57   -0.50   -0.35
               X32          0.28    0.14    0.02   -0.11   -0.05
               X33          0.06   -0.22   -0.21   -0.25    0.24
               X34         -1.02   -0.17    0.36    0.76   -0.14
               PCC          0.51    0.45    0.28    0.53    0.27
X4             X41         -0.05   -0.06   -0.20   -0.18   -0.08
               X42          0.22    0.27    0.89    0.77    0.37
               PCC          0.21    0.44    0.52    0.50    0.30
X5             X51          0.40    0.12    0.62    0.49    0.57
               X52          0.09   -0.04   -0.18   -0.17   -0.36
               X53         -1.21   -0.15   -0.76   -0.52   -0.01
               PCC          0.75    0.36    0.62    0.54    0.60
X6             X61         -0.21    0.06   -0.32   -0.28   -0.87
               X62          0.06   -0.02    0.09    0.08    0.25
               PCC          0.19    0.10    0.22    0.22    0.55
X7             X71         -0.18   -0.05   -0.08   -0.33   -0.53
               X72          0.39    0.11   -0.23   -0.26    0.07
               X73         -0.53   -0.15    0.39    0.55    0.13
               PCC          0.38    0.29    0.39    0.55    0.36
Constant                    3.73    3.91    4.01    3.95    4.37
R                           0.767   0.851   0.816   0.857   0.778
R²                          0.589   0.724   0.666   0.734   0.605
In the second-to-last row of Table 3, R is the correlation between the observed and predicted values of the dependent variable, and R² is the square of this correlation; R² ranges from 0 to 1. The category grade (form type grade) shown in Table 3 indicates the preference degree of the consumer's emotional feeling for each category of the independent variables. If the grade is positive, the consumer's emotional feeling leans towards the "right" image; a negative grade indicates that the consumer's emotional feeling favors the "left" image. For example, the category grades of the three values of "the transparency of bottle body (X5)" for the S-P image are 0.40, 0.09, and -1.21 respectively. This shows that the consumer's emotional feeling tends towards the "Pure" image if "the transparency of bottle body (X5)" is "Transparent" or "Matte", and towards "Sexy" if it is "Opaque". From the QTTI analysis, Models (1), (2), (3), (4), and (5) express the relationship between the product form elements and the given product images. We can input the values of the seven product form variables into these five models and obtain the predicted values of the five product images.
S-P = 3.73 - 0.1X11 + 0.02X12 - 0.6X21 - 0.38X22 + 0.21X23 + 0.58X24 + 0.2X25 + 0.75X31 + 0.28X32 + 0.06X33 - 1.02X34 - 0.05X41 + 0.22X42 + 0.4X51 + 0.09X52 - 1.21X53 - 0.21X61 + 0.06X62 - 0.18X71 + 0.39X72 - 0.53X73    (1)

Q-E = 3.91 + 0.53X11 - 0.09X12 - 0.71X21 - 0.25X22 + 0.05X23 + 1.07X24 + 0.22X25 + 0.43X31 + 0.14X32 - 0.22X33 - 0.17X34 - 0.06X41 + 0.27X42 + 0.12X51 - 0.04X52 - 0.15X53 + 0.06X61 - 0.02X62 - 0.05X71 + 0.11X72 - 0.15X73    (2)

M-F = 4.01 - 0.05X11 + 0.01X12 + 2.1X21 - 0.03X22 - 0.29X23 + 0.28X24 - 0.33X25 - 0.57X31 + 0.02X32 - 0.21X33 + 0.36X34 - 0.2X41 + 0.89X42 + 0.62X51 - 0.18X52 - 0.76X53 - 0.32X61 + 0.09X62 - 0.08X71 - 0.23X72 + 0.39X73    (3)

R-E = 3.95 + 0.44X11 - 0.08X12 + 2.11X21 - 0.09X22 - 0.21X23 + 0.51X24 - 0.65X25 - 0.5X31 - 0.11X32 - 0.25X33 + 0.76X34 - 0.18X41 + 0.77X42 + 0.49X51 - 0.17X52 - 0.52X53 - 0.28X61 + 0.08X62 - 0.33X71 - 0.26X72 + 0.55X73    (4)

V-E = 4.37 + 0.22X11 - 0.04X12 + 1.03X21 + 0.27X22 - 0.47X23 + 1.09X24 - 0.25X25 - 0.35X31 - 0.05X32 + 0.24X33 - 0.14X34 - 0.08X41 + 0.37X42 + 0.57X51 - 0.36X52 - 0.01X53 - 0.87X61 + 0.25X62 - 0.53X71 + 0.07X72 + 0.13X73    (5)

These models can also be used to examine the corresponding product images for a given combination of product forms. Consequently, the QTTI models enable us to build a fragrance design support database, generated by inputting each of all 1440 (2×5×4×2×3×2×3) possible combinations of product form elements into the QTTI models to obtain the associated S-P, Q-E, M-F, R-E, and V-E image values. Table 4 shows the design support information that helps product designers find the optimal combination of product form elements for a given set of product images. In addition, the design support database can be incorporated into a computer-aided design system to facilitate product form design in the new fragrance development process. To illustrate, Table 5 shows the new fragrance form design with the optimal combination of product form elements for the desirable "Sexy" image.
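For instance, Model (1) can be evaluated for the "Sexy" combination of Table 5 by summing the relevant category grades. X4 is left at its Smooth type (X41) below, since its grade has little influence; that filler choice is ours, not the paper's:

```python
# Category grades of Model (1) for the Table 5 "Sexy" combination:
# X11 Transparent, X21 Sphere, X34 Trapezoid, X53 Opaque, X62 Wide,
# X73 No bottleneck; X41 (Smooth) is an assumed filler for X4.
grades = {"X11": -0.10, "X21": -0.60, "X34": -1.02,
          "X41": -0.05, "X53": -1.21, "X62": 0.06, "X73": -0.53}
s_p = 3.73 + sum(grades.values())   # constant term of Model (1) plus grades
# s_p falls far below the scale midpoint of 4, i.e. a strongly "Sexy" prediction
```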
Table 4 The design support information for the new fragrance design

(X1: the transparency of bottle top; X2: the shape of bottle top; X3: the shape of bottle body; X4: the texture of bottle body; X5: the transparency of bottle body; X6: width ratio of bottle body; X7: bottleneck)

Product image   X1            X2                      X3          X4         X5            X6     X7
Sexy            Transparent   Sphere                  Trapezoid   ----       Opaque        Wide   No bottleneck
Pure            ----          Cuboid                  Sphere      Textured   Transparent   ----   Independent bottleneck
Quiet           ----          Sphere                  Cuboid      ----       Opaque        ----   No bottleneck
Energy          Transparent   Cuboid                  Sphere      Textured   Transparent   Wide   Independent bottleneck
Masculine       ----          Irregular or Cylinder   Sphere      Smooth     Opaque        Wide   Independent bottleneck
Feminine        ----          Sphere                  Trapezoid   Textured   Transparent   ----   No bottleneck
Rational        ----          Irregular               Sphere      Smooth     Opaque        Wide   Connected the bottle or Independent bottleneck
Emotional       Transparent   Sphere or Cuboid        Trapezoid   Textured   Transparent   ----   No bottleneck
Vulgar          ----          Cylinder                Sphere      ----       Matte         Wide   Connected the bottle
Elegant         Transparent   Cuboid or Sphere        Cuboid      Textured   Transparent   ----   No bottleneck
(Note: Words in bold italic indicate that the "form type grade" is larger than 0.5, as shown in Table 3; the "----" symbol means the grade is less than 0.1 and has little influence.)

Table 5 The optimal combination of product form elements for the "Sexy" image

Form element | Form type
X1 The transparency of bottle top | Transparent
X2 The shape of bottle top | Sphere
X3 The shape of bottle body | Trapezoid
X5 The transparency of bottle body | Opaque
X6 Width ratio of bottle body | Wide
X7 Bottleneck | No bottleneck

[New fragrance form design: figure omitted]
C.-C. Wei, M.-Y. Ma, and Y.-C. Lin
5 Conclusion

In this paper, we have conducted an experimental study on fragrances to demonstrate how Kansei Engineering based on QTTI analysis can be used to assist product designers in decision making for new product design. According to the results of the QTTI analysis, five linear quantitative models have been built to explore the relationship between consumers' emotional feelings and product form elements. In addition, a fragrance design support database, in conjunction with a computer-aided design system, has been used to facilitate product form design in the new fragrance development process. The results of the experimental study have also shown that QTTI analysis is a promising technique for determining the optimal combination of product form elements for a particular design concept of a product image.
Biomass Estimation for an Anaerobic Bioprocess Using Interval Observer

Elena M. Bunciu

Abstract. This work deals with the analysis of an anaerobic digestion process model and the estimation of its biomass using an interval observer. A two-step mass-balance model is presented. Because the influent substrate is unknown in both quantity and shape, while its upper and lower limits are known, the quantity of biomass can be estimated only by using an interval observer. The two-step model incorporates the electrochemical equilibrium, including the alkalinity, which supports a better control strategy for the plant in which the reactions take place.

Keywords: Interval observer, anaerobic process, acidogenesis-methanization.
1 Introduction

In recent years, interest in biological systems has increased. Because biological systems deal with living organisms, they are not yet perfectly described by physical laws. This is a strong motive for using robust methods for the control of this type of system and for the estimation of the variables that are not measured [1]. Controlling such systems is a difficult problem because it requires dealing with highly nonlinear systems described by poor-quality models. In [2] an increasingly popular bioprocess is presented, which treats wastewater and produces energy through methane (CH4) and hydrogen (H2) under specific conditions; it will be used in the present paper. In biological processes, the lack of cheap and reliable on-line sensors does not permit real-time monitoring and on-line measurement of biological process variables [3]. Even when sensors are available, they provide measurements at a relatively low sampling rate. Many state estimation techniques require knowledge of the main biological phenomena, but the initial conditions, inputs, model parameters and measurements are only known to a certain level. Taking all these uncertainties into consideration, interval observers have been introduced; they are based on positive differential systems and offer a way to deal with uncertainty in the system when bounds on the uncertain terms are known [4].

Elena M. Bunciu
Department of Automatic Control, University of Craiova, A.I. Cuza 13, Craiova, Romania
e-mail:
[email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 95–102. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
Interval observers provide estimated limits for the state: an upper bound of the state vector provided by the upper observer, and a lower bound determined by the lower observer [4]. The paper is organized as follows: Section 2 is dedicated to the modelling of the process formed by two reactions; Section 3 presents the interval observer for the considered system; the simulation results presented in Section 4 illustrate the performance of the described algorithm; finally, Section 5 concludes the paper.
2 Process Modelling

Numerous dynamical models exist for the anaerobic digestion bioprocess. The basic ones, in which only one biomass is considered, are presented in [5]. Over time, models of the anaerobic digestion bioprocess have taken more detailed forms by including several bacterial populations and several substrates. Among the many types of models, the IWA Anaerobic Digestion Model 1 [6] is the most used for simulating a digestion plant. This model is so complex that its mathematical analysis is an advanced task [7]. In [7] a simplified macroscopic model of the anaerobic process is described, based on the two main reactions presented in [8]. The model considered in this paper is described by the following reactions [7]:

$$k_1 S_1 \xrightarrow{\mu_1(S_1)\, X_1} X_1 + k_2 S_2, \qquad k_3 S_2 \xrightarrow{\mu_2(S_2)\, X_2} X_2 + k_4\, \mathrm{CH_4} \quad (1)$$

The first reaction describes acidogenesis, and the second reaction describes methanogenesis. The model is very simple because only two main bacterial populations are considered: the acidogenetic and the methanogenic bacterial populations [10]. In equation (1) the following symbols appear:
X1 – biomass concentration of acidogenetic bacteria [g/l]
S1 – organic substrate concentration [g/l]
X2 – biomass concentration of methanogenic bacteria [g/l]
S2 – total concentration of volatile fatty acids [g/l]
μ1, μ2 – growth rates of the acidogenetic and methanogenic bacteria [h⁻¹]
kj – yield coefficients [g/g], j = 1, …, 4
In this process the organic substrate (S1) is degraded into volatile fatty acids (VFA, denoted S2) by the acidogenetic bacteria (X1), and then S2 is degraded into methane (CH4) and CO2 by the methanogenic bacteria (X2) [7]. Both bacterial growth rates are described by equation (2) [7]:

$$\mu_1 = \mu_1^* \frac{S_1}{S_1 + k_{s1}}, \qquad \mu_2 = \mu_2^* \frac{S_2}{S_2 + k_{s2} + S_2^2/k_i} \quad (2)$$
The maximum specific growth rates of the two bacterial populations are denoted by $\mu_1^*$, $\mu_2^*$; $k_{s1}$, $k_{s2}$ denote the half-saturation constants associated with $S_1$, $S_2$; and $k_i$ denotes the inhibition constant [10]. The dynamic model of the anaerobic digestion bioprocess is given by equations (3) and (4) [7]:

$$\frac{dX_1}{dt} = \mu_1(S_1) X_1 - \alpha D X_1, \qquad \frac{dS_1}{dt} = -k_1 \mu_1(S_1) X_1 - D (S_1 - S_{1in}) \quad (3)$$

$$\frac{dX_2}{dt} = \mu_2(S_2) X_2 - \alpha D X_2, \qquad \frac{dS_2}{dt} = -k_3 \mu_2(S_2) X_2 + k_2 \mu_1(S_1) X_1 - D (S_2 - S_{2in}) \quad (4)$$

In equations (3) and (4) the following symbols appear:
D – dilution rate [h⁻¹]
S1in – concentration of the influent organic substrate [g/l]
S2in – concentration of the influent volatile fatty acids [g/l]
α – fraction of the biomass not retained in the digester [dimensionless]

In this paper we consider that the biomass is not affected by the dilution rate, i.e., α = 1. This value corresponds to an ideal fixed-bed reactor where the biomass is attached to a support [9].
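A minimal numerical sketch of the model (3)–(4) with the growth rates of Eq. (2), integrated by forward Euler, is given below. This is illustrative code, not from the paper: the kinetic values, yield coefficients and initial states are those quoted later in Section 4 of this paper, the dilution rate D = 0.05 h⁻¹ is an assumed illustrative value, and α = 1 as stated above.

```python
# Forward-Euler simulation of the two-step anaerobic digestion model
# (3)-(4) with the Monod/Haldane growth rates of Eq. (2).
mu1_max, mu2_max = 0.2, 0.35      # maximum specific growth rates [1/h]
ks1, ks2, ki = 0.5, 4.0, 21.0     # half-saturation / inhibition constants
k1, k2, k3 = 3.2, 1.035, 16.7     # yield coefficients [g/g]
D, alpha = 0.05, 1.0              # dilution rate [1/h] (assumed value); alpha = 1
S1in, S2in = 2.3, 2.1             # influent concentrations [g/l]

def mu1(S1):
    """Monod kinetics for acidogenesis."""
    return mu1_max * S1 / (ks1 + S1)

def mu2(S2):
    """Haldane kinetics for methanogenesis (substrate inhibition)."""
    return mu2_max * S2 / (ks2 + S2 + S2 ** 2 / ki)

def simulate(T=100.0, dt=0.01, X1=0.7, S1=0.1, X2=0.2, S2=1.0):
    """Integrate (3)-(4) over T hours and return the final state."""
    for _ in range(int(T / dt)):
        dX1 = mu1(S1) * X1 - alpha * D * X1
        dS1 = -k1 * mu1(S1) * X1 - D * (S1 - S1in)
        dX2 = mu2(S2) * X2 - alpha * D * X2
        dS2 = -k3 * mu2(S2) * X2 + k2 * mu1(S1) * X1 - D * (S2 - S2in)
        X1, S1, X2, S2 = X1 + dt * dX1, S1 + dt * dS1, X2 + dt * dX2, S2 + dt * dS2
    return X1, S1, X2, S2
```

With constant influent concentrations the trajectories remain positive, as expected from the positivity of the system.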
3 Interval Observer

As mentioned in Section 1, the lack of reliable and cheap sensors makes the system very difficult to control. This lack of sensors has two important consequences: the modelling of the growth-rate expressions and the identification of their parameters are extremely hard, and the usual observation and control techniques cannot be applied without additional assumptions, because the system becomes non-detectable [9]. These disadvantages can be overcome if the lower and upper limits of the unknown inputs can be assumed known and a system formed by two observers can be designed with the objective of guaranteeing bounds on the unmeasured variables. Such observers are called "interval observers" [9]. This type of observer was first used for a class of nonlinear systems, for which robust observation and control were proposed [9]. To estimate the biomass concentrations, denoted X1 and X2 in equations (3) and (4), with any type of observer, it is necessary to measure the influent substrates. Supposing that those measurements are not available, but that for the influent substrates some upper and lower
limits are available, we can estimate the two biomass concentrations $X_1$ and $X_2$ using interval observers. Therefore, in the following we consider that the conditions $S_{1in}^- \le S_{1in} \le S_{1in}^+$ and $S_{2in}^- \le S_{2in} \le S_{2in}^+$ are fulfilled, where "−" denotes a lower limit and "+" an upper limit. Knowing this, it is possible to synthesize four observers, each of them used to estimate one limit of the two biomass concentrations of the model (3)–(4):

$$\frac{d\hat z_1^+}{dt} = -D\left(\hat z_1^+ - S_{1in}^+\right), \qquad \frac{d\hat z_1^-}{dt} = -D\left(\hat z_1^- - S_{1in}^-\right) \quad (5)$$

$$\frac{d\hat z_2^+}{dt} = -D\left(\hat z_2^+ - k_2 S_{1in}^+ - k_1 S_{2in}^+\right), \qquad \frac{d\hat z_2^-}{dt} = -D\left(\hat z_2^- - k_2 S_{1in}^- - k_1 S_{2in}^-\right) \quad (6)$$

where $z_1$ and $z_2$ are two auxiliary variables of the form:

$$z_1 = k_1 X_1 + S_1, \qquad z_2 = k_1 k_3 X_2 + k_1 S_2 + k_2 S_1 \quad (7)$$

Because $S_{1in}$ and $S_{2in}$ are not measurable and $z_1$, $z_2$ depend on their variation, the auxiliary variables are represented by their limits. From equations (5) and (6), considering the form of equation (7), we obtain the following biomass estimates:

$$\hat X_1^+ = \frac{1}{k_1}\left(\hat z_1^+ - S_1\right), \qquad \hat X_1^- = \frac{1}{k_1}\left(\hat z_1^- - S_1\right) \quad (8)$$

$$\hat X_2^+ = \frac{1}{k_1 k_3}\left(\hat z_2^+ - k_2 S_1 - k_1 S_2\right), \qquad \hat X_2^- = \frac{1}{k_1 k_3}\left(\hat z_2^- - k_2 S_1 - k_1 S_2\right) \quad (9)$$
4 Simulation Results and Comments

The interval observer presented in Section 3 is used to estimate the upper and lower bounds of the two cell populations in the anaerobic digestion process (3)–(4). The shapes of the influent substrates S1in and S2in are not known for this process, but their upper and lower limits are. Therefore, we can estimate the values of the two bacterial populations only by using the interval observer given by (8) and (9). The simulations have been made using the model composed of equations (3) and (4) with the parameter values from [11], considering that the process runs for 100 hours and that the input concentrations stay in a guaranteed interval.
The values of the kinetic parameters used in the simulations are [11]: μ1* = 0.2, μ2* = 0.35, ks1 = 0.5, ks2 = 4, ki = 21, X1(0) = 0.7, S1(0) = 0.1, X2(0) = 0.2, S2(0) = 1. The yield coefficients of the proposed model are [11]: k1 = 3.2, k2 = 1.035,
k3 = 16.7. Using an interval observer, only the upper and lower limits of both influent substrates are available. The four known limits are: S1in+ = 2.32, S1in− = 2.2, S2in+ = 2.6, S2in− = 2. The graphics resulting from the simulation are shown in Figs. 1 and 2.

Fig. 1 Estimation of the acidogenetic bacteria population using an interval observer (real value of X1 [g/l] between the superior and inferior limits over 100 h)

Fig. 2 Estimation of the methanogenic bacteria population using an interval observer (real value of X2 [g/l] between the superior and inferior limits over 100 h)
The forms of S1in and S2in are presented in Fig. 3.

Fig. 3 Influent substrate concentrations and their limits (S1in and S2in [g/l] over 100 h, real value between the inferior and superior limits)
5 Conclusion

In this paper, we have estimated the biomass concentrations of the two main bacterial populations of a system described by the dynamics of the anaerobic digestion process. The system's behaviour was analyzed considering that the real forms of the influent substrates are represented by the following equations:

$$S_{1in}(t) = S_{1in0}\,\left(1 + 0.02\sin(\pi t/15)\right) + n_1, \qquad S_{2in}(t) = S_{2in0}\,\left(1 - 0.1\cos(\pi t/10)\right) + n_2 \quad (10)$$
In equation (10) the values of the unknown parameters are S1in0 = 2.3 and S2in0 = 2.1. These real forms, drawn in blue in Fig. 3, are composed of a sinusoidal wave on which a noise signal, denoted n1 and n2 in the two equations of (10), is superimposed. To estimate the unknown parameters we can use neural networks with radial basis functions (RBF), or dynamic neural networks, as shown in [12], where a process similar to the one chosen in this paper is described. In Figs. 1 and 2 we can observe that the real shapes of the biomass concentrations lie between the two limits determined with the help of the interval observer. By changing the initial conditions, both estimated limits of the interval can be brought closer to the real value.

Acknowledgments. This work was partially supported by the strategic grant POSDRU/88/1.5/S/50783, Project ID50783 (2009), co-financed by the European Social Fund – Investing in People, within the Sectoral Operational Programme for Human Resources Development 2007–2013.
References
1. Moisan, M., Bernard, O., Gouzé, J.-L.: A high/low gain bundle of observers: application to the input estimation of a bioreactor model. In: Proceedings of the 17th World Congress of the International Federation of Automatic Control, Seoul, Korea, pp. 15547–15552 (2008)
2. Angelidaki, I., Ellegaard, L., Ahring, B.K.: Applications of the anaerobic digestion process. In: Biomethanation II, pp. 1–33. Springer, Heidelberg (2003)
3. Goffaux, G., Vande Wouwer, A.: Bioprocess state estimation: Some classical and less classical approaches. In: Control and Observer Design for Nonlinear Finite and Infinite Dimensional Systems, pp. 111–128. Springer, Heidelberg (2005)
4. Moisan, M., Bernard, O.: Interval observers for non monotone systems. Application to bioprocess models. In: Proc. of the 16th IFAC World Congress, Prague, Czech Republic (2005)
5. Andrews, J.F.: A mathematical model for the continuous culture of microorganisms utilizing inhibitory substrates. Biotechnology and Bioengineering 10, 707–723 (1968)
6. Batstone, D.J., Keller, J., Angelidaki, I., Kalyuzhnyi, S.V., Pavlostathis, S.G., Rozzi, A., Sanders, W.T.M., Siegrist, H., Vavilin, V.A.: Anaerobic Digestion Model No. 1 (ADM1). Scientific and Technical Report, vol. 13. IWA Publishing (2002)
7. Hess, J., Bernard, O.: Design and study of a risk management criterion for an unstable anaerobic wastewater treatment process. Elsevier, Amsterdam (2007)
8. Bernard, O., Le Dantec, B., Chachuat, B., Steyer, J.-P., Lardon, L., Lambert, S., Ratini, P., Lema, J., Ruiz, G., Rodriguez, J., Vanrolleghem, P., Zaher, U., De Pauw, D., De Neve, K., Lievens, K., Dochain, D., Schoefs, O., Farina, R., Alcaraz-Gonzalez, V., Gonzalez-Alvarez, V., Lemaire, P., Martinez, J.A., Duclaud, O., Lavigne, J.F.: An integrated system to remote monitor and control anaerobic wastewater treatment plants through the internet. Water Science and Technology 52, 457–464 (2005)
9. Gonzales, V.A.: Estimation et commande robuste non-linéaires des procédés biologiques de dépollution des eaux usées. Application à la digestion anaérobie. Thèse de l'Université de Perpignan (2001)
10. Bernard, O., Sadok, Z., Dochain, D.: Dynamical model development and parameter identification for an anaerobic wastewater treatment process
11. Petre, E.: Nonlinear Automated Systems. Applications in Bioengineering. Universitaria, Craiova (2008)
12. Petre, E., Selisteanu, D., Sendrescu, D., Ionete, C.: Neural networks based adaptive control for a class of nonlinear bioprocesses. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds.) KES 2008, Part II. LNCS (LNAI), vol. 5178, Springer, Heidelberg (2008)
Building Multi-Attribute Decision Model Based on Kansei Information in Environment with Hybrid Uncertainty

Junzo Watada and Nureize Arbaiy

Abstract. The objective of this paper is to build a multi-attribute decision model that considers Kansei information in a hybrid uncertain environment. First, fuzzy random variables are explained as a means of dealing with models in a hybrid uncertain environment. Second, using fuzzy random variables, a linear regression model (FRRM) is formulated. Third, a multi-attribute decision model (MADM) is built based on the linear regression model. Finally, the multi-attribute decision model is presented in the presence of Kansei information given by experts in an environment with hybrid uncertainty involving both randomness and fuzziness.
1 Introduction

The generalization of uncertainty theory was presented by Zadeh [20, 21], where granularity and generalized constraints form the crux of the way in which uncertainty is handled. Fuzzy random variables serve as basic tools for modeling optimization problems with such two-fold uncertainty. The concept of a fuzzy random variable was introduced by Kwakernaak [2, 3], who defined these variables as measurable functions linking a probability space with a collection of fuzzy numbers. To deal with fuzzy random variables, a series of optimization models [13] and fuzzy random renewal processes [12, 22] have been proposed. The objective of this paper is to build a multi-attribute decision model considering Kansei information in a hybrid uncertain environment. First, fuzzy random variables are explained as a means of dealing with models in a hybrid uncertain environment. Second, using fuzzy random variables, a linear regression model (FRRM) is formulated. Third, a multi-attribute decision model (MADM) is built based on the linear regression model.

Junzo Watada · Nureize Arbaiy
Graduate School of Information, Production and Systems, Waseda University, 2-7 Hibikino, Wakamatsu, Kitakyushu 808-0135, Fukuoka, Japan
e-mail: [email protected], [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 103–112. c Springer-Verlag Berlin Heidelberg 2011 springerlink.com
Finally, the multi-attribute decision model is built using Kansei information given by experts in an environment with hybrid uncertainty involving both randomness and fuzziness. The remainder of this paper is organized as follows. In Section 2, fuzzy random variables are explained as the basis for the models built under a fuzzy random environment; the linear regression model (FRRM) and the multi-attribute evaluation model (MADM) are then formulated and briefly illustrated in the following sections.
2 Fuzzy Random Variables

Given some universe Γ, let Pos be a possibility measure defined on the power set P(Γ) of Γ. Let ℜ be the set of real numbers. A function Y : Γ → ℜ is said to be a fuzzy variable defined on Γ (see [6]). The possibility distribution μY of Y is defined by μY(t) = Pos{Y = t}, t ∈ ℜ, which is the possibility of the event {Y = t}. For a fuzzy variable Y with possibility distribution μY, the possibility, necessity and credibility of the event {Y ≤ r} are given as follows:

$$\mathrm{Pos}\{Y \le r\} = \sup_{t \le r} \mu_Y(t), \quad \mathrm{Nec}\{Y \le r\} = 1 - \sup_{t > r} \mu_Y(t), \quad \mathrm{Cr}\{Y \le r\} = \frac{1}{2}\left(1 + \sup_{t \le r} \mu_Y(t) - \sup_{t > r} \mu_Y(t)\right) \quad (1)$$

From (1), we note that the credibility measure is the average of the possibility and the necessity measures, i.e., Cr{·} = (Pos{·} + Nec{·})/2, and that it is a self-dual set function (see [5]), i.e., Cr{A} = 1 − Cr{A^c} for any A in P(Γ). The motivation behind the introduction of the credibility measure is to develop a measure which is a sound aggregate of the two extreme cases: the possibility (expressing a level of overlap and being highly optimistic in this sense) and the necessity (articulating a degree of inclusion and being pessimistic in its nature). Based on the credibility measure, the expected value of a fuzzy variable is defined as follows.

Definition 1 ([5]). Let Y be a fuzzy variable. The expected value of Y is defined as

$$E[Y] = \int_0^{\infty} \mathrm{Cr}\{Y \ge r\}\,dr - \int_{-\infty}^{0} \mathrm{Cr}\{Y \le r\}\,dr \quad (2)$$
provided that the two integrals are finite. What follows are the definitions of a fuzzy random variable and of its expected value and variance operators. For more theoretical results on fuzzy random variables, one may refer to Gil et al. [1], Liu and Liu [4], and Wang and Watada [15], [14].
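The credibility measure in (1) and the expected value (2) can be checked numerically for a triangular fuzzy variable. The following is an illustrative sketch, not code from the paper; for a triangular fuzzy variable with support [a, c] and peak b, the expected value has the known closed form (a + 2b + c)/4, which the simple quadrature below reproduces.

```python
def cr_le(r, a, b, c):
    """Cr{Y <= r} = (1 + sup_{t<=r} mu_Y(t) - sup_{t>r} mu_Y(t)) / 2
    for a triangular fuzzy variable Y with support [a, c] and peak b."""
    sup_le = 0.0 if r < a else ((r - a) / (b - a) if r < b else 1.0)
    sup_gt = 1.0 if r < b else ((c - r) / (c - b) if r < c else 0.0)
    return 0.5 * (1.0 + sup_le - sup_gt)

def expected_value(a, b, c, n=20000):
    """E[Y] via Eq. (2).  Assuming 0 <= a, the second integral vanishes
    and Cr{Y >= r} = 1 - Cr{Y <= r} almost everywhere, so a midpoint
    rule over [0, c] suffices."""
    h = c / n
    return sum((1.0 - cr_le((i + 0.5) * h, a, b, c)) * h for i in range(n))
```

For the triangular fuzzy variable (1, 2, 4) the quadrature gives (1 + 2·2 + 4)/4 = 2.25, matching the closed form.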
Definition 2. Suppose that (Ω, Σ, Pr) is a probability space and Fv is a collection of fuzzy variables defined on the possibility space (Γ, P(Γ), Pos). A fuzzy random variable is a mapping X : Ω → Fv such that for any Borel subset B of ℜ, Pos{X(ω) ∈ B} is a measurable function of ω.

Let X be a fuzzy random variable on Ω. From the above definition, we know that for each ω ∈ Ω, X(ω) is a fuzzy variable. Furthermore, a fuzzy random variable X is said to be positive if, for almost every ω, the fuzzy variable X(ω) is positive almost surely. For any fuzzy random variable X on Ω and each ω ∈ Ω, the expected value of the fuzzy variable X(ω) is denoted by E[X(ω)], which has been proved to be a measurable function of ω (see [4, Theorem 2]), i.e., it is a random variable. Given this, the expected value of the fuzzy random variable X is defined as the mathematical expectation of the random variable E[X(ω)].

Definition 3. Let X be a fuzzy random variable defined on a probability space (Ω, Σ, Pr). The expected value of X is defined as

$$E[X] = \int_{\Omega} \left[ \int_0^{\infty} \mathrm{Cr}\{X(\omega) \ge r\}\,dr - \int_{-\infty}^{0} \mathrm{Cr}\{X(\omega) \le r\}\,dr \right] \Pr(d\omega). \quad (3)$$
Definition 4 ([4]). Let X be a fuzzy random variable defined on a probability space (Ω, Σ, Pr) with expected value e. The variance of X is defined as

$$\mathrm{Var}[X] = E[(X - e)^2] \quad (4)$$

where e = E[X] is given by Definition 3.
3 Regression Model [19]

The objective of this section is to design a fuzzy regression analysis technique based on fuzzy random variables with confidence intervals, which will be referred to as confidence-interval-based fuzzy random regression analysis (FRRM). This study can be regarded as a generalization of our previous work [18], which focused on a fuzzy regression model with an expected-value approach to fuzzy random data. The confidence interval is defined by the expected value and variance of a fuzzy random variable. In the realization of fuzzy random regression, it is difficult to calculate the product of a fuzzy coefficient and a confidence interval. We consider two approaches: a vertices method to describe the model, and a realistic heuristic method to solve the optimization of the fuzzy random regression model.
3.1 Confidence Intervals

Possibilistic systems have been applied to linear regression analysis [10], [11], [16]. Our main concern here is to build a new fuzzy regression model for fuzzy random data, based on the possibilistic linear model.

3.1.1 Fuzzy Random Regression Model with Confidence Interval
Table 1 illustrates the format of the data dealt with here, where the input data $X_{ik}$ and output data $Y_i$, for all $i = 1, \ldots, N$ and $k = 1, \ldots, K$, are fuzzy random variables defined as

$$Y_i = \left\{ \left( \left(Y_i^t, Y_i^{t,l}, Y_i^{t,r}\right)_T,\; p_i^t \right),\ t = 1, \ldots, M_{Y_i} \right\}, \quad (5)$$

$$X_{ik} = \left\{ \left( \left(X_{ik}^t, X_{ik}^{t,l}, X_{ik}^{t,r}\right)_T,\; q_{ik}^t \right),\ t = 1, \ldots, M_{X_{ik}} \right\}, \quad (6)$$

respectively. This means that all values are given as fuzzy numbers with probabilities, where the fuzzy variables $(Y_i^t, Y_i^{t,l}, Y_i^{t,r})_T$ and $(X_{ik}^t, X_{ik}^{t,l}, X_{ik}^{t,r})_T$ are associated with probabilities $p_i^t$ and $q_{ik}^t$ for $i = 1, \ldots, N$, $k = 1, \ldots, K$ and $t = 1, \ldots, M_{Y_i}$ or $t = 1, \ldots, M_{X_{ik}}$, respectively.

Table 1 Fuzzy random input-output data

Sample i | Output Y | Inputs X1 … Xk … XK
1 | Y1 | X11 … X1k … X1K
2 | Y2 | X21 … X2k … X2K
⋮ | ⋮ | ⋮
i | Yi | Xi1 … Xik … XiK
⋮ | ⋮ | ⋮
N | YN | XN1 … XNk … XNK

Let us denote the fuzzy linear regression model with fuzzy coefficients $\bar A_1, \ldots, \bar A_K$ as follows:

$$\bar Y_i = \bar A_1 X_{i1} + \cdots + \bar A_K X_{iK}, \quad (7)$$

where $\bar Y_i$ denotes an estimate of the output and $\bar A_k = \left( \frac{\bar A_k^l + \bar A_k^r}{2},\, \bar A_k^l,\, \bar A_k^r \right)_T$ are symmetric triangular fuzzy coefficients, given triangular fuzzy random data $X_{ik}$ for $i = 1, \ldots, N$ and $k = 1, \ldots, K$ as shown in Table 1.
Table 2 Input-output data with confidence intervals

Sample i | Output I[eY, σY] | Inputs I[eX1, σX1] … I[eXK, σXK]
1 | I[eY1, σY1] | I[eX11, σX11] … I[eX1K, σX1K]
2 | I[eY2, σY2] | I[eX21, σX21] … I[eX2K, σX2K]
⋮ | ⋮ | ⋮
i | I[eYi, σYi] | I[eXi1, σXi1] … I[eXiK, σXiK]
⋮ | ⋮ | ⋮
N | I[eYN, σYN] | I[eXN1, σXN1] … I[eXNK, σXNK]
When the outputs $Y_i = \{((Y_i^t, Y_i^{t,l}, Y_i^{t,r})_T, p_i^t),\ t = 1, \ldots, M_{Y_i}\}$, $i = 1, \ldots, N$, are given at the same time, we can determine the fuzzy random linear model so that the model includes all given fuzzy random outputs. Therefore, the following relation should hold:

$$\bar Y_i = \bar A_1 X_{i1} + \cdots + \bar A_K X_{iK} \underset{FR}{\supset} Y_i, \quad i = 1, \ldots, N, \quad (8)$$

where $\underset{FR}{\supset}$ is a fuzzy random inclusion relation whose precise meaning will be explained later on. Following the principles of fuzzy arithmetic, the problem of obtaining a fuzzy linear regression model results in the following mathematical programming problem:

[Regression model with fuzzy random data]

$$\begin{aligned} & \min_{\bar A}\; J(\bar A) = \sum_{k=1}^{K} \left( \bar A_k^r - \bar A_k^l \right) \\ & \text{subject to}\quad \bar A_k^r \ge \bar A_k^l, \qquad \bar Y_i = \bar A_1 X_{i1} + \cdots + \bar A_K X_{iK} \underset{FR}{\supset} Y_i, \\ & \text{for } i = 1, \ldots, N,\ k = 1, \ldots, K. \end{aligned} \quad (9)$$

Here, the fuzzy random inclusion relation $\underset{FR}{\supset}$ is critical to the model (9) and can be defined in various ways. Watada and Wang [18] used the expectation-based inclusion and converted the fuzzy random regression model (9) to an expected-value regression model which corresponds to the conventional fuzzy regression model. Before building the fuzzy random regression model with confidence intervals, we define the confidence interval induced by the expectation and variance of a fuzzy random variable. When we consider the one-sigma confidence (1 × σ) interval of each fuzzy random variable, we can express it as the interval

$$I[e_X, \sigma_X] \triangleq \left[ E(X) - \sqrt{\mathrm{Var}(X)},\ E(X) + \sqrt{\mathrm{Var}(X)} \right], \quad (10)$$

which is called a one-sigma confidence interval. Similarly, we can define two-sigma and three-sigma confidence intervals. All of these confidence intervals are called σ-confidence intervals. Table 2 shows the data with one-sigma confidence intervals. Based on σ-confidence intervals, we formulate a new fuzzy random regression model as follows:

[Confidence-interval-based fuzzy random regression model (FRRM)]

$$\begin{aligned} & \min_{\bar A}\; J(\bar A) = \sum_{k=1}^{K} \left( \bar A_k^r - \bar A_k^l \right) \\ & \text{subject to}\quad \bar A_k^r \ge \bar A_k^l, \qquad \bar Y_i = \sum_{k=1}^{K} \bar A_k\, I[e_{X_{ik}}, \sigma_{X_{ik}}] \;\underset{h}{\tilde\supset}\; I[e_{Y_i}, \sigma_{Y_i}], \\ & \text{for } i = 1, \ldots, N,\ k = 1, \ldots, K. \end{aligned} \quad (11)$$

Since the product of a fuzzy number (fuzzy coefficient) and an interval (confidence interval) is influenced by the signs of each component, to solve the FRRM (11) we need to take into account all the cases corresponding to different combinations of the signs of the fuzzy coefficients as well as of the σ-confidence intervals of the fuzzy random data. The detailed computation associated with the FRRM (11) will be discussed in the next section.
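To see the width-minimisation idea behind models (9) and (11) in the simplest possible setting, the sketch below deliberately reduces the problem (these are simplifying assumptions, not the paper's algorithm): a single attribute, the fuzzy coefficient collapsed to a crisp interval [A_l, A_r], and all confidence intervals strictly positive. Then the estimated interval for sample i is [A_l·x_lo, A_r·x_hi], the inclusion constraints decouple per bound, and the narrowest feasible coefficient interval has a closed form.

```python
def fit_interval_coefficient(x_intervals, y_intervals):
    """Narrowest [A_l, A_r] with A_l*x_lo <= y_lo and A_r*x_hi >= y_hi
    for every sample (all interval bounds assumed strictly positive).
    Maximising A_l and minimising A_r minimises the width A_r - A_l."""
    A_l = min(y_lo / x_lo for (x_lo, _), (y_lo, _) in zip(x_intervals, y_intervals))
    A_r = max(y_hi / x_hi for (_, x_hi), (_, y_hi) in zip(x_intervals, y_intervals))
    return A_l, A_r

def includes(model_iv, data_iv):
    """Interval inclusion: the estimated interval contains the observed one."""
    return model_iv[0] <= data_iv[0] and data_iv[1] <= model_iv[1]

# Toy confidence-interval data (hypothetical numbers).
x = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
y = [(2.0, 5.0), (4.0, 7.0), (6.0, 9.0)]
A_l, A_r = fit_interval_coefficient(x, y)
estimates = [(A_l * lo, A_r * hi) for lo, hi in x]
```

In the full FRRM the coefficients are fuzzy, the inclusion is taken at an h-level, and the signs of the components matter, so the actual computation is the mathematical program (11) rather than this closed form.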
4 Multi-Attribute Decision Model

The ultimate goal of a multi-attribute decision model is to describe decision making with more than a single attribute so as to obtain the best alternatives among the set of evaluated alternatives. In a linear evaluation problem the final rating of each alternative is assessed by a linear function of the relative merits of the attributes. The relative merits of the alternatives are judged by comparing and ranking the final ratings. Therefore, weighting the attributes plays a pivotal role in multi-attribute decision making, and weight information is required for the attributes. Typically, decision makers play an essential role in deciding the weights. However, in real situations it is sometimes difficult to estimate the weights when appropriate values cannot be provided. Even though mathematical analysis may contribute to determining these weights, the historical data used may contain fuzzy and random properties and should be treated properly. In this section, we propose a multi-attribute decision-making scheme that accommodates the evaluation process in a situation of hybrid uncertainty. We highlight two issues that should be addressed. First, it is necessary to derive a set of numerical weights representing the importance of the attributes with respect to the total evaluation. Second, when both random and fuzzy information are present in
Table 3 Fuzzy evaluation scale

Intensity of importance (crisp value) | Fuzzy value | Fuzzy membership A = (a, λ) | Definition
1 | 1̃ | (1, 1) | Very poor
2 | 2̃ | (2, 1) | Fairly poor
3 | 3̃ | (3, 1) | Poor
4 | 4̃ | (4, 1) | Below acceptance
5 | 5̃ | (5, 1) | Acceptable
6 | 6̃ | (6, 1) | Fairly good
7 | 7̃ | (7, 1) | Good
8 | 8̃ | (8, 1) | Fairly excellent
9 | 9̃ | (9, 1) | Excellent
the observed data, it is required to characterize the data not only by using the formalism of random variables. Hence, the objective of this study is to build a fuzzy random regression model with confidence intervals in the fuzzy multi-attribute decision making design. The use of the fuzzy multi-attribute decision making scheme enables decision makers to evaluate and find the importance weight and further provide a ranking for selective samples.
4.1 Multi-Attribute Decision Model

In this section, we explain a fuzzy multi-attribute decision model under hybrid uncertainty circumstances. We introduce the fuzzy evaluation scale listed in Table 3 to express the experts' evaluations. Triangular fuzzy numbers are used instead of crisp numbers to describe the fuzzy importance levels. For the model definition, let us adopt the following notation:

Xi : decision variables
A∗ : attribute weight vector
eXik : expected value of attribute k for sample i
σXik : variance of attribute k for sample i
I[eXik, σXik] : 1 × σ confidence interval
[al, au] : interval numbers

where i = 1, · · · , N and k = 1, · · · , K. The proposed methodology is as follows:

1. Problem Description and Data Preparation. The multi-attribute evaluation model of this system consists of the total evaluation, the attributes, and the alternatives to be evaluated. Data are collected from human graders, and the values for each criterion are assigned in a straightforward manner based on the fuzzy evaluation scale shown in Table 3.
J. Watada and N. Arbaiy
2. Weight Estimation. The following steps explain the Fuzzy Random Regression Model, which is used to estimate the weights of the attributes.
a. Organize the fuzzy random data in the form (Yi, Xik) for all i = 1, · · · , N and k = 1, · · · , K, where Yi denotes the total evaluation of the ith sample and Xik represents the assessment of attribute k.
b. Compute the confidence interval of each fuzzy random value to construct the 1 × σ confidence interval:
   i. Calculate the expected value E[x] of the triangular fuzzy random variable as in equation (3).
   ii. Calculate the variance var[x] as in equation (4).
c. Estimate the attribute weights by using Fuzzy Random Regression Model (9). Let X∗i denote the attribute vector of sample i and Y∗i the total evaluation of sample i, for i = 1, · · · , N, where N is the number of candidate alternatives to be evaluated. The fuzzy random multi-attribute decision model is described in terms of fuzzy regression model (9), and the inclusion relation in (9) is expressed as follows, where all confidence intervals I[eXik, σXik] are non-negative in this problem:

   min_A J(A) = Σ_{k=1}^{K} (auk − alk)
   subject to  auk ≥ alk,
               eYi + σYi ≤ Σ_{k=1}^{K} auk (eXik + σXik),
               eYi − σYi ≥ Σ_{k=1}^{K} alk (eXik − σXik),
               for i = 1, · · · , N and k = 1, · · · , K,                  (12)

where alk and auk are the lower and upper bounds of the estimated weight of attribute k. The solution for the fuzzy random variables results in interval numbers [alk, auk].
3. Ranking the alternatives. The weights [alk, auk] obtained from fuzzy random regression model (12) can be used to calculate the final evaluation score for ranking purposes. To build the multi-attribute decision model, let us denote the judgment matrix by A = [aik]_{n×K} and the fuzzy weight vector of the attributes by W = [wk]_{1×K}. The total score vector R = [ri]_{n×1} of the alternatives is calculated as

   R = [ri] = A · W^T,   ri = Σ_{k=1}^{K} aik · wk,                        (13)

where T denotes the transpose of a matrix or vector. The total scores R in Equation (13) can be used to rank the alternatives.
4. Decision analysis. Decision analysis can be made based on the results obtained in Steps 1 to 3. The analysis may deliver the most appropriate alternative (choice), a complete order of the alternatives (rank), and an ordered list of the best alternatives (sort).

Hence, the solution embraces two main points. First, the analysis establishes attribute weights by fuzzy random variable-based regression that can measure relevant goal accomplishment. Second, a multi-attribute decision scheme evaluates and ranks the alternatives under consideration of multiple attributes. In this study, when the fuzzy importance weights and fuzzy scale ratings are given, the total scores for the alternatives can be obtained analytically as fuzzy numbers.
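The scoring and ranking step of Equation (13) with interval weights [alk, auk] can be sketched as follows. This is our illustration only: the data, helper names, and ranking by interval midpoint are our own choices, and midpoint ordering is just one of several interval-ranking rules:

```python
def interval_scores(A, W):
    """Total score of each alternative per Eq. (13), r_i = sum_k a_ik * w_k.
    W[k] = (w_l, w_u) is an interval weight; ratings a_ik are assumed
    non-negative, so each score is itself an interval."""
    scores = []
    for row in A:
        lo = sum(a * wl for a, (wl, wu) in zip(row, W))
        hi = sum(a * wu for a, (wl, wu) in zip(row, W))
        scores.append((lo, hi))
    return scores

def rank(scores):
    """Order alternatives by the midpoint of their score interval
    (a simple interval-ranking heuristic); best first."""
    mid = [(l + u) / 2 for l, u in scores]
    return sorted(range(len(mid)), key=lambda i: mid[i], reverse=True)

# Judgment matrix for 3 alternatives over 2 attributes (Table 3 scale)
A = [[7, 5], [6, 8], [4, 6]]
W = [(0.4, 0.6), (0.3, 0.5)]   # hypothetical interval weights from model (12)
scores = interval_scores(A, W)
order = rank(scores)           # indices of alternatives, best first
```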
5 Conclusions

The multi-attribute decision model is a useful method for assessing Kansei information related to evaluation attributes. The methodology should therefore be useful for modeling problems that incorporate Kansei knowledge and judgments. Such judgments and knowledge can be represented by attribute weights in the decision making process. It is important to treat the uncertainty properly, as the judgment evaluation strongly involves individual human preferences. Hence, a multi-attribute decision model is built based on Fuzzy Random Regression Model (9). The work described in this paper shows that fuzzy decisions in a multi-attribute structure can be effectively used to facilitate the decision making process during the evaluation of contractors.
References

1. Gil, M.A., Miguel, L.D., Ralescu, D.A.: Overview on the development of fuzzy random variables. Fuzzy Sets and Systems 157(19), 2546–2557 (2006)
2. Kwakernaak, H.: Fuzzy random variables – I. Definitions and theorems. Information Sciences 15(1), 1–29 (1978)
3. Kwakernaak, H.: Fuzzy random variables – II. Algorithms and examples. Information Sciences 17(3), 253–278 (1979)
4. Liu, Y.K., Liu, B.: Fuzzy random variable: A scalar expected value operator. Fuzzy Optimization and Decision Making 2(2), 143–160 (2003)
5. Liu, B., Liu, Y.K.: Expected value of fuzzy variable and fuzzy expected value models. IEEE Transactions on Fuzzy Systems 10(4), 445–450 (2002)
6. Nahmias, S.: Fuzzy variables. Fuzzy Sets and Systems 1(2), 97–101 (1978)
7. Nureize, A., Watada, J.: A fuzzy regression approach to hierarchical evaluation model for oil palm grading. Fuzzy Optimization and Decision Making 9(1), 105–122 (2010)
8. Arbaiy, N., Watada, J.: Approximation of goal constraint coefficients in fuzzy goal programming. In: Second International Conference on Computer Engineering and Applications (ICCEA 2010), vol. 1, pp. 161–165 (2010)
9. Nureize, A., Watada, J.: Building fuzzy random objective function for interval fuzzy goal programming. In: IEEE International Conference on Industrial Engineering and Engineering Management (IEEM 2010), pp. 980–984 (2010)
10. Tanaka, H., Uejima, S., Asai, K.: Linear regression analysis with fuzzy model. IEEE Transactions on Systems, Man and Cybernetics 12(6), 903–907 (1982)
11. Tanaka, H., Shimomura, T., Watada, J., Asai, K.: Fuzzy linear regression analysis of the number of staff in local government. In: Proceedings of FIP 1984, Kauai, Hawaii, July 22–26 (1984)
12. Wang, S., Liu, Y.-K., Watada, J.: Fuzzy random renewal process with queueing applications. Computers & Mathematics with Applications 57(7), 1232–1248 (2009)
13. Wang, S., Watada, J.: Reliability optimization of a series-parallel system with fuzzy random lifetimes. International Journal of Innovative Computing, Information & Control 5(6), 1547–1558 (2009)
14. Wang, S., Watada, J.: Studying distribution functions of fuzzy random variables and its applications to critical value functions. International Journal of Innovative Computing, Information & Control 5(2), 279–292 (2009)
15. Wang, S., Watada, J.: T-independence condition for fuzzy random vector based on continuous triangular norms. Journal of Uncertain Systems 2(2), 155–160 (2008)
16. Watada, J., Tanaka, H., Asai, K.: Analysis of time-series data by possibilistic model. In: Proceedings of the International Workshop on Fuzzy System Applications, Fukuoka, pp. 228–233 (1988)
17. Watada, J., Pedrycz, W.: A fuzzy regression approach to acquisition of linguistic rules. In: Pedrycz, W., Skowron, A., Kreinovich, V. (eds.) Handbook of Granular Computing, ch. 32, pp. 719–740. John Wiley & Sons, Chichester (2008)
18. Watada, J., Wang, S.: Regression model based on fuzzy random variables. In: Seising, R. (ed.) Views on Fuzzy Sets and Systems from Different Perspectives, ch. 26. Springer, Berlin (2009)
19. Watada, J., Wang, S., Pedrycz, W.: Building confidence-interval-based fuzzy random regression models. IEEE Transactions on Fuzzy Systems 17(6), 1273–1283 (2009)
20. Zadeh, L.A.: Toward a generalized theory of uncertainty (GTU) – an outline. Information Sciences 172(1–2), 1–40 (2005)
21. Zadeh, L.A.: Generalized theory of uncertainty (GTU) – principal concepts and ideas. Computational Statistics and Data Analysis 51(1), 15–46 (2006)
22. Zhao, R.Q., Tang, W.S.: Some properties of fuzzy random renewal processes. IEEE Transactions on Fuzzy Systems 14(2), 173–179 (2006)
Building on the Synergy of Machine and Human Reasoning to Tackle Data-Intensive Collaboration and Decision Making

Nikos Karacapilidis, Stefan Rüping, Manolis Tzagarakis, Axel Poigné, and Spyros Christodoulou
Abstract. This paper reports on a hybrid approach aiming to facilitate and augment collaboration and decision making in data-intensive and cognitively-complex settings. The proposed approach exploits and builds on the most prominent high-performance computing paradigms and large data processing technologies to meaningfully search, analyze and aggregate data existing in diverse, extremely large and rapidly evolving sources. It can be viewed as an innovative workbench incorporating and orchestrating a set of interoperable services that reduce the data-intensiveness and complexity overload at critical decision points to a manageable level, thus permitting stakeholders to be more productive and concentrate on creative activities.
1 Introduction

Individuals, communities and organizations are currently confronted with the rapidly growing problem of information overload. An enormous amount of content already exists in the digital universe (i.e. information that is created, captured, or replicated in digital form), which is characterized by high rates of new information being distributed and demanding attention. This enables us to have instant access to more information than we can ever possibly consume. As characteristically pointed out in a recent study [8], the amount of information created, captured, or replicated exceeded available storage for the first time in 2007, while by 2011 the digital universe will be 10 times the size it was in 2006 (the digital universe is expanding by a factor of 10 every five years). People have to cope with such a diverse and exploding digital universe when working together; they need to efficiently and effectively collaborate and make decisions by appropriately assembling and analyzing enormous volumes of complex

Nikos Karacapilidis · Manolis Tzagarakis · Spyros Christodoulou: University of Patras and RA CTI, 26504 Rio Patras, Greece
Stefan Rüping · Axel Poigné: Fraunhofer IAIS, Schloss Birlinghoven, 53754 Sankt Augustin, Germany

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 113–122. springerlink.com
© Springer-Verlag Berlin Heidelberg 2011
multi-faceted data residing in different sources. For instance, imagine a marketing and consultancy company that is able to effortlessly forage the Web (blogs, forums, wikis, etc.) for high-level knowledge, such as public opinions about its products and services; it can thus capture tractable, commercially vital information to quickly monitor public response to a new marketing launch, meaningfully filter, collate and analyse the associated findings, and use the information to inform new strategy. The goal of the work reported in this paper is to turn this vision into reality. This work is performed in the context of an FP7 EU project (namely Dicode, http://dicode-project.eu/) that aims at facilitating and augmenting collaboration and decision making in data-intensive and cognitively-complex settings. To do so, the Dicode project will follow a hybrid approach, in that it will exploit and build on the most prominent high-performance computing paradigms and large data processing technologies – such as cloud computing, MapReduce (http://labs.google.com/papers/mapreduce.html), Hadoop (http://hadoop.apache.org), and Mahout (http://lucene.apache.org/mahout) – to meaningfully search, analyze and aggregate data existing in diverse, extremely large, and rapidly evolving sources.
2 Problems and Requirements to Be Addressed

Collaboration and decision making settings are often associated with huge, ever-increasing amounts of multiple types of data, obtained from diverse and distributed sources, which often have a low signal-to-noise ratio for addressing the problem at hand. In many cases, the raw information is so overwhelming that stakeholders are often at a loss to know even where to begin to make sense of it. In addition, these data may vary in terms of subjectivity and importance, ranging from individual opinions and estimations to broadly accepted practices and indisputable measurements and scientific results. Their types can be of diverse levels as far as human understanding and machine interpretation are concerned. At the same time, the associated data are in most cases interconnected, in a vague or explicit manner. Besides, it is nowadays easier to get data in than out. Big volumes of data can be effortlessly added to a database (e.g. in transaction processing); the problems start when we want to consider and exploit the accumulated data, which may have been collected over a few weeks or months, and meaningfully analyze them towards making a decision. Admittedly, when things get complex, we need to identify, understand and exploit data patterns; we need to aggregate big volumes of data from multiple sources, and then mine them for insights that would never emerge from manual inspection or analysis of any single data source. In the settings under consideration, the way that data will be structured for query and analysis, as well as the way that tools will be designed to handle them efficiently, are of great importance and certainly set a big research challenge. Generally speaking, information management related tasks need to be streamlined and automated. Recent findings clearly indicate that information management costs too much when it is not well organized and meaningfully automated [4].
They also call for investments in innovative software that reduces or eliminates time wasted,
reduces management overheads, streamlines collaborative processes, and automates the overall workflow. Return on such investments can be both tangible (e.g. time or money saved) and intangible (e.g. more valuable information, easier extraction of hidden information, increase of information workers’ satisfaction and creativity, improved collaboration). As follows from the above, issues related to the guidance of the information worker through the space of available data and the indication of relevant information to facilitate and augment collaboration and decision making activities are of major importance. Towards this direction, we foresee a semi-automatic, adaptive approach that makes use of both semantic metadata and pre-structured data patterns to provide plausible recommendations, while also learning from the users’ feedback to better target their information interests [1]. This will be enabled by innovative data mining techniques and services such as local pattern mining, similarity learning, and graph mining, coupled with a flexible framework where all these services are seamlessly integrated and orchestrated. Recent research in data mining is geared towards the extraction of more semantic information. Since data and information are today available in large volumes and diverse types of representation, intelligent integration of these data sources to generate new knowledge (towards serving collaboration and decision making requirements) remains a key challenge. Data pre-processing requirements, associated with data cleansing as well as handling of noise and uncertainty in various data sources, are inherent here. While data mining aims at detecting hidden relations in structured data, text mining analyses the semantics of unstructured data like natural language sentences and documents. The two technologies are complementary, rendering different kinds of information.
For text mining, one area of particular interest is the extraction and identification of named entities (e.g. persons, organizations). In the settings under consideration, data sources are associated with various types of information, each of them covering distinct aspects. A systematic way is needed to generate different points of view for such kinds of data. Contemporary approaches need to help users utilize complex multi-source data in a reasonable way by supporting them in finding relevant information and by providing personalized recommendations. Another big category of requirements concerns the exploration, delivery and visualization of the pertinent information. These should be based on: (i) an intelligent semantic annotation, structuring and aggregation of voluminous and complex data, (ii) the meaningful analysis and exploitation of data patterns and interrelations, (iii) the capturing of stakeholders’ tacit knowledge, as far as information analysis and problem solving are concerned, through a social web approach, and (iv) the exploitation of particular user and group characteristics to properly direct or adapt data. Generally speaking, the semantics to be deployed should come out of a joint consideration of stakeholders, their actions in the settings under consideration, and the data considered each time. As far as collaboration and decision making support is concerned, stakeholders require solutions that easily enable them to create and maintain private or public workspaces, where the most pertinent information about the problem at hand can be gathered, linked, synthesized and assessed. Through such workspaces, they
need to carry out synchronous or asynchronous collaboration to accommodate and elaborate the outcomes of data mining, get recommendations, identify inconsistencies, spot and repair information gaps, reason about actions, etc. A goal-dependent integration of data coming from heterogeneous databases is also required (to complement goal-driven data search and acquisition). Data visualization issues also impose a series of important requirements here. At the same time, these solutions should permit the definition of activity patterns (workflows) for the efficient orchestration of the complex data processing. These patterns should take into account an underlying model of semantics to intelligently organize and systematize the available resources (data and stakeholders) and data flows (e.g. to purposefully filter a big volume of data and direct the relevant information to an expert’s workspace; or, to meaningfully aggregate data coming out of diverse sources, each focusing on a distinct aspect of the problem under consideration). The formal definition of these workflows will also allow the triggering of search for the most pertinent information, in cases of information gaps. The above brings up the need for development of innovative services that shift in focus from the mere collection and representation of large-scale information to its meaningful assessment, aggregation and utilization in contemporary collaboration and decision making settings.
3 The Foreseen Solution

In the context of the Dicode project, we exploit a cloud infrastructure to adapt and refine computationally expensive algorithms for semantic data mining to new paradigms for distributed computing, such as the MapReduce paradigm (as implemented in frameworks like Hadoop and its Mahout derivative). The Apache Mahout project aims at building a scalable machine learning library on top of Hadoop, and a number of machine learning algorithms have already been parallelized in its context. Based on the outcomes of the Mahout project so far, our goal is to set up a basic analysis infrastructure for the Dicode project. Aiming to fully support the data mining process (including pre-processing, modelling, validation and deployment), we integrate, adapt and extend the Mahout machine learning library, e.g. by developing advanced machine learning algorithms for large scale data. For instance, Mahout may significantly help towards grouping similar items (be they related raw data or users with similar expertise or interests), identifying main or “hot” topics, assigning items to predefined categories, recommending important data to diverse stakeholders, and discovering frequent and meaningful patterns. The advent of multicores and general-purpose computing opens additional avenues that will be explored in parallel. Several data mining algorithms have been ported to complex architectures exploiting the interplay of interprocessor message passing and chip parallelisation using multicores and graphics processor units (GPUs), with considerable speedups. Dicode will take advantage of these developments and try to integrate them into the foreseen workflows.
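The MapReduce paradigm these frameworks implement can be conveyed with a toy, single-process sketch (ours, for exposition only; Hadoop distributes the same two phases across a cluster, and Mahout expresses learning algorithms in them):

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (key, value) pairs -- here (word, 1) for every word."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Shuffle/reduce: group the pairs by key and aggregate the values."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data mining", "data mining at scale"]
counts = reduce_phase(map_phase(docs))
# counts maps each word to its total frequency across all documents
```

Because the map calls are independent and the reduce groups by key, both phases parallelize naturally over data partitions — the property the project relies on for terabyte-scale inputs.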
The ultimate goal of the Dicode project is to support users in collaboratively solving problems and making decisions in complex scenarios with very large, potentially conflicting and incomplete amounts of information. To achieve this goal, Dicode will exploit and significantly advance the state-of-the-art in three relevant directions:

• New techniques for scalable high-performance data mining: The envisioned interactive use of the Dicode platform requires quick reactions of the system on complex tasks that involve very large masses of data. Hence, a particular focus of the project is the adaptation of state-of-the-art paradigms for high-performance computing and their exploitation for complex data mining tasks.

• Data mining to make sense of real-world multi-faceted data: Innovative approaches are followed to extract information from complex heterogeneous data. A focus of the work is text data, which is the predominant form of storing and exchanging knowledge today. In addition, work is performed on the comprehensive linking and structuring of very different forms of data.

• Collaboration and decision making support: Exploiting and building on the synergy of human and machine reasoning capabilities, the Dicode project develops innovative services that meaningfully accommodate the output of data mining to enable stakeholders to tame information overload and complexity in their working environments. Interactive data analysis, collaboration monitoring and recommendation techniques are the focal points of this direction.
3.1 New Techniques for Scalable High-Performance Data Mining

A particular challenge of Dicode is to develop data mining approaches that scale well to extremely large, rapidly evolving data sets in the terabyte range. To respond to this challenge, the project develops a flexible large-scale data mining platform based on the MapReduce framework and related technology. MapReduce has been successfully deployed within Google and runs a myriad of different processing tasks. Many of the building blocks that are necessary to cover the requirements mentioned above do not rely on simple filters or selection rules; instead, they can only be efficiently implemented by employing appropriate machine learning algorithms (e.g. classification of objects into predefined categories, clustering of objects by similarity, identification of trends in document streams, extraction of topics from documents). There are some approaches to scaling traditional machine learning algorithms to large scale datasets [3, 13]; however, most research still leaves the aspects of scalability and performance optimization out of scope. The most comprehensive and active project aiming at implementing machine learning algorithms for large scale problems today is Apache Mahout [9]. However, Mahout is unsuitable for out-of-the-box, ad-hoc data analysis of arbitrary data: the project includes only very basic modules for data pre-processing, data conversion and integration with existing systems. In addition, the tools available so far are largely focused on text mining. The Dicode project aims to extend the data mining capabilities of Mahout in the following directions:
• For data mining, the system should efficiently combine data acquisition, data pre-processing, data mining and model deployment under one framework.

• It should be easy to pre-process and load existing data into Mahout. For instance, the data mining module must be able to interact with databases like HBase and use vectors generated by UIMA [5] as input for later text mining stages.

• Data mining often involves a lot of manual work, expert knowledge and time. It is desirable to store validated solutions (both models and processes) for distributed data mining, and reuse them if the same task has to be solved later on. We aim at providing such a repository of solutions.

• There exist some algorithms which cannot be readily converted to the MapReduce framework. For these algorithms, the system will include interfaces to other, more specialised frameworks for distributed computing.

In parallel, Dicode will investigate parallelisation and distribution mechanisms for data mining exploiting the computational power of multicores and graphics processor units, as demonstrated in [3, 10, 15].
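To illustrate why learning algorithms such as clustering fit the MapReduce framework (Mahout's k-means follows essentially this decomposition), one k-means iteration can be phrased as a map step that assigns points to their nearest centroid and a reduce step that recomputes each centroid. The 1-D sketch below is ours, not Dicode or Mahout code:

```python
def assign(points, centroids):
    """Map: pair each point with the index of its nearest centroid."""
    return [(min(range(len(centroids)),
                 key=lambda j: abs(p - centroids[j])), p)
            for p in points]

def recompute(pairs, k):
    """Reduce: new centroid = mean of the points assigned to it."""
    sums, counts = [0.0] * k, [0] * k
    for j, p in pairs:
        sums[j] += p
        counts[j] += 1
    return [sums[j] / counts[j] if counts[j] else None for j in range(k)]

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centroids = [0.0, 10.0]
for _ in range(5):                 # a few iterations suffice on this toy data
    centroids = recompute(assign(points, centroids), 2)
```

In a real deployment each map call runs on a data partition and the reduce step aggregates partial sums per centroid, which is exactly the (key, value) aggregation MapReduce provides.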
3.2 Data Mining to Make Sense of Real-World Multi-faceted Data

Dicode also aims to support users in solving problems and making decisions in complex scenarios with large, potentially conflicting and incomplete amounts of information. A main challenge in this setting is that, in reality, data is not as nicely structured as it seems to be in current solutions and closed systems. Instead, data in real-world applications are complex and multi-faceted: important information is spread among multiple data sources and formats, and the relations between these instances are not always obvious. In the following, we briefly mention two approaches that are particularly well suited to bridge the gap between data of different types and characteristics.

• While it is typically not a problem to turn up many possible links between two
pieces of information using standard features, it is hard to identify the really relevant links. A candidate technique to address this problem is similarity learning, which strives to find an optimal, user-centric measure of similarity on the basis of many candidate similarity measures. Similarity measure learning has proven successful in many different applications with complex items, ranging from the identification of similar people [11] to the identification of similar process workflows [6].
• Text data is the most central data type in many applications, because it is usually the one best suited for users. To bridge the gap between human-readable texts and structured data suited for machine processing, text mining technologies are of great importance. In the context of Dicode, approaches for extracting semantic information from text make it possible to generate higher-level knowledge that can be fed to the user as additional input for the decision making process, without the user having to read all the texts (which can be millions of web pages in the extreme case). An important task in text mining is the recognition of named entities (such as people, companies, or even specific
entities like gene names), which makes it possible to uniquely identify the entity under discussion [14]. Building upon this, the extraction of semantic relations between different objects in a text aims at producing structured knowledge in the form of relationship graphs [7].
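A minimal sketch of similarity measure learning in the sense described above (the measures, data, and update rule are our own illustrative choices, not the methods of [11] or [6]): given several candidate measures, learn a weighted combination from pairs labelled similar or dissimilar, here by stochastic gradient descent on the squared error:

```python
def learn_weights(pairs, labels, measures, lr=0.1, epochs=200):
    """Learn weights w so that sum_m w_m * measure_m(a, b) approximates
    the 0/1 similarity label of each pair (a, b)."""
    w = [0.0] * len(measures)
    for _ in range(epochs):
        for (a, b), y in zip(pairs, labels):
            feats = [m(a, b) for m in measures]
            err = sum(wi * f for wi, f in zip(w, feats)) - y
            w = [wi - lr * err * f for wi, f in zip(w, feats)]
    return w

# Two toy candidate measures on strings
same_first = lambda a, b: 1.0 if a[0] == b[0] else 0.0
same_len = lambda a, b: 1.0 if len(a) == len(b) else 0.0

pairs = [("map", "mat"), ("map", "cut"), ("go", "to"), ("go", "gone")]
labels = [1, 0, 0, 1]  # user-centric ground truth: the first letter matters
w = learn_weights(pairs, labels, [same_first, same_len])
```

On this toy data the learned weights favour `same_first` over `same_len`, i.e. the procedure recovers which candidate measure reflects the user's notion of similarity.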
3.3 Collaboration and Decision Making Support

Web 2.0 has introduced a plethora of collaboration tools which provide engagement at a massive scale and feature novel paradigms. These tools cover a broad spectrum of needs, ranging from exchanging, sharing and tagging, to social networking, authoring, mind mapping and discussing. For instance, Delicious (http://delicious.com) and CiteULike (http://www.citeulike.com) provide services for storing, sharing and discovering user-generated Web bookmarks and academic publications, respectively. A different set of applications focuses on building online communities of people who share interests and activities (social networking applications). MySpace (http://www.myspace.com) and LinkedIn (http://www.linkedin.com) are representative examples in this category. Another set of Web 2.0 tools aims at collectively organizing, visualizing and structuring concepts via maps to aid brainstorming, study and problem solving. Tools such as Thinkature (http://www.thinkature.com) fall into this category. Finally, systems such as online discussion forums, Debatepedia (http://wiki.idebate.org) and Cohere (http://cohere.open.ac.uk/) support online discussions over the Web.

Although all the above tools enable the massive and unconstrained collaboration of users, this very feature is the source of a problem that these tools introduce to their users: information overload. Current Web 2.0 collaboration tools exhibit two important shortcomings that make them prone to this problem. First, these tools are “information islands”, providing only limited support for interoperation, integration and synergy with third-party tools. Second, Web 2.0 collaboration tools are rather passive media; they lack reasoning services with which they could meaningfully support the collaboration.
As far as existing decision making support technologies are concerned, data warehouses and on-line analytical processing have been broadly recognized as technologies playing a prominent role in the development of current and future DSS [12]. However, there is still room for further developing the conceptual, methodological and application-oriented aspects of the problem. One critical point that is still missing is a holistic perspective on the issue of decision making. This originates out of the growing need to develop applications by following a more human-centric (not problem-centric) view, in order to appropriately address the requirements of the contemporary, knowledge-intensive organization’s employees. Dicode will advance decision making support technologies by adopting a knowledge-based decision-making view, enabled by the meaningful accommodation of the results of the data mining processes. In such a way, the decision making process is able to produce new knowledge, such as evidence justifying or challenging an alternative, or practices to be followed or avoided after the evaluation of a decision. Knowledge management activities such as knowledge elicitation, representation and distribution influence the
creation of the decision models to be adopted, thus enhancing the decision making process [2]. With respect to collaboration and decision making support, the Dicode project provides a series of innovative features. First, Dicode will introduce advanced decision making services into collaborative environments in order to help control the impact of voluminous and complex data. Second, Dicode will not treat collaboration services as standalone applications that operate autonomously and in isolation from other services, but rather as ones that coexist. Third, Dicode will enable both human and machine understandable argumentative discourses to support ease-of-use and expressiveness for users, as well as advanced reasoning by the machine.
4 Building a Hybrid Approach

As noted above, the Dicode project aims at offering an innovative solution that reduces the data-intensiveness and overall complexity of real-life collaboration and decision making settings to a manageable level, thus permitting stakeholders to be more productive and concentrate on creative activities. Towards this direction, the project will provide a suite of innovative, adaptive and interoperable services that satisfies the full range of the requirements reported above. The foreseen services will be running on the Web. The Dicode suite of services comprises:

• Data acquisition services, which enable the purposeful capturing of tractable information that exists in diverse data sources and formats. Particular attention will be paid to web resources and the development of the associated spider components (web crawlers).

• Data pre-processing services, which efficiently manipulate raw data before their storage in the foreseen solution. Transformation of different kinds of documents into a canonical form, structuring of documents from layout information (e.g. detection of navigation, comments, abstracts), data cleansing (e.g. removing noise from web pages, discarding useless database records), as well as language detection and linguistic annotation, are some of the functionalities foreseen in this category of services.

• Data mining services, which exploit and are built on top of a cloud infrastructure and other prominent large data processing technologies to offer functionalities such as high performance full text search, data indexing, classification and clustering, directed data filtering and fusion, and meaningful data aggregation. Advanced text mining techniques such as named entity recognition, relation extraction and opinion mining will help to extract valuable semantic information from unstructured texts. Intelligent data mining techniques to be elaborated include local pattern mining, similarity learning, and graph mining.
Scalable data storage issues will also be handled.

• Collaboration support services, which facilitate the synchronous and asynchronous collaboration of stakeholders through adaptive workspaces, efficiently handle the representation and visualization of the outcomes of the data mining services (through alternative and dedicated data visualization schemas), and
accommodate a workflow engine that enables the orchestration of a series of actions for the appropriate handling of data in each case.

• Decision making support services, which augment both individual and group sense- and decision-making by supporting stakeholders in locating, retrieving and arguing about relevant information and knowledge, as well as by providing them with appropriate notifications and recommendations (taking into account parameters such as preferences, competences, expertise etc.). The services to be developed in this category will primarily exploit the reasoning capabilities of humans.
5 Conclusion

Dealing with data-intensive and cognitively complex settings is not a technical problem alone. Building on current advancements, the approach described in this paper brings together the reasoning capabilities of machines and humans. It can be viewed as an innovative workbench incorporating and orchestrating a set of interoperable services that reduce the data-intensiveness and complexity overload at critical decision points to a manageable level, thus permitting stakeholders to be more productive and concentrate on creative and innovative activities. Aiming at facilitating and augmenting collaboration and decision making, the proposed approach will enhance the quality of these processes and yield time and cost savings.
Acknowledgments

This publication has been produced in the context of the EU Collaborative Project “DICODE - Mastering Data-Intensive Collaboration and Decision” which is co-funded by the European Commission under contract FP7-ICT-257184. This publication reflects only the authors’ views and the Community is not liable for any use that may be made of the information contained therein.
References

[1] Adomavicius, G., Tuzhilin, A.: Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering 17(6), 734–749 (2005)
[2] Bolloju, N., Khalifa, M., Turban, E.: Integrating Knowledge Management into Enterprise Environments for the Next Generation Decision Support. Decision Support Systems 33, 163–176 (2002)
[3] Chu, C.T., Kim, S.K., Lin, Y.A., Yu, Y., Bradski, G.R., Ng, A.Y., Olukotun, K.: Map-reduce for machine learning on multicore. In: Schölkopf, B., Platt, J.C., Hoffman, T. (eds.) Proceedings of the Twentieth Annual Conference on Advances in Neural Information Processing Systems, Vancouver, Canada, December 4-7, vol. 19. MIT Press, Cambridge (2006)
[4] Eppler, M.J., Mengis, J.: The Concept of Information Overload: A Review of Literature from Organization Science, Accounting, Marketing, MIS, and Related Disciplines. The Information Society 20, 325–344 (2004)
[5] Ferrucci, D., Lally, A.: UIMA: an architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering 10(3-4), 327–348 (2004)
[6] Friesen, N., Rüping, S.: Workflow Analysis Using Graph Kernels. In: Proceedings of the ECML/PKDD Workshop on Third-Generation Data Mining: Towards Service-Oriented Knowledge Discovery (SoKD 2010), Barcelona, Spain (2010)
[7] Horváth, T., Paass, G., Reichartz, F., Wrobel, S.: A logic-based approach to relation extraction from texts. In: De Raedt, L. (ed.) ILP 2009. LNCS, vol. 5989, pp. 34–48. Springer, Heidelberg (2010)
[8] IDC: The Diverse and Exploding Digital Universe. White Paper (March 2008), http://www.idc.com
[9] Ingersoll, G.: Introducing Apache Mahout. IBM developerWorks, Java Technical Library (2009), http://www.ibm.com/developerworks/java/library/j-mahout/
[10] Rao, S.N.T., Prasad, E.V., Venkateswarlu, N.B.: A scalable k-means clustering algorithm on Multi-Core architecture. In: Proc. of International Conference on Methods and Models in Computer Science (ICM2CS 2009), pp. 1–9 (2009)
[11] Rüping, S., Punko, N., Günter, B., Grosskreutz, H.: Procurement Fraud Discovery using Similarity Measure Learning. Transactions on Case-based Reasoning 1(1), 37–46 (2008)
[12] Shim, J.P., Warkentin, M., Courtney, J.F., Power, D.J., Sharda, R., Carlsson, C.: Past, Present and Future of Decision Support Technology. Decision Support Systems 33, 111–126 (2002)
[13] Wegener, D., Mock, M., Adranale, D., Wrobel, S.: Toolkit-based high-performance Data Mining of large Data on MapReduce Clusters. In: Proc. of the 1st IEEE ICDM Workshop on Large-scale Data Mining: Theory & Applications (2009)
[14] Whitelaw, C., Kehlenbeck, A., Petrovic, N., Ungar, L.: Web-Scale Named Entity Recognition. In: Proceedings of CIKM, pp. 123–132 (2008)
[15] Yan, F., Xu, N., Qi, Y.: Parallel inference for latent Dirichlet allocation on graphics processing units. In: Advances in Neural Information Processing Systems, vol. 22, pp. 2134–2142 (2009)
Derivations of Information Technology Strategies for Enabling the Cloud Based Banking Service by a Hybrid MADM Framework Chi-Yo Huang, Wei-Chang Tzeng, Gwo-Hshiung Tzeng, and Ming-Cheng Yuan
Abstract. Cloud computing has emerged as one of the most influential information technology (IT) development trends in recent years. Cloud computing can enhance operational efficiency and reduce operating costs concurrently, for firms in general and financial institutions in particular. Albeit cloud computing seems attractive to banking services from the aspects of cost minimization, service value maximization, and thus competitive advantage achievement, barriers still exist to the introduction of cloud computing by banks. In order to enable cloud based banking services, IT strategies should be introduced for overcoming the obstacles, including security, system configuration, and other related issues. Based on the results of an empirical study surveying an expert from one of the leading Taiwanese banks, the rapid deployment strategy, the multi-tenant technology, and huge data processing capabilities were regarded as the most important strategies.

Keywords: Cloud Computing, Cloud Based Banking Services, Information Technology, Multiple Criteria Decision Making (MCDM).
1 Introduction

Cloud computing, a new term for a long-held dream of computing as a utility (Parkhill 1966), has recently emerged as a commercial reality (Armbrust et al. 2009).

Chi-Yo Huang · Wei-Chang Tzeng · Ming-Cheng Yuan
Department of Industrial Education, National Taiwan Normal University
No. 162, Hoping East Road I, Taipei 106, Taiwan
e-mail:
[email protected] *
Gwo-Hshiung Tzeng Department of Business and Entrepreneurial Administration, Kainan University No. 1, Kainan Road, Luchu, Taoyuan County 338, Taiwan Gwo-Hshiung Tzeng Institute of Management of Technology, National Chiao Tung University Ta-Hsuch Road, Hsinchu 300, Taiwan e-mail:
[email protected] J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 123–134. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
Cloud computing enables convenient, on-demand network access to a shared pool of configurable computing resources which can be rapidly provisioned and released with minimal management effort or service provider interaction (Mell and Grance 2009). Cloud computing implies a service oriented architecture, reduced information technology overhead for the end-user, greater flexibility, reduced total cost of ownership, on demand services and many other things (Vouk 2008). One of the most basic business values that the cloud will tend to overturn is the perception that major computer resources are expensive and are reserved for a specially trained cadre of database administrators and business intelligence experts who know how to use them (Babcock 2010). The advantages of the economy of scale and statistical multiplexing may ultimately lead to a handful of Cloud Computing providers who can amortize the cost of their large datacenters over the products of many “datacenter-less” companies (Armbrust et al. 2009). Apparently, cloud computing is one of the prominent technology trends (Meeker et al. 2008; Buyya et al. 2008) which can enhance firms’ operational efficiency and reduce their operating costs concurrently. According to recent research by Gartner, the market for cloud services will increase from $36 billion today to $160 billion by 2015, while 20 percent of companies will be using cloud computing for significant parts of their technology environment by 2012 (Benton and Negm 2010). Furthermore, according to Suresh (2010), the IT industry forecasts exponential investments and adoption rates for Cloud Computing, while the Banking and Financial Service Institutions (BFS) stand to benefit significantly from Cloud computing capabilities.
Benton and Negm (2010) mentioned that the cloud offers a host of opportunities for banks to reuse IT resources more efficiently and build a more flexible, nimble and customer-centric business model that can drive profitable growth. The cloud computing seems attractive to banking services from the aspects of cost minimization, service value maximization, and thus, competitive advantage achievement. However, barriers still exist for the introduction of cloud based services by banks. Benton and Negm (2010) mentioned that cloud computing is much more than simply renting servers and storage on-demand to reduce infrastructure costs — as many believe. Furthermore, it’s not simply a technology issue. According to a recent work by Suresh (2010), significant challenges, including legal, security, performance, reliability, transformation complexity, operating control & governance and most importantly proof for the promised cost benefits, exist for banking & financial service institutions in adopting the cloud based service models. Therefore, to derive the IT strategies for enabling the cloud based banking services, a hybrid multiple criteria decision making (MCDM) framework will be proposed. At first, the possible barriers and strategies for enabling the cloud based banking services will be derived by using literature review. Then, the modified Delphi method will be introduced for deriving the barriers and strategies for enabling the cloud based banking services. The relationships between the barriers will be derived by using the DEMATEL (Decision Making Trial and Evaluation Laboratory) method while the network relationship map (NRM) can be constructed accordingly. Then the weights being associated with each barrier will be derived by
using the Analytic Network Process (ANP) based on the NRM. Finally, the Grey Relational Analysis (GRA) will be used for deriving correlations between each IT strategy and all barriers for enabling the cloud based banking services. The IT strategies with the highest grey grades will be selected as the most suitable ones for enabling the cloud based banking services. In the pilot study, an expert from one of the leading Taiwanese banks was invited to provide opinions for the proposed hybrid MCDM framework. An empirical study on developing the IT strategies for enabling the cloud based banking services of a leading Taiwanese bank will be introduced for verifying the hybrid MCDM framework. The remainder of this paper is organized as follows. In Section 2, the concepts of cloud computing and strategies are introduced. In Section 3, a hybrid MCDM method based analytic framework for defining the cloud computing strategies will be proposed. Then, in Section 4, an empirical study example will be given for verifying the analytic framework. Discussions will be presented in Section 5. Section 6 will conclude the whole article with observations, conclusions and recommendations for further study.
2 Cloud Computing, Barriers and IT Strategies

In the following Section, the definitions of cloud computing will first be reviewed. Then, recent research on cloud based services will be revisited. Finally, obstacles and barriers to a firm’s migration to the cloud, and possible IT strategies for overcoming the barriers, will be reviewed as a basis of this research.

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction (Mell and Grance 2009). Cloud computing builds on decades of research in virtualization, distributed computing, utility computing, and more recently networking, web and software services (Vouk 2008). It implies a service oriented architecture, reduced information technology overhead for the end-user, greater flexibility, reduced total cost of ownership, on-demand services and many other things (Vouk 2008). Cloud computing denotes the infrastructure as a “Cloud” from which businesses and users are able to access applications from anywhere in the world on demand (Buyya et al. 2009). Cloud computing is the IT foundation for cloud services; it consists of technologies that enable cloud services (Furht and Escalante 2010). The cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models (Mell and Grance 2009). Cloud computing services integration and provisioning experts should be able to focus on the creation of composite and orchestrated solutions needed for an end-user, which sample and combine existing services and images, customize them, update existing services and images, and develop new composites (Vouk 2008).
Characteristics of cloud services include: (1) little or no capital investment required, (2) variable pricing based on consumption, (3) buyers “pay per use”, (4) rapid acquisition and deployment, (5) lower ongoing operating costs and (6) programmable (Benton and Negm 2010). Banks are looking at Cloud based solutions with the following set of distinct expectations: (1) dynamic and flexible technology model to fully align with the changing needs of the business, (2) highly optimized and virtualized infrastructure enabling scale and cost efficiency, (3) fully automated service provision, monitoring and management for achieving agility, (4) shared services delivered across trusted domains delivering security of data, transaction & operations, (5) Internet or Intranet based access model using high capacity bandwidth and ubiquitous connectivity, (6) service based acquisition model providing functionally rich capabilities on demand, (7) significantly low start up costs and rapidly expandable capabilities that shifts capital expenses, (8) ability to deliver tailored products and services across the business value chain and customer segments by composition of advanced capabilities provided by businesses and their partners, (9) usage based Business model that enables businesses to spend based on consumption, (10) economies of scale and core competency of the service providers, and (11) rapid innovation in services, features and operating models leveraging the capabilities of service provider(s) - as compared to the need for internal and isolated investments and enabling the full impact of Innovation to reach all consumers rapidly. Further, according to Benton and Negm (2010), banks are looking for opportunities in cost saving, establishment of a frictionless and flexible ecosystem, consumer cloud computing, and applications when you need them. As summarized by Armbrust et al. 
(2009), the ten obstacles for adoption and growth of cloud computing include availability of service, data lock-in, data confidentiality and auditability, data transfer bottlenecks, performance unpredictability, scalable storage, bugs in large-scale distributed systems, scaling quickly, reputation fate sharing, software licensing. The IT strategic proposals for the cloud based banking services include: (1) rapid deployment: using virtual machine, cloud computing is possible to deploy servers and to expand network bandwidth as well as processing power in a wide area dynamically (Kamiya et al., 2010); (2) resource scheduling: service-oriented cloud computing architecture consisting of service consumer’s brokering and provider’s coordinator services that support utility-driven internetworking of clouds: application scheduling, resource allocation, and workload migration (Buyya et al. 2009); (3) multi-tenant technology: multi-tenancy is a critical technology of SaaS (Software as a Service) to allow one instance of application serving multiple customers at the same time to share cloud resources and achieve high operational efficiency (Hong et al. 2010); a collaboration service system supporting multiple tenants can significantly reduce cost of customization, deployment and operation of a great number of middle-small size enterprises; (4) huge data processing: for data intensive applications, the concept of cloud computing is emerging where data and computing are co-located at a large centralized facility, and accessed as
well-defined services (Szalay et al. 2009); (5) large message communication: Youseff et al. (2008) argued that as the need for a guaranteed quality of service (QoS) for network communication grows for cloud systems, communication becomes a vital component of the cloud infrastructure; (6) large-scale distributed storage: for data intensive applications, the concept of cloud computing is emerging where data and computing are co-located at a large centralized facility, and accessed as well-defined services (Szalay et al. 2009); and (7) authorization management and billing: a structure of the policy based security system is required for the purpose of authorization management to reduce the churn during service creation (Yildiz et al. 2009); the accounting service is fundamental for estimating the cost to be charged to each user and determining how the applications are responsible of the user expenses (Vaquero et al. 2008). Based on the above literature review results, criteria and strategies can be derived based on the hybrid MCDM framework to be proposed in the following Section 3. The evaluation criteria and the IT strategies overcoming the barriers of introducing the cloud based banking services will first be derived based on literature review and recognized by an expert. Then, appropriate IT strategies will be selected based on the hybrid MCDM framework.
3 Analytic Framework for Deriving the IT Strategies for Enabling the Cloud Based Banking Services

The analytical process for deriving the IT strategies for enabling the cloud based banking services is initiated by deriving the possible IT strategies and evaluation criteria based on literature review results and confirmation by experts. The structure of the MCDM problem will be derived using the DEMATEL. The weights versus each criterion will be derived by using the ANP. Finally, the IT strategies for enabling the cloud based banking services will be derived by using the Grey Relational Analysis, introducing the weights corresponding to each criterion derived by the ANP in the former stages.
3.1 DEMATEL Method

The DEMATEL method was developed by the Battelle Geneva Institute: (1) to analyze complex ‘world problems’ dealing mainly with interactive man-model techniques; and (2) to evaluate qualitative and factor-linked aspects of societal problems (Gabus and Fontela 1972). To apply the DEMATEL method smoothly, the authors refined the definitions by Hori and Shimizu (1999), Chiu et al. (2006), Huang et al. (2007), and Huang et al. (2011), and produced the essential definitions indicated below.

Definition 1: The pair-wise comparison scale may be designated as eleven levels, where the scores $0, 1, 2, \ldots, 10$ represent the range from ‘no influence’ to ‘very high influence’.

Definition 2: The initial direct relation/influence matrix $A = [a_{ij}]_{n \times n}$, $i, j \in \{1, 2, \ldots, n\}$, is obtained by pair-wise comparisons, in terms of influences and directions between the objectives, in which $a_{ij}$ denotes the degree to which the $i$th objective affects the $j$th objective.

Definition 3: The normalized direct relation/influence matrix $N$, in which all principal diagonal elements are equal to zero, is obtained as $N = zA$, where $z = \left( \max_{1 \le i \le n} \sum_{j=1}^{n} a_{ij} \right)^{-1}$. In this case, $N$ is called the normalized matrix, and $\lim_{k \to \infty} N^k = [0]_{n \times n}$.

Definition 4: The total relationship matrix $T$ can then be obtained using $T = N + N^2 + \cdots + N^k = N(I - N)^{-1}$ as $k \to \infty$, where $I$ stands for the identity matrix. Here $T$ is the total influence-related matrix, $N = [x_{ij}]_{n \times n}$ is the direct influence matrix, and $\lim_{k \to \infty} (N^2 + \cdots + N^k)$ stands for the indirect influence matrix, with $0 \le x_{ij} < 1$, so that $\lim_{k \to \infty} N^k = [0]_{n \times n}$. The $(i, j)$ element $t_{ij}$ of matrix $T$ denotes the direct and indirect influences of factor $i$ on factor $j$.

Definition 5: The row and column sums of the total-relation matrix $T = [t_{ij}]$, $i, j \in \{1, 2, \ldots, n\}$, are separately denoted as $r = [r_i]_{n \times 1} = \left[ \sum_{j=1}^{n} t_{ij} \right]_{n \times 1}$ and $c = [c_j]_{n \times 1} = \left[ \sum_{i=1}^{n} t_{ij} \right]'_{1 \times n}$. Here, the $r$ and $c$ vectors denote the sums of the rows and columns, respectively.

Definition 6: Suppose $r_i$ denotes the row sum of the $i$th row of matrix $T$. Then $r_i$ is the sum of the influences dispatched from factor $i$ to the other factors, both directly and indirectly. Suppose $c_j$ denotes the column sum of the $j$th column of matrix $T$. Then $c_j$ is the sum of the influences that factor $j$ receives from the other factors. Furthermore, when $i = j$, the sum $(r_i + c_i)$ is an index of the strength of influence both dispatched and received, i.e. the degree of the central role that factor $i$ plays in the problem. If $(r_i - c_i)$ is positive, then factor $i$ primarily dispatches influence to the other factors; if $(r_i - c_i)$ is negative, then factor $i$ primarily receives influence from the other factors (Huang et al. 2007; Tamura et al. 2002).
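As an editorial illustration (not part of the original paper), Definitions 2–6 can be sketched in a few lines of NumPy; the matrix A below is a hypothetical 3-factor example on the 0–10 scale, not data from the study:

```python
import numpy as np

def dematel(A):
    """DEMATEL per Definitions 2-6: A is the initial direct-relation matrix
    (zero diagonal, pairwise influence scores on the 0-10 scale)."""
    A = np.asarray(A, dtype=float)
    z = 1.0 / A.sum(axis=1).max()        # Definition 3: z = (max_i sum_j a_ij)^-1
    N = z * A                            # normalized direct-relation matrix
    I = np.eye(len(A))
    T = N @ np.linalg.inv(I - N)         # Definition 4: T = N(I - N)^-1
    r = T.sum(axis=1)                    # Definition 5: row sums (influence dispatched)
    c = T.sum(axis=0)                    # column sums (influence received)
    return T, r + c, r - c               # prominence (r+c) and net cause/effect (r-c)

# Hypothetical 3-factor direct-relation matrix (illustration only, not study data)
A = [[0, 3, 5],
     [2, 0, 4],
     [1, 2, 0]]
T, prominence, relation = dematel(A)
```

Factors with positive r − c would be drawn as net causes in an NRM of the kind used in Section 4; factors with negative r − c are net effects.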
3.2 The ANP Method

The ANP method, a multi-criteria theory of measurement developed by Saaty (1996), provides a general framework to deal with decisions without making assumptions about the independence of higher-level elements from lower-level elements, or about the independence of the elements within a level, as in a hierarchy (Saaty 2005). In this section, concepts of the ANP are summarized based on Saaty’s earlier works (Saaty 1996, 2003, 2005).

A component of a decision network, as derived by the DEMATEL method in Section 3.1, is denoted by $C_h$, $h = 1, 2, \ldots, m$, and is assumed to have $n_h$ elements, denoted by $e_{h1}, e_{h2}, \ldots, e_{hn_h}$. The influences of a given set of elements in a component on any element in the decision system are represented by a ratio-scale priority vector derived from paired comparisons of the comparative importance of one criterion over another with respect to the interests or preferences of the decision makers. This relative importance value can be determined using a scale of 1–9, representing equal importance to extreme importance (Saaty 1996). The influence of elements in the network on other elements in that network can be represented in the supermatrix $W = [W_{ij}]$, $i, j \in \{1, 2, \ldots, m\}$. A typical entry $W_{ij} = [w_{i_{n_x} j_{n_y}}]$, $n_x \in \{1, 2, \ldots, n_i\}$, $n_y \in \{1, 2, \ldots, n_j\}$, is called a block of the supermatrix, where each column of $W_{ij}$ is a principal eigenvector of the influence of the elements (objectives) in the $i$th component of the network on an element (objective) in the $j$th component. Some of its entries may be zero, corresponding to those elements (objectives) that have no influence.

After forming the supermatrix, the weighted supermatrix is derived by transforming each column so that it sums to unity exactly. This step is very similar to the concept of the Markov chain, in terms of ensuring that the probabilities of all states sum to 1. Next, the weighted supermatrix is raised to limiting powers, $\lim_{\theta \to \infty} W^{\theta}$, to get the global priority vector, also called the weights (Huang et al. 2005). In addition, if the supermatrix has the effect of cyclicity, the limiting supermatrix is not unique; there are two or more limiting supermatrices in this situation, and the Cesàro sum would need to be calculated to obtain the priority. The weight of the $k$th objective derived by the above ANP process, namely $\omega_k$, $k \in \{1, 2, \ldots, n\}$, will be used as the weight for the $k$th objective in the following Section 3.3.
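The limiting-power step can be illustrated numerically (an editorial sketch, not the paper's implementation; the 3×3 supermatrix W below is hypothetical and assumes the acyclic case, so no Cesàro sum is needed):

```python
import numpy as np

def anp_weights(W, power=200):
    """Raise a column-stochastic weighted supermatrix to a limiting power
    (approximating lim W^theta) to obtain the global priority vector.
    Assumes no cyclicity; a cyclic supermatrix would need the Cesaro sum."""
    W = np.asarray(W, dtype=float)
    assert np.allclose(W.sum(axis=0), 1.0), "columns must sum to unity"
    limit = np.linalg.matrix_power(W, power)  # W^theta for a large theta
    return limit[:, 0]                        # every column of the limit is the priority vector

# Hypothetical weighted supermatrix for three elements (columns sum to 1)
W = [[0.2, 0.5, 0.3],
     [0.5, 0.1, 0.4],
     [0.3, 0.4, 0.3]]
weights = anp_weights(W)
```

The resulting vector is stationary under W (W · weights = weights), which is the Markov-chain analogy made in the text.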
3.3 Grey Relational Analysis

The GRA is used to determine the relationship between two sequences of stochastic data in a Grey system. The procedure bears some similarity to pattern recognition technology. One sequence of data is called the ‘reference pattern’ or ‘reference sequence,’ and the correlation between the other sequence and the reference sequence is to be identified (Deng 1986; Mon et al. 1995; Tzeng and Tasur 1994; Wu et al. 1996).

Deng also proposed a mathematical equation for the grey relational coefficient, as follows:

$$\gamma(x_0(k), x_i(k)) = \frac{\min_{\forall i} \min_{\forall k} |x_0(k) - x_i(k)| + \zeta \max_{\forall i} \max_{\forall k} |x_0(k) - x_i(k)|}{|x_0(k) - x_i(k)| + \zeta \max_{\forall i} \max_{\forall k} |x_0(k) - x_i(k)|}$$

where $\zeta$ is the distinguishing coefficient, $\zeta \in [0, 1]$; generally, $\zeta = 0.5$ is picked. Once the grey relational coefficients with respect to the strategies have been computed, the grade of the grey relation between each alternative and the reference alternative can be derived as

$$\gamma(x_0, x_i) = \sum_{k=1}^{n} \omega_k \, \gamma(x_0(k), x_i(k)),$$

where $n$ is the number of criteria, $\omega_k$ expresses the weight of the $k$th criterion, and $\gamma(x_0, x_i)$ represents the grade of grey relation of $x_i$ (the $i$th strategy) with respect to $x_0$. In this study, the IT strategies for enabling the cloud based banking services will be derived based on the Grey grades.
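The two formulas above vectorize directly (an editorial sketch; the reference sequence, candidate strategies, and weights below are hypothetical, not the study's data):

```python
import numpy as np

def grey_grades(x0, X, weights, zeta=0.5):
    """Grey relational grades of the sequences in X against the reference x0,
    using Deng's coefficient formula with distinguishing coefficient zeta."""
    x0 = np.asarray(x0, dtype=float)
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    d = np.abs(X - x0)                    # |x0(k) - xi(k)| for every i and k
    dmin, dmax = d.min(), d.max()         # global min/max over all i and all k
    gamma = (dmin + zeta * dmax) / (d + zeta * dmax)  # coefficient per (i, k)
    return gamma @ w                      # grade = sum_k w_k * gamma(x0(k), xi(k))

# Hypothetical reference sequence and two candidate strategies over three criteria
x0 = [1.0, 1.0, 1.0]
X = [[1.0, 0.8, 0.6],
     [0.5, 0.5, 0.5]]
w = [0.5, 0.3, 0.2]
grades = grey_grades(x0, X, w)
```

With weights summing to one, each grade lies in (0, 1], and a candidate closer to the reference sequence earns the higher grade, which is how the strategies are ranked in Section 4.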
4 Empirical Study

In the following, an empirical study based on the opinions of an executive of the banking industry will be leveraged for verifying the feasibility of the proposed framework. At first, eleven criteria and six strategies were derived based on literature review and an expert’s confirmation. The criteria include system configuration (c1), flexibility and reliability (c2), stable network throughput (c3), data security (c4), system compatibility (c5), transaction security (c6), system performance (c7), transaction efficiency (c8), message exchange efficiency (c9), system conversion cost (c10), and system maintenance cost (c11). The six strategies include rapid deployment (s1), resource scheduling (s2), multi-tenant technology (s3), huge data processing (s4), large-scale distributed storage (s5), and authorization management and billing (s6). Then, the network relationship map (NRM) between the criteria (Figure 1) can be derived by using the DEMATEL method introduced in Section 3.1. The criteria can further be prioritized by using the ANP method introduced in Section 3.2. Finally, the IT strategies with the highest grey relationships to the criteria can be derived based on the grey grades introduced in Section 3.3. The weight associated with each criterion as well as the grey grade associated with each strategy are demonstrated in Table 1. Based on the empirical study results, rapid deployment (s1), multi-tenant technology (s3) and huge data processing capability (s4) should be the strategies to be implemented.
Fig. 1 NRM

Table 1 The weight versus each criterion and the Grey grades versus each strategy

            c1     c2     c3     c4     c5     c6     c7     c8     c9     c10    c11    Grey grade
Weight      0.155  0.081  0.000  0.202  0.048  0.362  0.067  0.041  0.043  0.000  0.000
s1          0.155  0.035  0.000  0.202  0.026  0.217  0.067  0.025  0.024  0.000  0.000  0.751
s2          0.067  0.035  0.000  0.087  0.026  0.155  0.029  0.025  0.043  0.000  0.000  0.466
s3          0.067  0.049  0.000  0.121  0.048  0.362  0.040  0.041  0.024  0.000  0.000  0.751
s4          0.155  0.081  0.000  0.121  0.048  0.217  0.040  0.025  0.043  0.000  0.000  0.731
s5          0.067  0.035  0.000  0.067  0.018  0.121  0.029  0.018  0.024  0.000  0.000  0.377
s6          0.067  0.035  0.000  0.067  0.018  0.121  0.029  0.018  0.024  0.000  0.000  0.377
5 Discussion

In the following, the managerial implications and future research possibilities will first be discussed from the aspect of the decision criteria. Moreover, the criteria regarded as unimportant will be discussed. Finally, implications from the IT strategic manipulation aspects will further be addressed.

Based on the empirical study results, system configuration (c1), data security (c4) and transaction security (c6) were ranked as the most important criteria, together accounting for more than 70% of the total weight. The results are consistent with the systems requirements of banking systems in the real world, in which the security issues, including data security (20.2%) and transaction security (36.2%), play the most important role. Almost all the related works (e.g. Benton and Negm (2010), Suresh (2010), Shah and Clarke (2009), KPMG (2009), etc.) on e-banking or cloud based banking services made the same statements. Benton and Negm (2010) mentioned that security and data privacy remain prime concerns for cloud implementers in the banking sector. According to Shah and Clarke (2009), new issues such as privacy, security and reliability of information
processing have become more prominent as the Internet and wireless technologies have been widely adopted in banking. Further, according to Suresh (2010), the security of data, applications and processes, and the overall management of the solution platforms, with the ability to support customer, business and compliance specific requirements, have become one of the major challenges and risks associated with cloud solutions from the perspective of financial institutions. According to KPMG (2009), a central component of Cloud Computing is an economic agenda that promises cost and service advantages over traditional IT architectures, which are based on having dedicated resources for each business unit in an enterprise. According to Suresh (2010), the evolution of financial institutes into the cloud based models is positively influenced by the current system configuration related attempts at right-sizing the environment, consolidating computing capabilities across the enterprise, reducing the diversity in the operating models and technology, and deploying service oriented architectures. Thus, Benton and Negm (2010) observed that banks will need cloud skills to help them choose among platform providers and determine the “glue” across these loosely coupled systems. Apparently, from the IT perspective, this statement is consistent with the most important criterion, system configuration (c1). The criteria including stable network throughput (c3), system conversion cost (c10), and system maintenance cost (c11) were regarded by the experts as unnecessary and weighted as 0 (Table 1). For IT-advanced countries, including Taiwan, stable network throughput is no doubt a criterion that can be neglected. However, this criterion should be considered for financial institutes located in less developed countries. Finally, for the cloud based services, there is very limited requirement for system conversion and maintenance costs.
Thus, the experts also regarded these two criteria as not important. From the IT strategy aspect, the introduction of the rapid deployment strategy ( s1 ) is consistent with the requirement, identified by KPMG (2009), to maximize the advantages of rapid provisioning and elastic scaling of banking services on the cloud. Meanwhile, the rapid deployment strategy can definitely shorten the time to market of cloud based banking services. The cloud is a true "multi-tenanted" environment (Benton and Negm 2010). The multi-tenant technology ( s3 ), the critical SaaS technology that allows one instance of an application to serve multiple customers concurrently, sharing cloud resources and achieving high operational efficiency (Cai et al., 2010), is apparently the core technology for enabling banking services on the cloud. Finally, the cloud based huge data processing capabilities ( s4 ) can assist financial services or banking institutions that require burst supercomputing to minimize the processing time of complex calculations (PEER 1 Hosting 2011) at the least cost.
6 Conclusions

Cloud computing is attractive to banking services from the aspects of cost minimization, service value maximization, and, thus, the achievement of competitive advantage.
Derivations of Information Technology Strategies for Enabling the Cloud
However, barriers still exist to the introduction of cloud computing by financial institutions. The IT strategies for enabling cloud based banking services were proposed based on a hybrid MCDM framework consisting of DEMATEL, ANP and GRA. The rapid deployment strategy, the multi-tenant technology, and huge data processing capabilities were regarded as the most important strategies. The analytic framework can also be used to define cloud computing strategies for firms or institutions in other sectors or industries.
References

Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R.H., Konwinski, A., Lee, G., et al.: Above the Clouds: A Berkeley View of Cloud Computing. Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA (2009) Babcock, C.: Management Strategies for the Cloud Revolution. McGraw-Hill, New York (2010) Benton, D., Negm, W.: Banking on the Cloud. Accenture (2010) Buyya, R., Ranjan, R., Calheiros, R.N.: Modeling and simulation of scalable Cloud computing environments and the CloudSim toolkit: Challenges and opportunities. In: International Conference on High Performance Computing & Simulation, HPCS 2009 (2009) Buyya, R., Yeo, C.S., Venugopal, S.: Market-Oriented Cloud Computing: Vision, Hype, and Reality for Delivering IT Services as Computing Utilities. In: Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications, Dalian, China (2008) Buyya, R., Yeo, C.S., Venugopal, S., Broberg, J., Brandic, I.: Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems 25(6), 599–616 (2009), doi:10.1016/j.future.2008.12.001 Chiu, Y.-J., Chen, H.-C., Tzeng, G.-H., Shyu, J.Z.: Marketing strategy based on customer behaviour for the LCD-TV. International Journal of Management and Decision Making 7(2/3), 143–165 (2006) Deng, J.L.: Grey Forecasting and Decision. Huazhong University of Science and Technology Press, Wuhan (1986) Furht, B., Escalante, A.: Handbook of Cloud Computing. Springer, Heidelberg (2010) Gabus, A., Fontela, E.: World Problems, an Invitation to Further Thought Within the Framework of DEMATEL. Batelle Geneva Research Center, Geneva (1972) Hong, C., Ning, W., Ming, Z.: A Transparent Approach of Enabling SaaS Multi-tenancy in the Cloud.
In: 6th World Congress on Services, SERVICES-1 (June 2010) Hori, S., Shimizu, Y.: Designing methods of human interface for supervisory control systems. Control Engineering Practice 7(11), 1413–1419 (1999) Huang, C.Y., Hong, Y.H., Tzeng, G.H.: Assessment of the Appropriate Fuel Cell Technology for the Next Generation Hybrid Power Automobiles. Journal of Advanced Computational Intelligence and Intelligent Informatics (2011) (forthcoming) Huang, C.Y., Shyu, J.Z., Tzeng, G.H.: Reconfiguring the Innovation Policy Portfolios for Taiwan's SIP Mall Industry. Technovation 27(12), 744–765 (2007) Huang, J.-J., Tzeng, G.-H., Ong, C.-S.: Multidimensional data in multidimensional scaling using the analytic network process. Pattern Recognition Letters 26(6), 755–767 (2005) KPMG: Technology Paradigms for the Banking Industry. KPMG (2009)
Meeker, M., Joseph, D., Thaker, A.: Technology Trends. Morgan Stanley (2008) Mell, P., Grance, T.: NIST Definition of Cloud Computing. National Institute of Standards and Technology (2009) Mon, D.L., Tzeng, G.H., Lu, H.C.: Grey decision making in weapon system evaluation. Journal of Chung Chen Institute of Technology 26(1), 73–84 (1995) Parkhill, D.F.: The Challenge of the Computer Utility, vol. 246. Addison-Wesley, Reading (1966) PEER 1 Hosting: Burst supercomputing launched on pay-as-you-go basis for financial services industries (2011), http://www.cloudcomputing365.net/ (accessed) Saaty, R.W.: The Analytic Hierarchy Process (AHP) for Decision Making and The Analytic Network Process (ANP) for Decision Making with Dependence and Feedback. Creative Decisions Foundation, PA (2003) Saaty, T.L.: Decision Making with Dependence and Feedback: The Analytic Network Process. RWS Publications, Pittsburgh (1996) Saaty, T.L.: Theory and Applications of the Analytic Network Process - Decision Making with Benefits, Opportunities, Costs, and Risks. RWS Publications, Pittsburgh (2005) Shah, M., Clarke, S.: E-Banking Management: Issues, Solutions, and Strategies. Information Science Reference (2009) Suresh, M.C.: Cloud Computing - Strategic Considerations for Banking & Financial Services Institutions. Tata Consultancy Services (2010) Szalay, A.S., Bell, G., Vandenberg, J., Wonders, A., Burns, R., Dan, F., Heasley, J., et al.: GrayWulf: Scalable Clustered Architecture for Data Intensive Computing. In: 42nd Hawaii International Conference on System Sciences, HICSS 2009 (2009) Tzeng, G.H., Tsaur, S.H.: The multiple criteria evaluation of Grey relation model. The Journal of Grey System 6(2), 87–108 (1994) Vaquero, L.M., Rodero-Merino, L., Caceres, J., Lindner, M.: A break in the clouds: towards a cloud definition. SIGCOMM Comput. Commun. Rev. 39(1), 50–55 (2008), doi:10.1145/1496091.1496100 Vouk, M.A.: Cloud Computing – Issues, Research and Implementations.
Journal of Computing and Information Technology 16(4), 235–246 (2008) Wu, H.S., Deng, J.L., Wen, K.L.: Introduction of Grey Analysis. Gau-Li Publication Inc, Taiwan (1996) Yildiz, M., Abawajy, J., Ercan, T., Bernoth, A.: A Layered Security Approach for Cloud Computing Infrastructure. In: 10th International Symposium on Pervasive Systems, Algorithms, and Networks (ISPAN) (2009) Youseff, L., Butrico, M., Da Silva, D.: Toward a Unified Ontology of Cloud Computing. In: Grid Computing Environments Workshop, GCE (2008)
Difficulty Estimator for Converting Natural Language into First Order Logic Isidoros Perikos, Foteini Grivokostopoulou, Ioannis Hatzilygeroudis, and Konstantinos Kovas *
Abstract. The NLtoFOL system is an interactive web-based system for learning to convert natural language (NL) sentences into first order logic (FOL). In this paper, we present a difficulty estimating expert system that determines the difficulty level of a sentence's conversion process. Our approach is based on the complexity of the corresponding FOL formula instead of the NL sentence itself. Parameters like the number, the type and the order of quantifiers, the number of implications and the number of different connectives are taken into account. Experimental results show that the difficulty estimating system produces the correct output for a significant proportion of sentences.

Keywords: Difficulty Estimation, Natural Language Formalization, First Order Logic.
1 Introduction

Knowledge Representation & Reasoning (KR&R) is a fundamental topic of Artificial Intelligence (AI). A basic KR language is First-Order Logic (FOL), the main representative of logic-based representation languages, which is part of almost any introductory AI course and textbook. So, teaching FOL as a KR&R language is a vital aspect. Teaching FOL as a knowledge representation and reasoning language includes many aspects. One of them is translating natural language (NL) sentences into FOL formulas, often called logic formalization of NL sentences. It is an ad-hoc process; there is no specific algorithm that can be automated within a computer. This is mainly due to the fact that NL has no clear semantics as FOL does. Most existing textbooks do not pay the required attention to the above aspect. They simply provide the syntax of FOL and definitions of the logical symbols and terms [10]. Even more specialized textbooks do the same [4]. At best, they provide a

Isidoros Perikos · Foteini Grivokostopoulou · Ioannis Hatzilygeroudis · Konstantinos Kovas
School of Engineering, Department of Computer Engineering & Informatics, University of Patras, 6500 Patras, Hellas (Greece)
e-mail: {perikos,grivokwst,ihatz,kobas}@ceid.upatras.gr
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 135–144. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
136
I. Perikos et al.
kind of more extended explanations and examples [6]. However, they do not provide any systematic guidance. In [7], we introduced a structured process for guiding students in translating a NL sentence into a FOL one. This process was implemented as the web-based interactive system NLtoFOL presented in [8]. However, the system in [8] cannot automatically determine the difficulty level of a sentence's conversion. In this paper, we present a new capability of the system: an expert system that determines the formalization difficulty level of a sentence based on its corresponding FOL expression(s). The paper is organized as follows. Section 2 presents related work. Section 3 deals with the NL to FOL conversion by presenting the SIP (Structured and Interactive Process) formalization process via an example. In Section 4, the revised architecture of the NLtoFOL system is presented. Section 5 deals with the difficulty estimating expert system. Section 6 presents and discusses experimental results. Finally, Section 7 concludes and provides directions for future research.
2 Related Work

KRRT (Knowledge Representation and Reasoning Tutor) [2] is a web-based system that aims at helping students learn FOL as a KR&R language. It is based on FITS [1], its predecessor system, and deals with both knowledge representation and reasoning with FOL. The translation from NL to FOL takes place in its KR part. The student gives his/her FOL proposal sentence and the system checks its syntax and whether it is the correct one. However, it does not provide any information about the difficulty of the NL sentence's translation into FOL. In [3], a work that deals with the difficulty of translating NL into FOL is presented. Its main characteristic is that it tries to determine a NL sentence's formalization difficulty based on the students' responses. For this purpose, a corpus of twenty NL sentences was used and, based on the students' answers, two metrics were calculated: the proportion of students who get a particular sentence's formalization wrong, and the number of attempts needed to determine the correct formula. Based on those two metrics, the NL sentences are characterized as Easy/Hard to get wrong and Easy/Hard to resolve. To our knowledge, there is no system that automatically determines the difficulty level of a sentence's conversion process.
3 A Structured and Interactive Process for NL to FOL Conversion

One problem in converting natural language into first order logic has to do with the unclear semantics of natural language: NL has no clear semantics as FOL does. The main difficulty comes from the lack of a systematic way of doing the conversion. In a previous work [7], we introduced the NLtoFOL SIP (Structured and Interactive Process) for translating NL sentences into FOL formulas. It is a process that guides a student in translating (or converting) a NL sentence into a FOL one and consists of ten steps. To demonstrate it, we present the conversion of the NL sentence "All farmers who own donkeys beat them" into a FOL formula based on the NLtoFOL SIP process.
Step 1: Spot the verbs, the nouns and the adjectives in the sentence and specify the corresponding predicates or function symbols. There are four such elements:
farmers → predicate: farmer
donkeys → predicate: donkey
own → predicate: owns
beat → predicate: beats

Step 2: Specify the number, the types and the symbols of the arguments of the function symbols (first) and the predicates (next). They are presented in the following table:

Predicate | Arity | Types | Symbols
farmer | 1 | variable | x
donkey | 1 | variable | y
owns | 2 | variable, variable | x, y
farmer | 1 | variable | x
beats | 2 | variable, variable | x, y
Step 3: Specify the quantifiers of the variables.
x → ∀ (because of "All"), y → ∀ (because of an implicit "(all of) them")
Step 4: Construct the atomic expressions (or atoms) corresponding to predicates. We construct as many atoms as the predicates:
Atom 1: farmer(x)
Atom 2: donkey(y)
Atom 3: owns(x,y)
Atom 4: beats(x,y)

Step 5: Divide produced atoms in groups of same-level atoms. This mainly refers to grouping atoms that should be connected with each other with some connective:
AtomGroup1: {farmer(x), donkey(y), owns(x,y)}
AtomGroup2: {beats(x,y)}

Step 6: Specify the connectives between atoms of each group and create corresponding logical formulas. We form the formulas corresponding to the groups of Step 5:
AtomGroup1 → Form1: farmer(x) ∧ donkey(y) ∧ owns(x,y)
AtomGroup2 → Form2: beats(x,y)

Step 7: Divide produced formulas in groups of same-level formulas. This usually corresponds to specifying the left and right parts of an implication:
FormGroup1-1: {farmer(x) ∧ donkey(y) ∧ owns(x,y)}
FormGroup1-2: {beats(x,y)}

Step 8: If only one group of formulas is produced, specify the connectives between formulas of the group, create the next level formula and go to Step 10. Not applicable.

Step 9: Specify the connectives between formulas of each group, create the next level formulas and go to Step 7.
FormGroup1-1 → Form1-1: farmer(x) ∧ donkey(y) ∧ owns(x,y)
FormGroup1-2 → Form1-2: beats(x,y)

Step 7: FormGroup2-1: {(farmer(x) ∧ donkey(y) ∧ owns(x,y)), beats(x,y)}

Step 8: FormGroup2-1 → Form1-3: {(farmer(x) ∧ donkey(y) ∧ owns(x,y)) ⇒ beats(x,y)}

Step 10: Place quantifiers at the right points in the produced formula to create the final FOL formula.
(∀x)(∀y) (farmer(x) ∧ donkey(y) ∧ owns(x,y)) ⇒ beats(x,y)
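The assembly performed in Steps 4–10 can be sketched in a few lines of Python. This is a hypothetical illustration of how the intermediate products combine, not part of the NLtoFOL implementation:

```python
# Hypothetical sketch: assembling the donkey-sentence formula from
# the intermediate products of the SIP steps.

# Step 4: atomic expressions
atoms = ["farmer(x)", "donkey(y)", "owns(x,y)", "beats(x,y)"]

# Steps 5-6: group same-level atoms and connect them with ∧
form1 = " ∧ ".join(atoms[:3])   # farmer(x) ∧ donkey(y) ∧ owns(x,y)
form2 = atoms[3]                # beats(x,y)

# Steps 7-9: the two formula groups become the two sides of an implication
body = f"({form1}) ⇒ {form2}"

# Step 10: place the quantifiers in front of the produced formula
formula = f"(∀x)(∀y) {body}"
print(formula)
```

Running the sketch reproduces the final formula of Step 10.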
4 System Architecture

The basic architecture of the NLtoFOL system is presented in [8], which we revise here.

[Fig. 1 Extended Architecture of the NLtoFOL System: the Student and the Tutor interact via the Student Interface (SI) and the Tutor Interface (TI); the SI is configured by the Student Interface Configurator, driven by the Intelligent Data Analysis unit; the Difficulty Estimating System (DES) and the System Database (SD) complete the architecture.]
More specifically, we introduce a new component, which estimates the difficulty level of the conversions of the stored NL sentences. So, the system consists of two interfaces, the Student Interface (SI) and the Tutor Interface (TI). SI is dynamically configured [8] during the NL to FOL conversion session via the Student Interface Configurator. This is achieved via the guidance given by the Intelligent Data Analysis unit, a rule-based system that, based on the input data from SI, decides which reconfigurations should be made to it and which kind of interaction will be allowed or given to the user. It is also responsible for tracing the user's mistakes and handling them in terms of appropriate feedback to the user. The Difficulty Estimating System (DES) is an expert system that is used to determine the difficulty level of a sentence's formalization process. This can be done automatically, without tutor intervention; the tutor can accept or reject the result of DES. Finally, the System Database (SD) is used to store sentences and related information, including the estimated difficulty level.
5 Difficulty Estimating System

We have developed the Difficulty Estimating System (DES), which is able to automatically determine the difficulty level of a NL to FOL sentence conversion. The structure of DES is illustrated in Figure 2. DES is a rule-based expert system implemented in Jess, an expert system shell [5]. It consists of the Difficulty Parameters Fact Base (DPFB), where the values of the parameters on which the estimation is based are stored, the Difficulty Estimation Rule Base (DERB), where the rules for estimating the difficulty level are stored, and the Jess Inference Engine (JIE), which performs and controls the difficulty estimation process. The corresponding FOL formula of each NL sentence is analyzed via the FOL Formula Analyzer (FFA), and the values of its estimation parameters are extracted and stored in DPFB as Jess facts. The output of the system is the difficulty level of the NL sentence conversion process.

[Fig. 2 The structure of the Difficulty Estimating System: a FOL formula enters the FOL Formula Analyzer (FFA), which feeds the knowledge base (DPFB and DERB); the Jess Inference Engine produces the Difficulty Level.]
We should also notice that the values of the parameters used for difficulty estimation are also permanently stored in SD for each sentence. To develop DES, we consulted an expert-tutor in the field. Most tutors empirically estimate the difficulty of a sentence's conversion/formalization. In cooperation with the expert-tutor, we tried to specify which factors/parameters have an impact on the difficulty level of sentence conversions. Finally, we came up with the following difficulty estimation parameters, which refer to the converted (FOL) sentence:
• the number, the type and the order of the quantifier(s)
• the number of the implication symbols
• the number of the different connectives
Connectives include {∧, ∨, ¬, ⇒}. So, DES determines the difficulty level of a conversion process based on the above parameters, which are all related to the resulting FOL formula. Basing the estimation on the resulting FOL formula was the easiest and most effective choice. Afterwards, we consulted the expert-tutor to acquire the necessary rules for the difficulty estimation based on the above parameters. Table 1 presents the resulting rules, which classify sentences into five categories as far as the difficulty of their conversion process is concerned: very easy, easy, medium, difficult and advanced.

Table 1 Rules for determining the difficulty level of exercises
No | Number of implications | Number of quantifiers | Universal quantifiers | Existential quantifiers | Number of different connectives | ∃ before ∀ | Difficulty Level
1 | 0 | 0 | no | no | 0 | - | Very Easy
2 | 0 | ≤1 | yes | no | 0 | - | Very Easy
3 | 0 | ≤1 | no | yes | 0 | - | Very Easy
4 | 0 | ≥1 | yes | no | ≤1 | - | Easy
5 | 0 | ≥1 | no | yes | ≤1 | - | Easy
6 | 1 | ≤1 | yes | no | ≤2 | - | Easy
7 | 0 | ≥2 | yes | yes | 0 | no | Easy
8 | 0 | ≥2 | yes | yes | 0 | yes | Medium
9 | ≤1 | ≤1 | yes | no | 3 | - | Medium
10 | ≤1 | ≤1 | no | yes | 3 | - | Medium
11 | ≤1 | ≥1 | yes | no | ≤3 | - | Medium
12 | ≤1 | ≥1 | no | yes | ≤2 | - | Medium
13 | 1 | ≥2 | yes | yes | ≤2 | no | Medium
14 | 1 | ≥2 | yes | yes | ≤2 | yes | Difficult
15 | 1 | ≤1 | yes | no | 3 | - | Difficult
16 | 1 | ≤1 | no | yes | 3 | - | Difficult
17 | ≥2 | ≥2 | yes | yes | ≤3 | - | Advanced
18 | ≥2 | ≥3 | yes/no | yes/no | ≤3 | - | Advanced
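Table 1 behaves like an ordered rule base: the first rule whose conditions all match determines the difficulty level. The following Python sketch encodes a handful of these rules for illustration; it is a hypothetical condensation, not the actual DERB, which is written in Jess:

```python
def _match(value, pattern):
    # A Table 1 cell is an exact number, "<=n", ">=n", or None (don't care).
    if pattern is None:
        return True
    p = str(pattern)
    if p.startswith("<="):
        return value <= int(p[2:])
    if p.startswith(">="):
        return value >= int(p[2:])
    return value == int(p)

# (rule no, implications, quantifiers, universal, existential, connectives, level);
# a subset of Table 1, with 'yes/no' cells represented as None (either value).
RULES = [
    (1,  0, 0, False, False, 0, "Very Easy"),
    (3,  0, "<=1", False, True, 0, "Very Easy"),
    (5,  0, ">=1", False, True, "<=1", "Easy"),
    (6,  1, "<=1", True, False, "<=2", "Easy"),
    (18, ">=2", ">=3", None, None, "<=3", "Advanced"),
]

def classify(impl, quant, universal, existential, conn):
    """Return the first matching (rule number, difficulty level)."""
    for no, r_impl, r_quant, r_uni, r_exi, r_conn, level in RULES:
        if (_match(impl, r_impl) and _match(quant, r_quant)
                and (r_uni is None or universal == r_uni)
                and (r_exi is None or existential == r_exi)
                and _match(conn, r_conn)):
            return no, level
    return None, "unclassified"
```

With the parameters of "Some cats are black" ((∃x) cat(x) ∧ black(x): no implication, one existential quantifier, one connective), the sketch fires rule 5 and returns "Easy", as in Table 2.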
For example, for the NL sentence "Every city has a dog catcher that has been bitten by any dog living in the city" with corresponding FOL formula:

(∀x) city(x) ⇒ ((∃y) dog-catcher(y,x) ∧ (∀z) (dog(z) ∧ lives-in(z,x)) ⇒ has-bitten(z,y))

the total number of implications is two, the total number of quantifiers is three (two universal and one existential), and there are two different connectives (∧, ⇒). So, rule 18 of Table 1 applies and the difficulty level of the sentence is estimated to be "Advanced". Table 2 presents some example sentences and the corresponding difficulty levels of their conversion processes as produced by DES.

Table 2 Examples of difficulty levels estimated by DES
Natural Language (NL) | First Order Logic (FOL) | Difficulty Level | Rule applied
Pluto is a dog | dog(Pluto) | Very Easy | 1
There is at least one thief | (∃x) thief(x) | Very Easy | 3
Some cats are black | (∃x) cat(x) ∧ black(x) | Easy | 5
Every apple is delicious | (∀x) apple(x) ⇒ delicious(x) | Easy | 6
Every gardener likes the sun | (∀x) gardener(x) ⇒ likes(x,sun) | Easy | 6
Ancestors of my ancestor are ancestors of mine | (∀x)(∀y)(∀z) (ancestor(x,y) ∧ ancestor(y,z) ⇒ ancestor(x,z)) | Medium | 11
No purple mushroom is poisonous | (∀x)(mushroom(x) ∧ purple(x)) ⇒ ¬poisonous(x) | Medium | 9
There is a barber in town who shaves all men in town who do not shave themselves | (∃x)(barber(x) ∧ inTown(x) ∧ (∀y)(man(y) ∧ inTown(y) ∧ ¬shave(y,y) ⇒ shave(x,y))) | Difficult | 14
6 Experimental Results and Discussion

To measure the performance of DES, an evaluation was performed. A corpus of 88 NL sentences and their corresponding FOL formulas was created and used for the system evaluation. The system was used to determine the difficulty levels of those sentences, for which the expert-tutor had already determined difficulty levels. All sentences were given as inputs to DES and its outputs were produced. The evaluation of DES was based on the following metrics: accuracy, precision, sensitivity and specificity, which for two classes are defined as follows:
acc = (a + d) / (a + b + c + d),  prec = a / (a + c),  sen = a / (a + b),  spec = d / (c + d)
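These definitions, including the per-class averaging used for the multi-class case, translate directly into code. The following Python sketch merely illustrates the formulas (with the paper's notation for a, b, c, d) and is not part of DES:

```python
def two_class_metrics(a, b, c, d):
    """a/b: positives correctly/incorrectly classified;
    d/c: negatives correctly/incorrectly classified."""
    return {
        "acc": (a + d) / (a + b + c + d),
        "prec": a / (a + c),
        "sen": a / (a + b),
        "spec": d / (c + d),
    }

def averaged_metrics(per_class):
    """Average each metric over the m output classes (multi-class case)."""
    m = len(per_class)
    return {k: sum(pc[k] for pc in per_class) / m for k in per_class[0]}
```

For example, with a = 3, b = 1, c = 2, d = 4, the sketch yields an accuracy of 7/10, a precision of 3/5, a sensitivity of 3/4 and a specificity of 4/6.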
where a is the number of positive cases correctly classified, b is the number of positive cases that are misclassified, d is the number of negative cases correctly classified and c is the number of negative cases that are misclassified. By 'positive' we mean that a case belongs to the class of the corresponding difficulty level and by 'negative' that it does not. In the case of multiple classes, as in ours, the above metrics are calculated as follows:

acc = (1/m) Σ_{i=1}^{m} acc_i,  prec = (1/m) Σ_{i=1}^{m} prec_i,  sen = (1/m) Σ_{i=1}^{m} sen_i,  spec = (1/m) Σ_{i=1}^{m} spec_i

where m is the number of output classes. They represent the average values of the metrics across all classes. The results are presented in Table 3.

Table 3 Evaluation metrics for DES
Evaluation metric | Very Easy | Easy | Medium | Difficult | Advanced | Average
Accuracy | 0.909 | 0.898 | 0.920 | 0.920 | 0.966 | 0.9226
Precision | 0.600 | 0.923 | 0.882 | 0.667 | 0.600 | 0.7344
Sensitivity | 1.000 | 0.774 | 0.909 | 0.250 | 0.750 | 0.7366
Specificity | 0.894 | 0.965 | 0.927 | 0.988 | 0.976 | 0.9500
The results show a very good performance of DES. A noticeable point is the low sensitivity value in the case of the 'Difficult' class, which, however, is due to the small number of available sentences belonging to that class (only four). The general accuracy of the system shows the percentage of the sentences for which the system and the expert-tutor determined the same difficulty level. From the corpus of 88 sentences that were tested, the system correctly identified the difficulty of 71 sentences. Thus, the general accuracy of the system is 0.8068, which also shows a very good performance. Looking at the sentences for which DES failed to determine the right difficulty level (class), we came to the conclusion that this is mainly due to the fact that our approach does not take into account the NL version of the sentence, which plays a role in some cases. More specifically, some NL sentences have difficult semantics, which makes the expert-tutor assign them a higher difficulty level, while their corresponding FOL formulas are rather simple. Thus, the system cannot give a correct estimation of their difficulty. One such case is the following:

Natural Language (NL): Not all men that are vegetarian are happy
First Order Logic (FOL): (∃x) male(x) ∧ vegetarian(x) ∧ ¬happy(x)
DES Result: Easy
Expert Classification: Medium
We also conducted a second experiment to test whether the proposed classification is valid. For this evaluation, a corpus of 20 sentences was used. The corpus was created in such a way that four NL sentences from each difficulty level were randomly selected. A group of 40 students was given that corpus and tried to convert the NL sentences into the corresponding FOL ones using the process presented in Section 3. In Table 4, the percentages of correctly accomplished conversions per difficulty class are presented. It shows that the difficulty levels specified by DES are reasonable and correspond more or less to reality. One could expect a lower percentage of correct conversions in the difficult and advanced levels.

Table 4 Statistics based on students' answer analysis
Difficulty Level | Average correct conversions (%)
Very easy | 87
Easy | 80
Medium | 68
Difficult | 47
Advanced | 35
7 Conclusion and Future Work

The NLtoFOL system is a web-based interactive system for helping students translate (or convert) natural language (NL) sentences into first-order logic (FOL) formulas. In this paper, an expert system that determines a sentence's formalization/conversion difficulty is presented. The system takes as input the corresponding first order logic (FOL) formula of a NL sentence and gives as output an estimation of the difficulty of its conversion process. To do so, it computes a set of parameters, like the number and the type of the quantifier(s), the number of the implications and the number of different connectives of the FOL expression. Experimental results validate our approach to a large degree. However, there are some points at which the system can be improved. As mentioned above, DES works by analysing the FOL formula of the NL sentence. An improvement would be to take into account the natural language structure of the sentence and parameters related to its semantics. Moreover, it could take into account the existence of some keywords like "everybody" or "somebody", their number and their order. This is a direction for further research. Alternatively, the use of a method similar to that in [9] could be investigated. This means using a student-based approach instead of a sentence-based approach, or a combination of them. This is another direction for further research.
Acknowledgement This work was supported by the Research Committee of the University of Patras, Greece, Program “Karatheodoris”, project No C901.
References

[1] Alonso, J.A., Aranda, G.A., Martín-Mateos, F.J.: FITS: Formalization with an Intelligent Tutor System. In: Proceedings of the IV International Conference on Multimedia and Information and Communication Technologies in Education (2006)
[2] Alonso, J.A., Aranda, G.A., Martín-Mateos, F.J.: KRRT: Knowledge representation and reasoning tutor system. In: Moreno Díaz, R., Pichler, F., Quesada Arencibia, A. (eds.) EUROCAST 2007. LNCS, vol. 4739, pp. 400–407. Springer, Heidelberg (2007)
[3] Barker-Plummer, D., Cox, R., Dale, R.: Dimensions of Difficulty in Translating Natural Language into First-Order Logic. In: 2nd International Conference on Educational Data Mining, Cordoba, Spain, pp. 220–228 (2009)
[4] Brachman, R.J., Levesque, H.J.: Knowledge Representation and Reasoning. Elsevier, Amsterdam (2004)
[5] Friedman-Hill, E.: Jess in Action: Rule-Based Systems in Java. Manning Publications Company (2003)
[6] Genesereth, M.R., Nilsson, N.J.: Logical Foundations of AI. Morgan Kaufmann, Palo Alto (1987)
[7] Hatzilygeroudis, I.: Teaching NL to FOL and FOL to CL Conversions. In: Proceedings of the 20th International FLAIRS Conference, Key West, FL, pp. 309–314. AAAI Press, Menlo Park (2007)
[8] Hatzilygeroudis, I., Perikos, I.: A web-based interactive system for learning NL to FOL conversion. In: Damiani, E., Jeong, J., Howlett, R.J., Jain, L.C. (eds.) New Directions in Intelligent Interactive Multimedia Systems and Services - 2. SCI, vol. 226, pp. 297–307. Springer, Heidelberg (2009)
[9] Koutsojannis, C., Beligiannis, G., Hatzilygeroudis, I., Papavlasopoulos, C., Prentzas, J.: Using a hybrid AI approach for exercise difficulty level adaptation. International Journal of Continuing Engineering Education and Life-Long Learning 17(4-5), 256–272 (2007)
[10] Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall, Upper Saddle River (2003)
Emergency Distribution Scheduling with Maximizing Marginal Loss-Saving Function Yiping Jiang and Lindu Zhao*
Abstract. In this paper, we focus on the problem of reducing the loss of human lives and properties as much as possible in emergency response management. This problem is formulated as a combinatorial optimization problem of emergency resource allocation and distribution. First, we propose an exponential marginal loss-saving function as a decision-making tool to allocate scarce emergency resources. Second, we propose an emergency distribution scheduling model by introducing a time-space network, and then construct an integrated model that combines the marginal loss-saving function and the time-space-based distribution model. Finally, we explore the optimal solution of this combinatorial problem through a numerical example. Our work can provide a tactical and operational method for emergency response agencies to allocate scarce emergency resources and make distribution schedules.

Keywords: Distribution Scheduling, Maximizing Marginal Loss Saving, Exponential Function, Time-Space Network, Emergency Response.
1 Problem Setting

A number of large-scale disastrous events have been witnessed by the world in recent years, especially frequent huge earthquakes, and countries worldwide have grown increasingly concerned about the challenges of dealing with these major disasters [1]. It is generally agreed that a large-scale disaster can conceivably cause large numbers of deaths and drastically destroy local or regional health-care systems. In the event of an extreme disaster, the demand for scarce resources, such as rare medical supplies, can be overwhelming. It is vital to transport various resources, including food, shelter, and especially medicine, from the receiving supply nodes to the disaster affected areas as quickly as possible to support

Yiping Jiang · Lindu Zhao
Institute of Systems Engineering, Southeast University, Nanjing, China, 210096
e-mail:
[email protected],
[email protected] *
* Corresponding author. J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 145–154. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
146
Y. Jiang and L. Zhao
rescue operations and save human lives. Therefore, reducing the loss of properties and human lives as much as possible is a key factor in measuring the effectiveness and efficiency of a local emergency response system. However, some practical problems still exist, even though more and more scholars pay attention to this field. First of all, a large-scale disaster usually happens suddenly and affects many areas simultaneously, causing a surge of demand for some particular emergency resources (such as blood, vaccine, camp, etc.) during a very short period of time. Hence, how to make a schedule for allocating and distributing scarce emergency resources to many affected areas, so as to efficiently use the precious and limited resources, becomes an important problem. Second, the damage level and degree of destruction differ among the various disaster affected areas; in general, the further away a place is from the area where the disaster happened, the less damage it will suffer. Hence, the decision principle by which emergency officials prioritize each disaster area becomes a crux. Finally, time is a critical factor in emergency response, and any delay in distribution can cause more deaths and greater losses; yet the affected areas are geographically located in different spaces. Thus, how to make an emergency distribution schedule considering time and space simultaneously is an even more important problem. This paper focuses on these problems and bridges the gap in current research. The contribution of this paper can be unfolded in three aspects: we focus on the essential problem in emergency response of reducing the loss of human lives and properties as much as possible, and we propose an exponential marginal loss-saving function as a decision-making tool to allocate the scarce emergency resource.
We then consider how to distribute the allocated emergency resources under the constraints of distribution time and geographical location, and propose an emergency distribution scheduling model built on a time-space network. Finally, we construct an integrated model that incorporates the marginal loss-saving function into the time-space-based distribution model. The remainder of the paper is organized as follows. Section 2 reviews the relevant literature on emergency logistics and emergency response management from the OR/MS perspective. Section 3 introduces the framework of the proposed approach, including the marginal loss-saving function and the integrated emergency distribution scheduling model based on a time-space network. A numerical example and a sample sensitivity analysis are given in Section 4. Finally, the limitations of the proposed models and future research directions are reported in Section 5.
2 Related Works

In this section, we concentrate only on the field of emergency logistics and emergency response from the OR/MS perspective. Work that studies emergency management with other, qualitative methods is not covered in the following discussion.
Emergency Distribution Scheduling
147
Typical topics of current research on emergency logistics include helicopter mission planning during disaster relief [2], relief delivery system design [3], scheduling of emergency preparation using scenario planning models [4], path selection problems [5], disaster relief operation planning [6], evacuation [7] [8] [9], and scheduling of emergency roadway repair and resource distribution using time-space networks [10]. But these works consider only deterministic situations, and their models do not capture the dynamic and stochastic characteristics that vary with disaster diffusion. Since information about demand, casualties and road conditions is highly uncertain in a disaster situation, the logistics plan should involve a planning time horizon consisting of a given number of time periods, in order to deal with time-variant demand, supply, and other related complexities [11]. Research efforts addressing stochastic or other time-varying characteristics commonly focus on demand, supply, or the uncertainty of road conditions, and the main modeling technologies in emergency logistics research are stochastic programming [12], fuzzy theory [13], and data fusion and entropy [14]. For other research topics on allocating ambulances and emergency resources, readers can refer to Gong and Batta [15] and Arora et al. [16], respectively. In addition, we only survey the research efforts on modeling emergency logistics; for broader topics related to disaster management, readers can find more information in review papers such as Wright et al. [17], Green et al. [18], Altay et al. [19], and Simpson et al. [20]. To summarize, it is worth noting that the previous works use mathematical models whose objectives minimize cost (or time, or path complexity) or maximize the satisfaction level.
However, the primary purpose of emergency logistics is to relieve human suffering and save as many human lives as possible, so life saving matters more than objectives such as time or cost. Besides, the utility of emergency resources should be highlighted in a context of demand surge but limited available resources during the immediate post-disaster relief phase. This attribute implies that the objective function for modeling emergency logistics should not be just a mathematical formulation but a utility function. In this paper, our work considers these special characteristics and incorporates them into an integrated emergency distribution model, so as to provide an effective decision tool for allocating and distributing emergency resources for relevant officials and practitioners.
3 Model Formulations

In this section, we discuss in detail how to allocate and distribute the scarce emergency resources to disaster-affected areas as soon as possible by introducing the principle of maximizing marginal loss saving, in order to make highly effective use of every unit of resource and to reduce as far as possible the total loss caused by
various natural disasters. Before giving the mathematical model and quantitative analysis, we state some realistic assumptions for the factors not considered in our proposed model: (1) the model focuses only on decision-making for the immediate post-disaster relief operation; (2) one unit of resource is assumed to save one person's life, e.g. one bottle of vaccine, one package of bread, or one bag of medicine; (3) the supply of emergency resources is less than the demand, a reasonable assumption because resources are scarce and in severe shortage in the face of the demand surge during the immediate post-disaster period; (4) the time scheduling in our model is finite; (5) information such as the damage level in different places, the demand, and the road network can be gathered quickly from diverse channels. This is acceptable since modern satellites and sensors make it easy to obtain all geographical information, and satellite phones as well as mobile devices can transfer detailed data about the damage and demand in different places.
3.1 Analysis of Maximizing Marginal Loss Saving

Based on the above analysis, we consider $j \in D$ disaster-affected nodes, each with a different damage level and a different demand requirement, and let the resource supply from supply node $i$ be $s_i$, $i \in S$. We then formulate the resource allocation and distribution from supply nodes to demand nodes using the model of maximizing marginal loss saving. Assume the marginal loss saving of resources arriving at demand node $j$ is a continuously decreasing function for each unit of resource, i.e.

$$l_j(t) \equiv \frac{dL_j(t)}{dt}, \qquad \frac{dl_j(t)}{dt} < 0 \qquad (1)$$

where $l_j(t): \mathbb{R}_+ \to \mathbb{R}_+$, $\forall j \in D$, is a nonincreasing convex function, and $L_j(t)$ is a nondecreasing concave function representing the cumulative loss saving for resources arriving at demand node $j$ up to time $t$. Denoting by $z_{jt}$ the amount arriving at demand node $j$ at time $t$, $L_j(t)$ can be written as

$$L_j(t) = \int l_j(t)\, dz_{jt} \qquad (2)$$
The total cumulative loss saving over all demand nodes is then given by Eq. (3):

$$L(t) = \sum_{j \in D} L_j(t) = \sum_{j \in D} \int l_j(t)\, dz_{jt} \qquad (3)$$
Therefore, the objective built on Eq. (3) is to determine an emergency resource allocation and distribution schedule, under the resource constraint, that maximizes the cumulative loss saving $L(\cdot)$, ideally at every point in time. As stated in Section 1, the reward of emergency rescue for each wounded person in the various regions is highly sensitive to every allocated and distributed unit of resource, especially for scarce medicine in the immediate response to a large-scale disaster. To address this crux of the immediate post-disaster response phase, we use an exponential function as the marginal loss saving in our objective, so as to strongly highlight the marginal reward of each unit of emergency resource in a context of demand surge but limited available resources. The marginal loss-saving function can be written as
$$l_j(t) = e^{-r_j t}, \quad \forall j \in D \qquad (4)$$
Here, the parameter $r_j$ represents the sensitivity level of the marginal loss-saving function to a unit of resource arriving at demand node $j$ at time $t$ (we call it the marginal loss-saving quotient). Fig. 1 demonstrates the different levels of marginal loss saving for a unit of arriving resource when $r_j$ takes different values at demand nodes 1, 2, 3 and 4 ($r_1 = 0.6$, $r_2 = 0.8$, $r_3 = 1$, $r_4 = 1.6$). The areas enclosed between each curve and the $t$-axis gradually decrease as time increases, i.e., the marginal loss saving of each unit of arriving resource decreases. Hence, the longer a resource is delayed, the less effectiveness is gained, which also obeys the timeliness rule of emergency response. Moreover, the four demand nodes differ markedly in their marginal loss-saving value for a unit of arriving resource over the time horizon, so these value differences can steer the resources toward the demand nodes with high marginal loss-saving value, thereby maximizing the utility of the scarce emergency resources through maximizing marginal loss saving in the objective function.

[Fig. 1 Variation Trend of Marginal Loss Saving with Exponential Function; curves $e^{-r_j t}$ for $r_1 = 0.6$, $r_2 = 0.8$, $r_3 = 1$, $r_4 = 1.6$ over $t \in [0, 5]$ hours]
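The exponential marginal loss-saving function (4) and its cumulative form (2) are simple to evaluate in closed form when resources arrive at a constant unit rate (i.e., taking $dz_{jt} = dt$, an assumption made here purely for illustration). The sketch below reproduces the qualitative behaviour of Fig. 1:

```python
import math

def marginal_loss_saving(r_j: float, t: float) -> float:
    """Marginal loss saving l_j(t) = exp(-r_j * t) of Eq. (4)."""
    return math.exp(-r_j * t)

def cumulative_loss_saving(r_j: float, t: float) -> float:
    """Closed-form integral of l_j over [0, t] for a unit arrival rate:
    L_j(t) = (1 - exp(-r_j * t)) / r_j."""
    return (1.0 - math.exp(-r_j * t)) / r_j

# Reproduce the qualitative picture of Fig. 1: four demand nodes with
# quotients r = 0.6, 0.8, 1.0, 1.6 over a 5-hour horizon.
for r in (0.6, 0.8, 1.0, 1.6):
    l0, l5 = marginal_loss_saving(r, 0.0), marginal_loss_saving(r, 5.0)
    print(f"r={r}: l(0)={l0:.3f}, l(5)={l5:.3f}, L(5)={cumulative_loss_saving(r, 5.0):.3f}")
```

As in the figure, each curve starts at 1 and decays faster for larger quotients, so the cumulative saving over the horizon is smaller for the more sensitive nodes.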
In this section we have thus formulated the resource allocation by introducing the exponential marginal loss saving into the objective function, which realizes the maximal usage of the scarce emergency resources so as to reduce the total loss caused by a disaster. In the following section, we model the emergency distribution network for moving the emergency resources from supply nodes and transfer nodes to demand nodes by incorporating the time-space network model.
3.2 Emergency Distribution Based on Time-Space Network

Consider a general emergency distribution logistics network $G = (V, A)$ operating over the finite time scheduling horizon $t \in \{0, \ldots, T\}$, where $V$ is the set of all nodes in the network and $A$ is the set of arcs between pairs of nodes. In practice, an emergency distribution logistics network commonly contains three types of nodes — supply nodes, transfer nodes and demand nodes — denoted $S$, $T'$ and $D$ respectively. Based on this structure, we redefine the network as a time-space network in which a node is written $(j, t)$, i.e., with the two dimensions of time and space, and an arc is written $(m, n, j, t)$, i.e., a distribution connection from node $(m, n)$ to node $(j, t)$. The flow in an arc is denoted $x_{mnjt}$. With these definitions, the mathematical model of the emergency distribution network is formulated as follows:
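The time-space expansion described above can be sketched mechanically: every physical node $j$ is replicated once per period $t$, each physical arc with travel time $\tau$ becomes arcs from $(m, t)$ to $(j, t+\tau)$, and holding arcs from $(j, t)$ to $(j, t+1)$ let resources wait at a node. The node names and travel times below are hypothetical, not taken from the paper.

```python
from itertools import product

def expand_time_space(nodes, arcs, horizon):
    """Build a time-space network. arcs: dict {(m, j): travel_time}.
    Returns the list of time-space nodes (j, t) and arcs ((m,t),(j,t+tau))."""
    ts_nodes = [(j, t) for j, t in product(nodes, range(horizon + 1))]
    ts_arcs = []
    for (m, j), tau in arcs.items():
        # a physical arc is copied once per feasible departure period
        for t in range(horizon + 1 - tau):
            ts_arcs.append(((m, t), (j, t + tau)))
    # holding arcs: resources may wait at any node for one period
    for j, t in product(nodes, range(horizon)):
        ts_arcs.append(((j, t), (j, t + 1)))
    return ts_nodes, ts_arcs

nodes = ["A", "C", "E"]                   # supply, transfer, demand
arcs = {("A", "C"): 1, ("C", "E"): 1}     # hypothetical travel times
ts_nodes, ts_arcs = expand_time_space(nodes, arcs, horizon=3)
print(len(ts_nodes), len(ts_arcs))        # → 12 15
```

The flow variables $x_{mnjt}$ of the model then live on exactly these expanded arcs.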
$$\max \; L(\cdot) \qquad (5)$$

Constraints:

$$\sum_{(m,n):(m,n,j,t) \in A} x_{mnjt} - \sum_{(m,n):(j,t,m,n) \in A} x_{jtmn} = z_{jt} - s_{jt}, \quad \forall (j,t) \in V \qquad (6)$$

$$\sum_t z_{jt} \le d_j, \quad \forall j \in D \qquad (7)$$

$$x_{mnjt} \ge 0, \quad \forall (m,n,j,t) \in A \qquad (8)$$

$$z_{jt} \ge 0, \quad \forall j \in D,\ t \in \{1, \ldots, T\} \qquad (9)$$

$$z_{jt} = 0, \quad \forall j \notin D,\ t \in \{1, \ldots, T\} \qquad (10)$$

$$\sum_{j \in S} s_{jT} - z_{j+1,T} = 0, \quad (j+1, T) \notin V \qquad (11)$$
In this model, the objective in Eq. (5) maximizes the total cumulative loss saving during the emergency logistics activity. Among the constraints, Eq. (6) represents network flow conservation, where the variable $s_{jt}$ denotes the supply amount at node $j$ and time $t$; Eq. (7) states that the total amount of emergency resources arriving at each demand node $j$ must not exceed its demand requirement $d_j$; Eqs. (8) and (9) ensure that the network flows and the amounts of emergency resources arriving at demand nodes are always nonnegative; Eq. (10) forces the arriving amount to zero at nodes that do not belong to the set of demand nodes; and Eq. (11) is an extra constraint ensuring the validity of the network flow when $s_{jt} > 0$ at the end of the scheduling horizon, i.e., $t = T$, where the variable $z_{j+1,T}$ refers to an artificial node that does not appear in the objective function. Using Eqs. (2) and (3) discussed in Section 3.1, we can combine the definition of the total cumulative loss-saving function with the time-space-based emergency distribution in the following model.
$$\max \; \sum_{j \in D} \int_0^T e^{-r_j t}\, dz_{jt} \qquad (12)$$

$$\text{s.t. } (6), (7), (8), (9), (10), (11)$$
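Model (12) is a linear program over the time-space arc flows. As a deliberately simplified sketch of its allocation logic only, the snippet below drops the routing constraints (6) and keeps just a total supply cap plus the demand caps (7), so each variable is the amount reaching demand node $j$ at hour $t$. Under these simple caps the continuous problem reduces to a fractional knapsack, which a greedy pass in decreasing order of the marginal value $e^{-r_j t}$ solves exactly. All node names, quotients and quantities are illustrative, not the paper's case data.

```python
import math

r = {"E": 1.0, "F": 0.5}        # marginal loss-saving quotients (assumed)
earliest = {"E": 2, "F": 3}     # earliest feasible arrival hour (assumed)
demand = {"E": 180, "F": 150}
supply, T = 180, 5              # total available units, horizon in hours

# enumerate arrival slots (j, t), best marginal value e^{-r_j t} first
slots = sorted(
    ((j, t) for j in r for t in range(earliest[j], T + 1)),
    key=lambda jt: -math.exp(-r[jt[0]] * jt[1]),
)

alloc, remaining = {}, dict(demand)
for j, t in slots:
    if supply <= 0:
        break
    amount = min(remaining[j], supply)  # fill the best slot as far as caps allow
    if amount > 0:
        alloc[(j, t)] = amount
        remaining[j] -= amount
        supply -= amount

print(alloc)   # → {('F', 3): 150, ('E', 2): 30}
```

With these illustrative numbers the greedy pass sends 150 units to the farther node F at its earliest hour (its quotient is lower, so its marginal value stays high) and the remaining 30 units to E — the same counter-intuitive pattern the numerical example in Section 4 exhibits.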
4 Numerical Examples

In this section, we give numerical examples to verify the proposed model. Simply but without loss of generality, suppose the geographical emergency distribution network is composed of two supply nodes (A, B), two transfer nodes (C, D) and two demand nodes (E, F), and that the scheduling of emergency resource allocation and distribution is considered within five hours (T = 5). The supply amounts at A and B are 80 and 100 at the beginning of the scheduling horizon, the demand requirements at E and F are 180 and 150, and the distribution times are given in Table 1. To illustrate the dynamic process of resources arriving at nodes E and F at different times under the principle of maximizing marginal loss saving, we design the numerical case by setting the marginal loss-saving quotient rE = 1 and giving a series of decreasing values for comparison, rF = 0.8, 0.5, 0.1, then analyze the combinatorial optimal solution by comparing the resource allocation and distribution schedules in the different scenarios. For this setting, the model is solved with the Optimization Toolbox in MATLAB 2009a on an Intel(R) Core(TM) 2 Duo CPU T7100 at 1.8 GHz with 1 GB of main memory. The optimal emergency resource allocation and distribution schemes are given in Table 2.

Table 1 Distribution Time (in hours)

        C   D            E   F
  A     1   2      C     1   2
  B     2   2      D     3   2
Table 2 Scheduling of Emergency Resource Allocation and Distribution (in units)

Case Design     Scheduling of Emergency Resource                                    Optimal Value
rE=1, rF=0.8    80 units from A0 to C1, then arrive in E2; 100 units from           15.804
                B0 to C2, then arrive in E3
rE=1, rF=0.5    80 units from A0 to C1, then arrive in F3; 100 units from           28.813
                B0 to C2, then distribute 30, 70 from C2 to E3, F3 respectively
rE=1, rF=0.1    80 units from A0 to C1, then distribute 30, 50 from C1 to           108.129
                E3, F3 respectively; 100 units from B0 to C2, then arrive in F4
Given the distribution times in Table 1, the shortest paths from supply node A to demand nodes E and F take two and three hours, and from B to E and F three and four hours, respectively; i.e., compared with demand node E, demand node F lies farther from the supply locations. By common sense, when resources are scarce and limited, allocation and distribution would tend to favor the nearer disaster-affected areas first, so as to save more lives and property in a short time. However, according to the results in Table 2, we find a phenomenon that runs against this intuition: demand node F receives more and more consideration as the marginal loss-saving quotient varies, even though its geographical location is far from the supply areas, and the value of the reward also increases as more emergency resources arrive at node F. This result is caused by the marginal loss saving discussed in Section 3.1; we illustrate the reasons through the figures of the numerical simulation. Fig. 2 graphically depicts the variation of the different marginal loss-saving functions: the surface in each sub-figure rises higher as the marginal loss-saving quotient decreases from 1 in Fig. 2(a) to 0.1 in Fig. 2(d). By the definitions in Eqs. (3) and (4), the volume enclosed between the curved surface and the $z$-$t$ plane is in fact the cumulative marginal loss saving over the whole scheduling horizon. Hence, the lower the marginal loss-saving quotient, the higher the surface of cumulative marginal loss saving rises (i.e., the higher the objective value), which is why more and more resources are allocated and distributed to the disaster-affected area F.

[Fig. 2 Scenarios of Marginal Loss Saving Quotient: (a) r=1; (b) r=0.8; (c) r=0.5; (d) r=0.1; each panel plots total loss saving against arriving resource z and time t (in hours)]
Therefore, under the principle of maximizing marginal loss saving, allocating and distributing the scarce resources among all the disaster areas is a combinatorial problem, and how much resource a demand node receives depends only on its marginal loss-saving value per unit of resource. The final solution is thus a combinatorial optimum that not only uses every unit of the scarce resources efficiently but also respects the equality of life saving for every disaster area, whether it is located close to the supply areas or not.
5 Conclusions and Discussions

In this paper, the scarce resource allocation and distribution scheduling problem in immediate post-disaster response was investigated. Taking the effect of maximizing marginal loss saving into consideration, we proposed an integrated emergency resource allocation and distribution model with an exponential marginal loss-saving function. We then explored the combinatorial optimal solution and analyzed the sensitivity of the marginal loss-saving function through three designed cases, and a relatively good result was achieved from the comparison. The proposed model not only bridges a gap concerning resource scarcity in emergency logistics, but also provides an effective decision tool for scheduling emergency resources for relevant officials and practitioners. It is necessary to point out some limitations of this research. First, the capacities of roads and vehicles are neglected in this paper, although both factors should be considered in practical decision making, so constructing a mathematical model with capacity constraints remains to be done. Second, the Optimization Toolbox of MATLAB limits the number of variables when solving combinatorial problems, so developing a new heuristic algorithm is a next step. Finally, we did not consider emergency inventory in the proposed model. All these areas represent our future research directions.
References
1. Sheu, J.B.: Challenges of emergency logistics management. Transport. Res. E-Log. (2007), doi:10.1016/j.tre.2007.01.001
2. Barbarosoglu, G., Özdamar, L., Çevik, A.: An interactive approach for hierarchical analysis of helicopter logistics in disaster relief operations. Eur. J. Oper. Res. (2002), doi:10.1016/S0377-2217(01)00222-3
3. Tzeng, G.H., Cheng, H.J., Huang, T.D.: Multi-objective optimal planning for designing relief delivery systems. Transport. Res. E-Log. (2007), doi:10.1016/j.tre.2006.10.012
4. Chang, M.S., Tseng, Y.L., Chen, J.W.: A scenario planning approach for the flood emergency logistics preparation problem under uncertainty. Transport. Res. E-Log. (2007), doi:10.1016/j.tre.2006.10.013
5. Yuan, Y., Wang, D.W.: Path selection model and algorithm for emergency logistics management. Comput. Ind. Eng. (2009), doi:10.1016/j.cie.2008.09.033
6. Nolz, P., Doerner, K., Gutjahr, W., Hartl, R.: A bi-objective metaheuristic for disaster relief operation planning. Adv. in Multi-Obj. Nature Inspired Computing (2010), doi:10.1007/978-3-642-11218-8_8
7. Yi, W., Kumar, A.: Ant colony optimization for disaster relief operations. Transport. Res. E-Log. (2007), doi:10.1016/j.tre.2006.05.004
8. Yi, W., Özdamar, L.: A dynamic logistics coordination model for evacuation and support in disaster response activities. Eur. J. Oper. Res. (2007), doi:10.1016/j.ejor.2005.03.077
9. Chiu, Y., Zheng, H.: Real-time mobilization decisions for multi-priority emergency response resources and evacuation groups: model formulation and solution. Transport. Res. E-Log. (2007), doi:10.1016/j.tre.2006.11.006
10. Yan, S.Y., Shih, Y.L.: Optimal scheduling of emergency roadway repair and subsequent relief distribution. Comput. Oper. Res. (2009), doi:10.1016/j.cor.2008.07.002
11. Özdamar, L., Ekinci, D., Kucukyazici, B.: Emergency logistics planning in natural disasters. Ann. Oper. Res. (2004), doi:10.1023/B:ANOR.0000030690.27939.39
12. Barbarosoglu, G., Arda, Y.: A two-stage stochastic programming framework for transportation planning in disaster response. J. Oper. Res. Soc. (2004), http://www.jstor.org/stable/4101826
13. Sheu, J.B.: An emergency logistics distribution approach for quick response to urgent relief demand in disasters. Transport. Res. E-Log. (2007), doi:10.1016/j.tre.2006.04.004
14. Sheu, J.B.: Dynamic relief-demand management for emergency logistics operations under large-scale disasters. Transport. Res. E-Log. (2010), doi:10.1016/j.tre.2009.07.005
15. Gong, Q., Batta, R.: Allocation and reallocation of ambulances to casualty clusters in a disaster relief operation. IIE Trans. (2007), doi:10.1080/07408170600743938
16. Arora, H., Raghu, T.S., Vinze, A.S.: Resource allocation for demand surge mitigation during disaster response. Decis. Support Syst. (2010), doi:10.1016/j.dss.2010.08.032
17. Wright, P.D., Matthew, J.L., Robert, L.N.: A survey of operations research models and applications in homeland security. Interfaces (2006), doi:10.1287/inte.1060.0253
18. Green, L.V., Kolesar, P.J.: Improving emergency responsiveness with management science. Manage. Sci. (2004), http://www.jstor.org/stable/30046126
19. Altay, N., Green, W.G.: OR/MS research in disaster operations management. Eur. J. Oper. Res. (2006), doi:10.1016/j.ejor.2005.05.016
20. Simpson, N., Hancock, P.: Fifty years of operational research and emergency response. J. Oper. Res. Soc. (2009), doi:10.1057/jors.2009.3
Fuzzy Control of a Wastewater Treatment Process Alina Chiroşcă, George Dumitraşcu, Marian Barbu, and Sergiu Caraman
Abstract. The paper deals with the fuzzy control of a wastewater treatment process in which organic substances are removed. The process is treated as a multivariable process, with the dilution rate, the aeration rate and the recycling rate as control inputs and the dissolved oxygen and substrate concentrations in the effluent as output variables. The sensitivities of the process with respect to the control inputs were studied both for indirect control, through the control of the dissolved oxygen concentration, and for direct control of the substrate. In all cases fuzzy controllers were used. It was also shown that fuzzy control provides good results in the presence of disturbances and parametric uncertainties of the model.
1 Introduction

Wastewater treatment is extremely important for humanity, having a direct impact on life. This problem should therefore be treated with great responsibility, and every producer has to improve his treatment processes. Over the past years, various types of treatment have been developed. The domain holds almost no technological secrets, but the big challenge resides in process control. Generally, wastewater treatment aims to remove organic substances, ammonium, phosphorus and other residuals from the industrial environment and from urban or rural communities. Wastewater treatment processes are very complex, strongly nonlinear and characterized by parameter uncertainties (Goodman and Englande 1974). Many models in the specialized literature try to capture as closely as possible the evolution of activated-sludge wastewater treatment processes (Henze et al 2000). These processes are modeled globally, considering the nonlinear dynamics, while at the same time trying to simplify the models used for control (Barbu 2009).

* Alina Chiroşcă · George Dumitraşcu · Marian Barbu · Sergiu Caraman, Department of Automatic Control, “Dunărea de Jos” University of Galaţi, Domnească Street No. 47, 800008 Galaţi, România; e-mail: [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 155–163. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
156
A. Chiroşcă et al.
The control of wastewater treatment processes is also difficult because of the low repeatability rate, slow responses and the lack, or high cost, of measuring instruments for the state variables of the bioprocesses (biomass concentration, COD concentration etc.). Therefore, advanced and robust control laws, which usually include state and parameter observers in their structure, are currently used to control these processes. Various approaches to the control of wastewater treatment processes are known in the literature: PI and PID control, fuzzy logic, nonlinear control, model-based control etc. (Olsson and Newell 1999). Considering the advantages offered by the robust QFT method, it has also been used for the control of wastewater treatment processes, but only in simulation (Garcia-Sanz et al 2008), (Barbu and Caraman 2007). As mentioned before, the major uncertainties that characterize such processes make fuzzy control a viable alternative for controlling wastewater treatment processes. The paper's objective is to analyze how a wastewater treatment process can be controlled more efficiently using fuzzy control techniques, considering a wastewater treatment process for the removal of organic substances. The paper is structured as follows: the second section presents the wastewater treatment process and its mathematical model, the third section contains an analysis of the process sensitivities using fuzzy control, and the last is dedicated to the conclusions.
2 The Model of the Wastewater Treatment Process

Figure 1 presents the main components of the wastewater treatment process (Katebi et al 1999):
Fig. 1 Activated Sludge Process
The Aeration Tank is a biological reactor containing a mixture of liquid and suspended solids, in which a population of microorganisms (the sludge) is developed in order to remove the organic substrate from the mixture. The Clarifier Tank is a gravity settlement tank where the sludge and the clear effluent are separated. Part of the removed sludge is recycled back to the aeration tank and the rest is removed. The process model has been determined from mass balance equations and is given by the following equations:
$$\frac{dX}{dt} = \mu(t)X(t) - D(t)(1+r)X(t) + rD(t)X_r(t) \qquad (2.1)$$

$$\frac{dS}{dt} = -\frac{\mu(t)}{Y}X(t) - D(t)(1+r)S(t) + D(t)S_{in} \qquad (2.2)$$

$$\frac{dDO}{dt} = -\frac{K_0\,\mu(t)X(t)}{Y} - D(t)(1+r)DO(t) + \alpha W \left(DO_{max} - DO(t)\right) + D(t)DO_{in} \qquad (2.3)$$

$$\frac{dX_r}{dt} = D(t)(1+r)X(t) - D(t)(\beta+r)X_r(t) \qquad (2.4)$$

$$\mu(t) = \mu_{max}\,\frac{S(t)}{k_s + S(t)}\,\frac{DO(t)}{K_{DO} + DO(t)} \qquad (2.5)$$
where: X(t) – biomass (the sludge); S(t) – substrate (organic substance concentration); DO(t) – dissolved oxygen concentration; DOmax – maximum dissolved oxygen concentration; Xr(t) – recycled sludge; D(t) – dilution rate; Sin and DOin – substrate and dissolved oxygen concentrations in the influent; Y – biomass yield factor; μ – biomass growth rate; μmax – maximum specific growth rate; ks and KDO – saturation constants; α – oxygen transfer rate; W – aeration rate; K0 – model constant; r and β – ratio of recycled and waste flow to the influent. The model coefficients are set to the following values: Y=0.65, β=0.2, α=0.018, K0=0.5, KDO=0.5, μmax=0.15mg/l, ks=100mg/l, DOmax=10mg/l, r=0.6. The initial conditions considered in simulation are: X(0)=200mg/l, S(0)=88mg/l, DO(0)=5mg/l, Xr(0)=320mg/l, DOin=0.5mg/l and Sin=200mg/l. Figure 2 presents the systemic scheme of the wastewater treatment process:
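As an illustration of how the state equations (2.1)–(2.5) can be simulated, the sketch below integrates them with a plain forward-Euler step, using the coefficients and initial conditions listed above. The constant inputs D = 0.05 1/h and W = 60 are hypothetical values chosen only to exercise the model, not the paper's operating point.

```python
# Forward-Euler open-loop simulation of the activated-sludge model
# (2.1)-(2.5); parameters and initial conditions are those of the paper,
# while the constant inputs D and W are illustrative assumptions.
Y, beta, alpha = 0.65, 0.2, 0.018
K0, KDO, mu_max, ks = 0.5, 0.5, 0.15, 100.0
DO_max, r_rec = 10.0, 0.6
S_in, DO_in = 200.0, 0.5
D, W = 0.05, 60.0                        # hypothetical constant inputs

X, S, DO, Xr = 200.0, 88.0, 5.0, 320.0   # initial conditions
dt = 0.01                                 # step in hours
for _ in range(int(48 / dt)):             # 48-hour horizon
    mu = mu_max * S / (ks + S) * DO / (KDO + DO)          # Eq. (2.5)
    dX = mu * X - D * (1 + r_rec) * X + r_rec * D * Xr    # Eq. (2.1)
    dS = -(mu / Y) * X - D * (1 + r_rec) * S + D * S_in   # Eq. (2.2)
    dDO = (-(K0 * mu * X) / Y - D * (1 + r_rec) * DO      # Eq. (2.3)
           + alpha * W * (DO_max - DO) + D * DO_in)
    dXr = D * (1 + r_rec) * X - D * (beta + r_rec) * Xr   # Eq. (2.4)
    X, S, DO, Xr = X + dX * dt, S + dS * dt, DO + dDO * dt, Xr + dXr * dt

print(f"X={X:.1f}, S={S:.1f}, DO={DO:.2f}, Xr={Xr:.1f}")
```

A fixed-step Euler scheme is adequate here because the state rates are slow relative to the 0.01 h step; a stiff or adaptive integrator would be preferable for faster dynamics.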
Fig. 2 The systemic scheme of the wastewater treatment process
The simulation results of the model are presented in Figure 3.
Fig. 3 The simulation results of the open loop model
3 Fuzzy Control of the Wastewater Treatment Process: Simulation Results

Fuzzy theory was introduced by Zadeh in 1965 (Zadeh 1965), and no one foresaw the development of this area and the multitude of applications, many of them in process control. The first applications in process control are attributed to Mamdani (Mamdani 1976) and are based on operator expertise. This expertise can be modeled better by a fuzzy controller than by a conventional one. Practically, within fuzzy control, the human expertise is converted into rules, which form the rule base of the controller. Research on human expert behavior has shown that this behavior is strongly nonlinear, with effects of anticipation, delay and even adaptation to the concrete operating conditions. The refinement of the linguistic characterization and the interpretation of the control are parameters that can modify the fuzzy controller's properties (Preitl and Precup 1997). As can be seen in Figure 2, the wastewater treatment process considered in the paper can be controlled through the following three control variables: the dilution rate, the aeration rate and the sludge recycling rate. The output variables are the dissolved oxygen concentration and the substrate concentration (organic substance concentration) in the effluent. The substrate concentration can be controlled in two ways: indirectly, through the dissolved oxygen concentration, using the aeration rate as control variable; and directly, using the dilution rate or the recycling rate as control variable. In both cases, the output variables were controlled using quasi-PI fuzzy controllers, as shown in Figure 4:
Fig. 4 Wastewater treatment process control system
Control of dissolved oxygen concentration
The control of dissolved oxygen concentration has been achieved using the aeration rate as control variable. The membership functions for the inputs (error and error derivative) and output are shown in Figures 5a and b. Figure 6 presents the simulation results for a DO setpoint equal to 2mg/l.
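The quasi-PI fuzzy controllers are not specified in full in the text, so the following is only a generic sketch of such a controller: two normalized inputs (error and error derivative), triangular membership functions, a 3×3 Takagi-Sugeno rule table, and an incremental (integrated) output. All breakpoints, rule consequents and gains below are illustrative assumptions, not the paper's tuned values.

```python
def tri(x, a, b, c):
    """Triangular membership with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzify(x):
    # labels Negative / Zero / Positive over a normalized [-1, 1] range
    return {"N": tri(x, -2.0, -1.0, 0.0),
            "Z": tri(x, -1.0, 0.0, 1.0),
            "P": tri(x, 0.0, 1.0, 2.0)}

# rule table: (error label, derror label) -> crisp increment consequent
RULES = {("N", "N"): -1.0, ("N", "Z"): -0.7, ("N", "P"): -0.3,
         ("Z", "N"): -0.3, ("Z", "Z"):  0.0, ("Z", "P"):  0.3,
         ("P", "N"):  0.3, ("P", "Z"):  0.7, ("P", "P"):  1.0}

def fuzzy_pi_increment(error, derror):
    """Weighted-average (Sugeno) defuzzification of the rule table."""
    me, md = fuzzify(error), fuzzify(derror)
    num = den = 0.0
    for (le, ld), out in RULES.items():
        w = min(me[le], md[ld])       # rule firing strength, AND via min
        num += w * out
        den += w
    return num / den if den else 0.0

# one illustrative step: DO below its setpoint and the error still
# growing, so the controller raises the aeration rate (hypothetical u)
u = 50.0
u += 5.0 * fuzzy_pi_increment(error=0.6, derror=0.4)
print(round(u, 3))   # → 52.611
```

Summing the defuzzified increments onto the previous control value is what gives the scheme its quasi-PI character: the error term acts proportionally, the accumulation acts as the integral.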
Fig. 5a Input membership functions
Fig. 5b Output membership functions
Fig. 6 Simulation results regarding the control of dissolved oxygen concentration
Control of the substrate
The control of the substrate has been achieved using first the dilution rate and then the recycling rate as the control variable. In both cases, the substrate setpoint was set to 10mg/l. The simulation results are shown in Figure 7 (dilution rate as control variable) and Figure 8 (recycling rate as control variable).
Fig. 7 Simulation results regarding substrate control (dilution rate is the control variable)
Fig. 8 Simulation results regarding substrate control (recycling rate is the control variable)
Dissolved oxygen and substrate control taking into account process and model uncertainties
The following cases were considered:
• Control of the dissolved oxygen concentration when the Sin value increases from 200mg/l to 250mg/l (the DO setpoint keeps the same value, 2mg/l). Figure 9 presents the simulation results.
• Control of the dissolved oxygen concentration when the μmax value changes from 0.15 to 0.2 (the DO setpoint keeps the same value, 2mg/l). Simulation results are presented in Figure 10.
Fig. 9 DO control when Sin value has been modified from 200mg/l to 250mg/l
Fig. 10 DO control when μmax value has changed from 0.15 to 0.2
• Control of substrate concentration using as control variable the dilution rate, when Sin increases its value from 200mg/l to 250mg/l (S setpoint has the same value – 10mg/l) Figure 11 presents the simulation results.
Fig. 11 Substrate control when Sin value has been modified from 200mg/l to 250mg/l (dilution rate is the control variable)
Fig. 12 Substrate control when μmax value has been modified from 0.15 to 0.2 (dilution rate is the control variable)
• Control of the substrate concentration using the dilution rate as control variable, when μmax changes from 0.15 to 0.2 (the S setpoint keeps the same value, 10 mg/l). Simulation results are presented in Figure 12.
• Control of the substrate concentration using the recycling rate as control variable, when Sin increases from 200 mg/l to 250 mg/l (the S setpoint keeps the same value, 10 mg/l). Figure 13 presents the simulation results.
• Control of the substrate concentration using the recycling rate as control variable, when μmax changes from 0.15 to 0.2 (the S setpoint keeps the same value, 10 mg/l). Simulation results are presented in Figure 14.
Fig. 13 Substrate control when Sin value has been modified from 200mg/l to 250mg/l (recycling rate is the control variable)
Fig. 14 Substrate control when μmax value has been modified from 0.15 to 0.2 (recycling rate is the control variable)
4 Conclusions
The present paper analyzed the fuzzy control of a wastewater treatment process whose main objective is the removal of organic substances. The process is treated as a MIMO process, having as input variables the dilution rate, the aeration rate and the recycling rate, and as output variables the
dissolved oxygen and the substrate concentrations in the effluent. An analysis of the process sensitivities with respect to the input variables mentioned above was conducted using a simplified model (aimed at the reduction of organic substances). Numerical simulations have shown that fuzzy control can be a reliable alternative for controlling wastewater treatment processes. In both control cases treated in this paper (indirect control through the dissolved oxygen concentration and direct control of the substrate), the simulation results are very good, with the output variables tracking their setpoints. In the second case, good results were obtained with the dilution rate as well as with the recycling rate as control variable. The simulations showed that the dilution rate yields better results in terms of the dynamic regime, so this control scheme is recommended. It was also shown that fuzzy control gives good results in the presence of disturbances and parametric uncertainties of the model. In this respect, two modifications were made: the values of Sin and μmax were both increased by about 25%. In all simulated cases the fuzzy controller continued to ensure setpoint tracking.
Acknowledgments The work of Alina CHIROŞCĂ was supported by Project SOP HRD - EFICIENT 61445/2009. The work of George DUMITRAŞCU was supported by Project SOP HRD TOPACADEMIC 76822/2010.
References
[1] Barbu, M., Caraman, S.: QFT Multivariable Control of a Biotechnological Wastewater Treatment Process Using ASM1 Model. In: 10th IFAC Symposium on Computer Applications in Biotechnology, Cancun (2007)
[2] Barbu, M.: Automatic Control of Biotechnological Processes. Galati University Press, Romania (2009)
[3] Garcia-Sanz, M., Eguinoa, I., Gil, M., Irizar, I., Ayesa, E.: MIMO Quantitative Robust Control of a Wastewater Treatment Plant for Biological Removal of Nitrogen and Phosphorus. In: 16th Mediterranean Conference on Control and Automation, Corcega (2008)
[4] Goodman, B.L., Englande, A.J.: A Unified Model of the Activated Sludge Process. Journal of Water Pollution Control Fed. 46, 312–332 (1974)
[5] Henze, M., et al.: Activated Sludge Models ASM1, ASM2, ASM2d and ASM3. IWA Publishing, London (2000)
[6] Katebi, M.R., Johnson, M.A., Wilke, J.: Control and Instrumentation for Wastewater Treatment Plant. Springer, London (1999)
[7] Mamdani, E.H.: Applications of fuzzy controllers for control of simple dynamic plant. Proceedings of IEEE 121, 1585–1588 (1976)
[8] Olsson, G., Newell, B.: Wastewater treatment systems – modelling, diagnosis and control. IWA Publishing, London (1999)
[9] Preitl, S., Precup, R.E.: Introduction in process fuzzy control, vol. 151. Technical Publishing, Bucharest (1997)
[10] Zadeh, L.A.: Fuzzy sets. Information and Control 8, 338–353 (1965)
Interpretation of Loss Aversion in Kano’s Quality Model
Péter Földesi and János Botzheim
Abstract. For designing and developing products/services it is vital to know the relevance of the performance generated by each technical attribute and how it can increase customer satisfaction. Improving the parameters of technical attributes requires financial resources, and budgets are generally limited. Thus the optimum target can be the achievement of the minimum overall cost for a given satisfaction level. Kano’s quality model classifies the relationships between customer satisfaction and attribute-level performance and indicates that some attributes have a non-linear relationship to satisfaction, so power functions should be used instead. Under the customers’ subjective evaluation these relationships are not deterministic but uncertain. The cost functions are also uncertain, and the loss aversion of decision makers should be considered as well. This paper proposes a method for the fuzzy extension of Kano’s model and presents numerical examples.
1 Introduction
In the design process of products/services, the technical attributes must be determined so that maximum customer satisfaction can be achieved within acceptable and reasonable financial limits. Technical attributes have different effects on satisfaction. Kano showed [9, 10] that the features and characteristics of these relationships differ from the point of view of the customers, and the utility functions differ as well. On the other hand, customers’ requirements are not homogeneous, they
Péter Földesi Department of Logistics and Forwarding, Széchenyi István University, 1 Egyetem tér, Győr, 9026, Hungary e-mail:
[email protected] János Botzheim Department of Automation, Széchenyi István University, 1 Egyetem tér, Győr, 9026, Hungary e-mail:
[email protected] J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 165–174. © Springer-Verlag Berlin Heidelberg 2011 springerlink.com
are changing in time, and differences can be detected even within the same market segment. Because of these differences, it is worth applying fuzzy numbers instead of crisp values in the mathematical model of Kano’s quality assessment. Previous results addressed the relationship between technical attributes and customer requirements in correlation terms [6, 7], or represented the uncertainty of budgeting by fuzzy measures [19]. Assuming a linear relationship, [17] and [2] analyzed the issue. Reference [14] considered it as a linear problem by introducing the customer satisfaction coefficient. Reference [15] explored the asymmetric relationship between attribute-level performance and overall customer satisfaction, indicating indirectly that linear functions are not appropriate in every case. An application of fuzzy logic for ranking technical attributes is presented in [20].
2 Kano’s Quality Model
In his model [10] Kano distinguishes three types of product requirements which, when met, influence customer satisfaction in different ways.
• Degressive or “must-be” requirements are basic criteria of a product. Beyond a given point, improving the technical attribute by one unit results in only a minor increment in satisfaction; on the other hand, not fulfilling the requirement induces dissatisfaction (“negative satisfaction”).
• One-dimensional requirements: customer satisfaction is proportional to the level of fulfillment; the higher the level of fulfillment, the higher the customer satisfaction, and vice versa.
• Progressive or attractive/excitement factors: fulfilling these requirements leads to a more than proportional satisfaction.
In the following we do not use the term “quality”, because we deal only with technical attributes that result in quality and have a determined relationship to customer satisfaction. (Other cases are investigated in [6, 9, 14, 17].)
2.1 Deterministic Optimization of Kano’s Model
Budgeting is not a key element of Kano’s original model, but we can reasonably assume that improving the level of a technical attribute requires extra cost, so a cost function can be set for each technical attribute. The general target is to achieve the maximum economic result with the minimum use of resources, that is, to maximize customer satisfaction at minimum cost. The task can be formulated mathematically in two ways:
(A1) maximize the overall satisfaction (S) without exceeding a given cost limit (C), or
(A2) achieve a given overall satisfaction (S) with minimum cost (C).
(B) Increase customer satisfaction and decrease the cost at the same time, that is, analogously to value analysis, maximize S/C. This is the optimum point for the customer, since the satisfaction obtained per unit cost is maximal.
Mathematical Model
Let

S_i(x_i) = b_i + a_i · x_i^{β_i},   i = 1, 2, ..., n    (1)

be the customer satisfaction generated by technical attribute x_i, where
• 0 < x_i < ∞ is a real number variable,
• a_i > 0 is a real constant,
• β_i > 0 is a real number,
• b_i is a constant such that sgn(b_i) = sgn(β_i − 1),
• n is the number of technical attributes considered in the designing process.
Furthermore, let

C_i(x_i) = f_i + v_i · x_i,   i = 1, 2, ..., n    (2)

be the cost of manufacturing technical attribute i at level x_i, where f_i ≥ 0 and v_i ≥ 0 are real constants. Let

S = Σ_{i=1}^{n} (b_i + a_i · x_i^{β_i})    (3)

be the overall satisfaction and

C = Σ_{i=1}^{n} (f_i + v_i · x_i)    (4)

be the total cost, according to the “fixed costs – variable costs” methodology. Then the general formulas are:
(A1) Let Σ_{i=1}^{n} S_i(x_i) → max, subject to Σ_{i=1}^{n} C_i(x_i) ≤ C_0, where C_0 is a given constant.
(A2) Let Σ_{i=1}^{n} C_i(x_i) → min, subject to Σ_{i=1}^{n} S_i(x_i) ≥ S_0, where S_0 is a given constant.
(B) Let Σ_{i=1}^{n} S_i(x_i) / Σ_{i=1}^{n} C_i(x_i) → max, where x_i < ∞ is a given constant.
In this paper we present a numerical example for (A2). For the approximation of the optimum the bacterial memetic algorithm [5] is used.
2.2 Fuzzy Extension of Kano’s Model
The situation is significantly different when there are several customers at the same time (which is very often the case in practice), and some customers find a given technical attribute linear, others assess it as “must-be”, and again others possibly consider it an attractive factor.
If β (see Equation (1), where β_i expresses the importance of the given technical attribute x_i) is considered a fuzzy number, then the features of each technical attribute are given by the shape of the membership function of β̃:

μ_β̃(β) = (β − β_L)/(β_C − β_L)   if β_L ≤ β ≤ β_C,
          (β_R − β)/(β_R − β_C)   if β_C < β ≤ β_R,
          0                        otherwise.    (5)
That is, (β_L, β_R) is the support and β_C is the core value of the fuzzy number; thus β̃ = (β_L, β_C, β_R). Previous work proved that a simple parametric representation of fuzzy exponents can be formed [3]. When designing a new product, the actual cost of achieving a given level of a technical attribute is also uncertain, and it can vary around a core value. Thus in Equation (2) fuzzy values must be applied instead of crisp numbers: ṽ_i = (v_iL, v_iC, v_iR).
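The triangular membership function (5) is straightforward to express in code; this sketch evaluates it for the fuzzy exponent (β_L, β_C, β_R) = (1.15, 1.25, 1.3) of the first attribute in Table 1.

```python
def mu_triangular(beta, b_l, b_c, b_r):
    # Membership degree of the triangular fuzzy number (b_l, b_c, b_r), Eq. (5)
    if b_l <= beta <= b_c:
        return (beta - b_l) / (b_c - b_l)
    if b_c < beta <= b_r:
        return (b_r - beta) / (b_r - b_c)
    return 0.0

core = mu_triangular(1.25, 1.15, 1.25, 1.3)   # full membership at the core
edge = mu_triangular(1.15, 1.15, 1.25, 1.3)   # zero at the support boundary
```

The support (β_L, β_R) carries all exponent values a customer might assign, and the core β_C is the consensus value.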
3 Bacterial Memetic Algorithm
The Bacterial Evolutionary Algorithm (BEA) [18] uses two operators: bacterial mutation and gene transfer. The bacterial mutation operation optimizes the chromosome of one bacterium; the gene transfer operation allows the transfer of information between the bacteria in the population. BEA has been applied to a wide range of problems, for instance optimizing fuzzy rule bases [18], feature selection [4] and combinatorial optimization problems [12]. Local search approaches might be useful in improving the performance of the basic evolutionary algorithm, which may find the global optimum with sufficient precision in this combined way. Combinations of evolutionary and local search methods are usually referred to as memetic algorithms [16]. A new kind of memetic algorithm based on the bacterial approach is the bacterial memetic algorithm (BMA) proposed in [5]. BMA has been successfully applied to fuzzy rule base optimization [5] and to the traveling salesman problem and its modifications [8]. The method appears superior to other kinds of evolutionary and memetic algorithms on various problems [1]. The algorithm consists of four steps. First, a random initial population with Nind individuals is created. Then bacterial mutation, local search and gene transfer are applied until a stopping criterion is fulfilled. For continuous problems a gradient-based method can be applied as local search. The Levenberg-Marquardt method was proposed as the local search technique in the original version of BMA [5], and it is applied in this work, too.
3.1 Encoding Method and Initial Population
When applying evolutionary-type algorithms, first of all the encoding method must be defined. The evaluation of the individuals (bacteria) has to be discussed, too, and the operations of the algorithm have to be adapted to the given problem. In our case the task is to find the minimum value of a function. One bacterium is one vector in the search space; thus the bacterium contains the coordinates of the position, described by real numbers. The length of the bacterium is constant and equals the number of dimensions of the search space. In the initial population Nind random individuals (bacteria) are created. In contrast with [8], we do not apply any heuristics when creating the initial population.
3.2 Evaluation of Individuals
An individual is better than another individual if its evaluation value is smaller.

3.2.1 Crisp Case

In the first case crisp values are considered. If S > S_0 then the evaluation function is

Eval_1 = Σ_{i=1}^{n} (f_i + x_i · v_i).    (6)

If S < S_0 then a penalty must also be applied:

Eval_1 = [Σ_{i=1}^{n} (f_i + x_i · v_i)] · e^{κ(S_0−S)/S},    (7)

where κ is a parameter expressing the strength of the penalty.

3.2.2
Fuzzy Case without Loss Aversion
In the second case only the defuzzified values of the ṽ_i are considered. If S > S_0 then the evaluation function is

Eval_2 = Σ_{i=1}^{n} (f_i + x_i · defuzz(ṽ_i)).    (8)

If S < S_0 then the penalty must also be applied:

Eval_2 = [Σ_{i=1}^{n} (f_i + x_i · defuzz(ṽ_i))] · e^{κ(S_0−S)/S},    (9)

where defuzz(ṽ_i) = (v_iL + v_iC + v_iR)/3. The fuzzy exponents in S_i are computed by the fuzzy exponent calculation described in [3].
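A hedged sketch of the evaluation functions (6)–(9); the exact placement of κ in the reconstructed penalty exponent e^{κ(S_0−S)/S} follows the layout of the original equations and should be read as an assumption.

```python
import math

def defuzz(v):
    # Defuzzified value of a triangular number (v_L, v_C, v_R): (v_L+v_C+v_R)/3
    return sum(v) / 3.0

def eval1(x, f, v, S, S0, kappa=1.2):
    # Crisp evaluation, Eqs. (6)-(7): total cost times the penalty factor
    total = sum(fi + xi * vi for xi, fi, vi in zip(x, f, v))
    return total * math.exp(kappa * (S0 - S) / S) if S < S0 else total

def eval2(x, f, v_fuzzy, S, S0, kappa=1.2):
    # Fuzzy-cost evaluation without loss aversion, Eqs. (8)-(9)
    total = sum(fi + xi * defuzz(vi) for xi, fi, vi in zip(x, f, v_fuzzy))
    return total * math.exp(kappa * (S0 - S) / S) if S < S0 else total
```

With the fuzzy cost ṽ_1 = (9, 10, 13) of Table 1, defuzz gives 32/3 ≈ 10.67, slightly above the core value; this is how the asymmetry of the triangle enters the cost.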
3.2.3
Fuzzy Case with Loss Aversion
In the third case the loss aversion is also considered. Let
C_C = Σ_{i=1}^{n} (f_i + x_i · v_iC),
C_L = C_C − Σ_{i=1}^{n} x_i · (v_iC − v_iL),
C_R = C_C + Σ_{i=1}^{n} x_i · (v_iR − v_iC),
C̃ = (C_L, C_C, C_R).
Let us denote by D the defuzzified value of C̃:

D = defuzz(C̃) = (C_L + C_C + C_R)/3.    (10)

There is always a penalty for the loss aversion. If S > S_0 then the evaluation function is

Eval_3 = D · [2 − exp(−C̃ · (C_R − C_L) · (C_R − C_C)^w · K)].    (11)

If S < S_0 then the other penalty, due to the constraint, must also be applied:

Eval_3 = D · [2 − exp(−C̃ · (C_R − C_L) · (C_R − C_C)^w · K)] · e^{κ(S_0−S)/S},    (12)

where w and K are positive parameters. Multiplication of the triangular fuzzy number C̃ by scalars gives a triangular fuzzy number, and Eval_3 is a scalar based on the fuzzy exponent calculation described in [3].
3.3 Bacterial Mutation
Bacterial mutation is applied to all bacteria one by one. First, Nclones copies (clones) of the bacterium are created. Then a random segment of length lbm is mutated in each clone except one, which is left unmutated. After mutating the same segment in the clones, each clone is evaluated. The clone with the best evaluation result transfers the mutated segment to the other clones. These three operations (mutation of the clones, selection of the best clone, transfer of the mutated segment) are repeated until each segment of the bacterium has been mutated once. At the end, the best clone is kept as the new bacterium and the other clones are discarded.
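The three operations above can be sketched as follows; a minimal sketch in which the Gaussian perturbation, its scale, and the left-to-right segment order are illustrative assumptions, not prescribed by the algorithm.

```python
import random

def bacterial_mutation(bacterium, evaluate, n_clones=4, l_bm=1, scale=1.0):
    # For each segment: mutate it in every clone but one, evaluate the clones,
    # and let the best clone transfer the segment to the others.
    best = list(bacterium)
    n = len(best)
    for start in range(0, n, l_bm):
        clones = [list(best)]                      # one clone stays unmutated
        for _ in range(n_clones - 1):
            clone = list(best)
            for j in range(start, min(start + l_bm, n)):
                clone[j] += random.gauss(0.0, scale)
            clones.append(clone)
        best = min(clones, key=evaluate)           # best clone wins the segment
    return best                                    # kept as the new bacterium
```

Because the unmutated clone always competes, the evaluation value of the bacterium can never get worse during this operation.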
3.4 Levenberg-Marquardt Method
The Levenberg-Marquardt method is a gradient-based optimization technique proposed originally by Levenberg [11] and Marquardt [13] for least-squares estimation of non-linear parameters. Our task is to find the minimum value of a function. The method is applied to all bacteria one by one. For a given bacterium x[k], the update vector in the k-th iteration step is:
s[k] = −[J(x[k]) · J(x[k])^T + γ[k] · I]^{−1} · J(x[k]),    (13)
where J(x[k]) is the gradient vector at x[k], γ is a parameter initially set arbitrarily to any positive value (γ[1] > 0) and I is the identity matrix. We approximate the derivatives in J(x[k]) by finite differences. After the update vector has been computed, we calculate the so-called trust region r[k] as follows:

r[k] = (Eval(x[k] + s[k]) − Eval(x[k])) / (J(x[k])^T · s[k]).

The value of the parameter γ is adjusted dynamically depending on the value of r[k]:
• If r[k] < 0.25 then γ[k + 1] = 4γ[k]
• If r[k] > 0.75 then γ[k + 1] = γ[k]/2
• Else γ[k + 1] = γ[k]
If Eval(x[k] + s[k]) < Eval(x[k]) then x[k + 1] = x[k] + s[k], else x[k + 1] = x[k]. If the stopping condition (||J(x[k])|| ≤ τ) is fulfilled or a predefined maximum generation number is reached, then the algorithm stops; otherwise it continues with the (k + 1)-th iteration step. The search direction varies between the Newton direction and the steepest descent direction, according to the value of γ: if γ → 0 the algorithm converges to the Newton method, and if γ → ∞ it gives the steepest descent approach.
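A minimal sketch of one iteration as described above, with a forward-difference gradient. Since J is a vector here, the matrix inverse in (13) collapses to s = −J/(‖J‖² + γ) by the Sherman–Morrison identity, which the sketch exploits.

```python
def lm_step(x, evaluate, gamma, eps=1e-6):
    # One LM iteration: update vector (13), trust region r[k], gamma adjustment
    f0 = evaluate(x)
    grad = []
    for i in range(len(x)):                # finite-difference gradient J(x[k])
        xp = list(x)
        xp[i] += eps
        grad.append((evaluate(xp) - f0) / eps)
    denom = sum(g * g for g in grad) + gamma
    s = [-g / denom for g in grad]         # s = -(J J^T + gamma I)^{-1} J
    x_trial = [xi + si for xi, si in zip(x, s)]
    f1 = evaluate(x_trial)
    pred = sum(g * si for g, si in zip(grad, s))   # J(x[k])^T . s[k]
    r = (f1 - f0) / pred if pred != 0.0 else 1.0   # trust region r[k]
    if r < 0.25:
        gamma *= 4.0
    elif r > 0.75:
        gamma /= 2.0
    return (x_trial if f1 < f0 else list(x)), gamma
```

On a simple quadratic, repeated calls shrink the evaluation value while γ adapts to the quality of each step.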
3.5 Gene Transfer
First, the population is sorted and divided into two halves according to the evaluation results. The bacteria with better evaluations are called the superior half, the bacteria with worse evaluations are referred to as the inferior half. Then one bacterium is randomly chosen from the superior half and another from the inferior half. These two bacteria are called the source bacterium and the destination bacterium, respectively. A segment of length lgt is randomly chosen from the source bacterium, and this segment is used to overwrite the same segment of the destination bacterium. The above steps (sorting the population, selecting the source and destination bacteria, transferring the segment) are repeated Ninf times, where Ninf is the number of “infections” per generation.
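The gene transfer operation can be sketched as below; picking the source and destination uniformly within their halves is an assumption of this sketch.

```python
import random

def gene_transfer(population, evaluate, n_inf=4, l_gt=1):
    # Repeat n_inf times: sort, pick a source from the superior half and a
    # destination from the inferior half, overwrite one random segment.
    pop = [list(b) for b in population]
    for _ in range(n_inf):
        pop.sort(key=evaluate)                 # better (smaller) values first
        half = len(pop) // 2
        src = random.choice(pop[:half])        # source bacterium
        dst = random.choice(pop[half:])        # destination bacterium
        start = random.randrange(len(src) - l_gt + 1)
        dst[start:start + l_gt] = src[start:start + l_gt]
    return pop
```

Only inferior bacteria are overwritten, so the best individual of the population always survives gene transfer unchanged.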
4 Numerical Example
Find x_1, x_2, ..., x_n that minimize C subject to S_0 ≤ S. Limits of the search space: S_i(x_i) ≥ 0, i = 1, ..., n, that is, if b_i < 0 then

x_i > (−b_i / a_i)^{1/β_i}.

Assume that if x_j ≠ 0 and x_i = 0 for i ∈ {1, ..., n}, i ≠ j, then b_j + a_j · x_j^{β_j} ≥ S_0. If x_j ≥ ((S_0 − b_j)/a_j)^{1/β_j} then x_j satisfies the constraint alone, thus there is no point in increasing it further. For practical reasons the upper limit for x_j is reduced to

x_j ≤ ((S_0/√n − b_j)/a_j)^{1/β_j}.
The function parameters used in the simulations are presented in Table 1. The number of dimensions is 15. Table 2 shows the BMA parameters. The results are presented in Table 3. P1 denotes the crisp case, where Eval_1 is used (Equations 6 and 7), P2 the fuzzy case without loss aversion, where Eval_2 is used (Equations 8 and 9), and P3 the fuzzy case with loss aversion, where Eval_3 is used (Equations 11 and 12). In every case S_0 = 2500 and κ = 1.2 in the penalty factors. In the fuzzy exponent calculations no distortion was applied (λ = 1). The parameters of the loss aversion are w = 0.5 and K = 0.000000001. From Table 3 it can be seen that technical parameters with good features in the fuzzy sense (e.g. x_2, x_8) are preferred in the fuzzy evaluation compared to the crisp version. By increasing the alpha-cuts the difference decreases (e.g. x_4), since the fuzzy features have less impact on the decision making.
Table 1 Function Parameters

 i    b    a    βL    βC    βR    f   vL  vC  vR
 1   50   4.6  1.15  1.25  1.3   20    9  10  13
 2   40   4.4  1.25  1.3   1.4   30   10  12  13
 3   30   4.2  1.3   1.35  1.4   40   13  14  15
 4   20   4.9  1.2   1.4   1.45  50   15  16  19
 5   10   6    1.3   1.45  1.5   50   17  18  21
 6   20  10    0.9   1     1.1   10    8   9  10
 7   20  10    0.9   1     1.1   20    8   9  10
 8   20   8    0.9   1     1.2   10    6   9  10
 9   20   7    0.9   1     1.2   20    7   9  10
10   20  10    0.9   1     1.1   10    8   9  10
11  -10  20    0.7   0.75  0.85  10    6   8   9
12   -8  10    0.65  0.7   0.75  10    6   7   8
13   -8  22    0.6   0.65  0.75   5    4   6   7
14   -6  15    0.5   0.6   0.65   5    4   5   7
15   -5  20    0.4   0.55  0.6    5    3   4   6
Table 2 BMA Parameters

Parameter:  Ngen  Nind  Nclones  lbm  Ninf  lgt     γ init.  max LM iter.  τ
Value:      50    12    12       1    1     0.0001  300      1             8
Table 3 Results

      x1     x2    x3   x4     x5     x6     x7     x8     x9  x10    x11    x12   x13    x14   x15   α
P1  48.94   0     0   31.93  24.91   3.95   4.54   0     0     2.50   8.83  0.90  11.29  4.19  8.97  –
P2   0     42.99  0    0     24.91   5.77   0.95  59.12  0     2.10  16.85  0.87   7.45  2.97  5.36  0
P3   0     44.15  0    0      0.00   0.42   2.89  78.18  0    40.87   8.35  0.79  15.13  2.73  4.90  0
P2   0     44.16  0    0     24.91  21.68  31.65   0     0    16.20  13.48  0.91  10.86  3.59  7.34  0.5
P3   0     44.16  0    0     24.91   3.84  33.66   0     0    35.49  12.07  0.90   8.76  2.93  5.91  0.5
P2   0     41.80  0   31.93  24.91   8.28   3.28   0.04  0     2.32   7.94  0.87  14.14  3.76  8.03  1
P3   0     44.16  0   31.93  24.91   4.71   9.94   0.16  0     1.31   3.07  0.91  12.28  2.55  8.48  1
5 Conclusions
The benefit of the fuzzy extension and the interpretation of loss aversion can be measured by the advantage obtained when analyzing the outputs. The difference between the overall cost values is to be considered, but, more importantly, the structure of the technical attributes has to be examined. Customer satisfaction is the key element of profitability, and the first step in achieving this satisfaction lies in the design and resource allocation process. The customers’ assessment of technical attributes is very uncertain, especially at the beginning of the product life-cycle, so in Kano’s model the exponents of the satisfaction functions cannot be considered deterministic values. We propose the fuzzy extension of the model in order to explore the possible alternative sets of technical attributes. In the numerical example the fuzzy solution is significantly different from the crisp (deterministic) version, not only in terms of total cost but, more importantly, in terms of technical attribute levels.
References
[1] Balázs, K., Botzheim, J., Kóczy, L.T.: Comparative investigation of various evolutionary and memetic algorithms. In: Rudas, I.J., Fodor, J., Kacprzyk, J. (eds.) Computational Intelligence in Engineering. SCI, vol. 313, pp. 129–140. Springer, Heidelberg (2010)
[2] Bode, J., Fung, R.Y.K.: Cost engineering with quality function deployment. Computers and Industrial Engineering 35, 587–590 (1998)
[3] Botzheim, J., Földesi, P.: Parametric representation of fuzzy power function for decision-making processes. In: Proceedings of the 7th International Symposium on Management Engineering, ISME 2010, Kitakyushu, Japan, pp. 248–255 (2010)
[4] Botzheim, J., Drobics, M., Kóczy, L.T.: Feature selection using bacterial optimization. In: Proceedings of the International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, IPMU 2004, Perugia, Italy, pp. 797–804 (2004)
[5] Botzheim, J., Cabrita, C., Kóczy, L.T., Ruano, A.E.: Fuzzy rule extraction by bacterial memetic algorithms. In: Proceedings of the 11th World Congress of International Fuzzy Systems Association, IFSA 2005, Beijing, China, pp. 1563–1568 (2005)
[6] Chen, L., Weng, M.C.: An evaluation approach to engineering design in QFD processes using fuzzy goal programming models. European Journal of Operational Research 172, 230–248 (2006)
[7] Conklin, M., Powaga, K., Lipovetsky, S.: Customer satisfaction analysis: Identification of key drivers. European Journal of Operational Research 154, 819–827 (2004)
[8] Földesi, P., Botzheim, J.: Modeling of loss aversion in solving fuzzy road transport traveling salesman problem using eugenic bacterial memetic algorithm. Memetic Computing 2(4), 259–271 (2010)
[9] Hauser, J.R., Clausing, D.: The house of quality. Harvard Business Review, 63–73 (1988)
[10] Kano, N., Seraku, N., Takahashi, F., Tsuji, S.: Attractive quality and must-be quality. The Journal of Japanese Society for Quality Control 14(2), 39–48 (1984)
[11] Levenberg, K.: A method for the solution of certain non-linear problems in least squares. Quart. Appl. Math. 2(2), 164–168 (1944)
[12] Luh, G.C., Lee, S.W.: A bacterial evolutionary algorithm for the job shop scheduling problem. Journal of the Chinese Institute of Industrial Engineers 23(3), 185–191 (2006)
[13] Marquardt, D.: An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Indust. Appl. Math. 11(2), 431–441 (1963)
[14] Matzler, K., Hinterhuber, H.H.: How to make product development projects more successful by integrating Kano’s model of customer satisfaction into quality function deployment. Technovation 18, 25–38 (1998)
[15] Matzler, K., Bailom, F., Hinterhuber, H.H., Renzl, B., Pichler, J.: The asymmetric relationship between attribute-level performance and overall customer satisfaction: a reconsideration of the importance-performance analysis. Industrial Marketing Management 33, 271–277 (2004)
[16] Moscato, P.: On evolution, search, optimization, genetic algorithms and martial arts: Towards memetic algorithms. Tech. Rep. Caltech Concurrent Computation Program, Report 826, California Institute of Technology, Pasadena, California, USA (1989)
[17] Moskowitz, H., Kim, K.J.: QFD optimizer: A novice friendly quality function deployment decision support system for optimizing product designs. Computers and Industrial Engineering 32, 641–655 (1997)
[18] Nawa, N.E., Furuhashi, T.: Fuzzy system parameters discovery by bacterial evolutionary algorithm. IEEE Transactions on Fuzzy Systems 7(5), 608–616 (1999)
[19] Tang, J., Fung, R.Y.K., Xu, B., Wang, D.: A new approach to quality function deployment planning with financial consideration. Computers & Operations Research 29, 1447–1463 (2002)
[20] Zhou, M.: Fuzzy logic and optimization models for implementing QFD. Computers and Industrial Engineering 35, 237–240 (1998)
MCDM Applications on Effective Project Management for New Wafer Fab Construction Mei-Chen Lo and Gwo-Hshiung Tzeng
Abstract. This paper presents a multicriteria analysis approach for evaluating a construction project in semiconductor manufacturing fabrication (fab). The Analytic Hierarchy Process (AHP) method is used to determine the weights of the evaluation criteria among decision makers, including the fab construction team, users and top management. The subjectivity and vagueness in the categorization process is dealt with by using fuzzy numbers for linguistic terms. This empirical study focuses on the project management of design change requirements (DCR) during the construction of a new wafer fab in Taiwan, and illustrates the effectiveness of the proposed approach. It provides empirical evidence on project management effectiveness with the intent of contributing to a better understanding and improvement of project management practices. It looks into the causes of DCR and then provides possible solutions to minimize the occurrence of DCR in future fab construction projects. Keywords: Performance Evaluation, Analytic Hierarchy Process (AHP), Multiple Criteria Decision Making (MCDM), Design Change Requirement (DCR), Wafer fab, Semiconductor.
1 Introduction
The semiconductor industry requires huge investments; to gain a leading position in the industry, to expand the enterprise and to sustain market competitiveness, building a new wafer fab is the most popular way. The important considerations for capacity expansion include system design and the utility supply, such as water, gas, electricity and chemical systems, civil foundation and the facilities. In fact,
Mei-Chen Lo Department of Business Management, National United University, No. 1, Lienda Rd., Miaoli 36003, Taiwan e-mail:
[email protected] *
Mei-Chen Lo · Gwo-Hshiung Tzeng Institute of Project Management, Kainan University No. 1, Kainan Road, Luchu, Taoyuan County 338, Taiwan e-mail:
[email protected] J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 175–184. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
during the time of building a new fab, the project faces various problems arising from different timing, places, human factors and backgrounds. In wafer fab construction engineering, preliminary planning and design is a highly professional engineering service, which involves an enormous amount of specialized effort. Although judging the quality and budget of the newly built fab may be subjective, tender evaluation of the design change requirement (DCR) is even more so. In current methods of budgeting DCR, the enterprise relies only on a panel of experts to perform the evaluation. Thus, an effective evaluation procedure is essential to promote decision quality. This work examines the group decision-making process and proposes a multi-criteria framework for budgeting the DCR during the building scheme. To deal with the qualitative attributes in subjective judgment, this work employs the Analytic Hierarchy Process (AHP) to determine the weights of the decision criteria for each related group, including the facilities team, the users and top management. Then the Multiple Criteria Decision Making (MCDM) approach is used to synthesize the group decision. This process enables decision makers to formalize and effectively solve the complicated multi-criteria problem of managing the capital budget for the period of building a new wafer fab, decreasing erroneous decisions and risky significant design changes. An empirical case study for a new building project is illustrated. The underlying concepts applied were intelligible to the decision-making groups, and the computation required is straightforward and reasonable. The study shows that the diversity and complexity of DCR occurrences can be understood in terms of five fundamental problems (Lo et al., 2002b): design change, requirement change, construction change, strategic consideration and package schedule issues.
DCRs are designed to address these problems as they appear in different subsystems of the engineering interface. Next, the decision-making process of DCR projects is rigorously modeled as a game played between the company and the construction team. The purpose of this modeling is to investigate how key project decisions are made, how they interact with important characteristics of the project, and what their impact on project performance is. This study combines the KJ method (Kawakita, 1975), AHP and MCDM to analyze and evaluate the feasible ratio between the capital budget and the DCR expenditure. Based on our evaluation results and further intensive discussion with experts, the consensus on acceptable expenditures is that a reasonable DCR rate is around 2.5%~3% of the total annual capital budget, and $5~6 million can be saved through management improvement (Lo et al., 2002a). The modeling proves to be highly flexible and yields insights into how DCR works in practice.
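As an illustration of the AHP weighting step, the row geometric mean method gives a standard approximation of the priority vector of a pairwise comparison matrix. The 3×3 matrix below (construction team vs. users vs. top management) is a hypothetical example, not data from the case study.

```python
import math

def ahp_weights(pairwise):
    # Row geometric mean method: normalize the geometric mean of each row
    n = len(pairwise)
    gm = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(gm)
    return [g / total for g in gm]

# hypothetical reciprocal comparison matrix for the three decision groups
M = [[1.0, 3.0, 5.0],
     [1.0 / 3.0, 1.0, 2.0],
     [1.0 / 5.0, 1.0 / 2.0, 1.0]]
weights = ahp_weights(M)
```

Each group's weight vector obtained this way is then synthesized with the others in the MCDM step.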
2 Construction Project
Project management has emerged as a field of practice that is used increasingly by organizations to achieve their business goals. As organizations define more of their activities as projects, the demand for project managers grows, and there is increasing interest in project management competence (Crawford, 2005). Fulmer (2000) describes business as a complex adaptive system in which the
future cannot be predicted or controlled; all organizations face greater volatility and uncertainty, and there is much to be learned about coping with such a world from the companies that are already adapting to it. The concept of facility engineering and management can be applied in a similar way. Building a wafer fab is not only an important decision for a company’s business strategy but also an important event for all employees. Facility supply comprises very important activities which ensure that the most reliable utilities are provided.
2.1 The Process Structure of Building a Wafer Fab
The process of building a new fab is divided into two parts: infrastructure engineering and facility engineering. Infrastructure engineering includes fab design, environmental engineering, civil and structural engineering, and the fundamental work of utility supplies. Facility engineering covers the facility supply systems and lining, namely the air conditioning system, clean room environment, power supply system, chemical supply, exhaust, system connections, etc. In general, a new fab project proceeds from fab design, through completion of the design drawings and bidding, to physical release and trial run, and finally to mass production. Because the engineering and project work involved is critical to the schedule and budget control of the whole project, the facility managers responsible for the project must not only comprehend the overall procedure but also be able to identify the characteristics of each engineering work whenever an uncertainty occurs.
2.2 DCR in a New Wafer Fab
Typically, semiconductor manufacturers concentrate on operations and the development of process technology, and on promoting the efficiency of production management and technology integration. Fab operation emphasizes capacity, output, yield, process capability, and delivery. The function of operation support, the so-called "Facilities", is to maintain a stable supply of fab utilities and lining service, and to keep the fab user points and tool conditions in balance as well. This reliability depends on good arrangements made prior to the planning of a new wafer fab. In addition, during fab construction, continual quality control is important to ensure that the phases of point-to-point system connections operate smoothly. DCR is a part of life in new fab construction, given the difficulty of integrating the construction tasks and the fast change in technologies. The facility construction team has to face various requests from different users with all kinds of functional requirements and modifications. Unexpected changes usually occur during the construction period, so the relevant measures and construction arrangements are a challenge. This study looks into the causes of DCR and then provides possible solutions to minimize the occurrence of DCR in future fab construction projects.
M.-C. Lo and G.-H. Tzeng
A DCR occurs when a change must be made in the design specification or layout of a construction project. This study explores the root causes of DCR arising during construction. DCR is a construction amending or changing action, and it occurs only during the initial stage, that is, from fab design to the wafer pilot/trial run. In a new fab of a semiconductor manufacturing company, DCR is a critical point of project management that can help construction proceed smoothly. Despite differences in wafer size and technology implementation, the fundamentals of construction are basically the same. There are, however, still minor differences in the requirements of the equipment and the process technology, which are important for providing a flexible design of facility resources. The construction process can take 180-365 days or less, and can be totally ruined in the blink of an eye by an accidental event (e.g., contamination, fire, utility shutdown). For these reasons, a wafer fab spends millions of dollars annually to maintain conditioning devices and systems that ensure the cleanliness of the clean room environment. The fab building team therefore follows a unique scheme to reach its goal, and its tasks include the details of the planning process, implementation procedures, purchasing and bidding, outsourcing, and technology configuration. Additionally, the building team has to find a compromise between top management strategies and contractor requirements in the area of cost control. The responsible project managers should therefore be capable of handling all the effects of possible threats and of preventing their targets from being postponed. Both timing and cost are important, and DCR becomes a sophisticated tool for making expert judgments on proceeding with the project efficiently, driving schedule and cost control in line with the company's overall plan, and carrying out all the work tactfully.
This article explores the causes of DCR and the acceptable amount of DCR that effectively balances cost and schedule control for a given wafer fab construction project.
3 DCR Evaluation
The purpose of this section is to establish a hierarchical structure for tackling the evaluation problem of fab-construction DCR alternatives. The contents comprise four subsections: evaluation approaches, building the hierarchical structure of evaluation criteria, determining the evaluation criteria weights, and obtaining the performance values. First, experts' discussion and brainstorming take place to gain field experience and to ease understanding of the problem. The model establishment stage has three steps: (1) scenario analysis; (2) establishing the hierarchical structure; (3) carrying out the questionnaire design, the investigation, and the comments. Fig. 1 shows the evaluation index diagram of DCR management in fab construction. A special task force was formed for this study to review DCR occurrences and to seek possible solutions (Fig. 1). A sketch of the whole process of finding the root causes of DCR was then drawn using the KJ method, and the reasonableness of each DCR was inspected. Since the criteria of the DCR evaluation have
diverse significance and meanings, we can hardly assume that each evaluation criterion is of equal importance. Although there are many methods that can be employed to determine weights (Hwang and Yoon, 1981), the selection of method depends on the nature of the problem. Evaluating DCR is a complex problem, so it requires the most inclusive and flexible method. The AHP method was chosen because it can systematize complicated problems, is easy to operate, and integrates most of the experts' and evaluators' opinions.

[Fig. 1 depicts the evaluation approach as a flow: (i) define categories from a construction point of view (definition, event collection, expert interviews); (ii) compile an issue list (data verification, event study); (iii) apply the KJ method to regroup issues (event discussion, categorization, building the study structure); (iv) apply a system diagram to find root causes (categorize events, list root causes, apply the ABC method to focus on the problem); (v) derive the feasible solution (find a reduction plan for each root cause, identify action items, summary).]

Fig. 1 Evaluation approaches
3.1 Evaluation Approaches
Companies are increasingly using projects in their daily work to achieve company goals, and there is a growing need for the management of projects in business organizations. In recent years, researchers have become increasingly interested in factors that may have an impact on project management effectiveness. The practical applications reported in the literature (Tzeng et al., 1994; Teng and Tzeng, 1996; Tsaur et al., 1997; Tang et al., 1999) have shown advantages in handling unquantifiable/qualitative criteria and have obtained quite reliable results. AHP (Saaty, 1977; 1980) is a decision-aiding method. It aims at quantifying relative priorities for a given set of alternatives on a ratio scale based on the judgment of the decision maker, and it stresses the importance of the decision maker's intuitive judgments as well as the consistency of the comparison of alternatives in the decision-making process. The group provides judgments as if it had achieved consensus, via some voting technique or by taking an "average" of the individual judgments. The group may decide to give all members equal weight, or different weights reflecting their different positions in the project. AHP allows group decision making, in which group members use their experience, values, and knowledge to break a problem down into a hierarchy and solve it by the AHP steps. Brainstorming and sharing ideas and insights often lead to a more complete representation and understanding of the issues (Tzeng, 1977). Using the KJ method ameliorates the building of the hierarchical structure and improves mutual understanding between top management and the fab building team. The group defines the issues to be examined and alters the prepared hierarchy, or constructs a new hierarchy, to cover all the important issues.
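The AHP weight derivation described above (Saaty's principal-eigenvector method with a consistency check) can be sketched in a few lines. The 3×3 pairwise-comparison matrix below is a hypothetical illustration on Saaty's 1-9 scale, not data from this study.

```python
import numpy as np

def ahp_weights(pairwise):
    """Principal-eigenvector weights and consistency ratio (Saaty's AHP)."""
    A = np.asarray(pairwise, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = int(np.argmax(eigvals.real))          # index of principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                           # normalize weights to sum to 1
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1)              # consistency index
    ri = {3: 0.58, 4: 0.90, 5: 1.12}[n]       # Saaty's random index (n = 3..5)
    return w, ci / ri                         # weights, consistency ratio

# Hypothetical judgments: criterion 1 vs 2 vs 3
A = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
w, cr = ahp_weights(A)
print(np.round(w, 3), round(cr, 3))           # CR < 0.1 is conventionally acceptable
```

For a group decision, the same routine can be run on each group's comparison matrix and the resulting weight vectors combined, e.g. by a (weighted) average, as done for the three groups in this study.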
4 New Wafer Fab Construction Project Management
The study focuses on new fab planning because it can be tailored to promote different privatization goals and to ease project execution in different contexts. The study begins with an empirical survey of a wide range of discussions of the new fab procedures used in designing infrastructure concessions. These procedures and provisions are tools available to the company for working efficiently among the construction team, users, and top management, and for aligning its own objectives.
4.1 Building the Hierarchical Structure of Evaluation Criteria
The hierarchical structure was built through group experts' discussion, generating a supply of ideas on the board. The method emphasizes ideas that are relevant, verifiable, and important; several repetitions can be used to refine the hierarchy content. The hierarchical structure adopted in this study to deal with the problems of DCR assessment for a new wafer fab is shown in Fig. 2. Its objective is "DCR Management Evaluation of New Wafer Fab Construction", and its phases and criteria (with weights) are:

A. Design Change (0.189): A1 Design buffer and designers' custom (0.040); A2 Design bid condition (contract) (0.048); A3 Design match-up (0.037); A4 Regulation change/insurance request (0.064)
B. Requirement Change (0.216): B1 Vendor selection (0.048); B2 Work area improvement (0.065); B3 Future expansion concern (0.060); B4 Special request (0.042)
C. Construction Change (0.141): C1 Scope loss (0.035); C2 Schedule control (0.055); C3 Space management conflict (0.051)
D. Strategic Consideration (0.288): D1 Area function change (0.061); D2 Space reserve for functional arrangement (0.037); D3 New process requirement (0.068); D4 Project/schedule change (0.057); D5 Production plan change (0.065)
E. Package Schedule Issue (0.166): E1 Political consideration - environment assessment (0.030); E2 Influence of competitor/economic recession (0.028); E3 Association with material suppliers (0.025); E4 Temporary addendum work (0.031); E5 General construction & preparation (0.025, 0.011); E6 Project management, sequence adjustment (0.026)

Fig. 2 The evaluation index diagram and the weights of 22 criteria
The key dimensions of the criteria for the evaluation and selection of DCR alternatives were derived through comprehensive investigation and consultation with several experts. Among them were top managers in architectural and civil engineering, a consulting company, experienced architects, and several representative users, such as experienced managers in professional services procurement, R&D, fab equipment, fab engineering, and administration of the company. The experts' and users' opinions provided the basis for developing the hierarchical structure used in this study. Furthermore, the five criteria selection principles suggested by Keeney and Raiffa (1976) were used to formulate the DCR evaluation criteria.
4.2 The Evaluation and Analysis
The project of building a new fab lasts for a certain time. During this period the construction team must keep the project on schedule at all times and adjust the schedule as needed ("pull ahead" or "push behind") to ensure the whole project works smoothly. Experienced adjustment under frequent coordination is important to ensure construction progress.

4.2.1 Determining the Evaluation Criteria Weights and Their Ranking
This study draws on experts with construction knowledge and a solid understanding of engineering and technology management, namely project managers, civil design experts, and engineering managers; the users were required to have at least 10 years of experience in construction involvement, budget control management, and engineering hook-up in both the facility and equipment fields. In the objectives selection, the group decision makers comprise three groups: (a) construction; (b) users; (c) management; experts are then invited for evaluation. Through personal interviews and completed surveys, we summarized their views and thoughts on the assessment criteria.

Table 1 The weights and ranking by group

Weight (Rank)\Phases   A          B          C          D          E
Fab Team               0.165 (4)  0.226 (2)  0.168 (3)  0.283 (1)  0.158 (5)
User                   0.232 (2)  0.188 (3)  0.092 (5)  0.309 (1)  0.179 (4)
Top Mgt.               0.200 (3)  0.234 (2)  0.133 (5)  0.267 (1)  0.165 (4)
Average                0.598 (3)  0.648 (2)  0.394 (5)  0.859 (1)  0.502 (4)
From Table 1, the civil managers and the competitors consider the weight of construction change to be higher than that of design change, which is in turn important for facility managers, design consultants, and top managers. This demonstrates differences caused by the different attitudes of the groups considered, which in turn stem from their different fields of responsibility toward the new fab. All three groups put the highest weight on "D. Strategic Consideration", which demonstrates a consistent view of the construction project.
Table 2 Weights and ratios on construction capital expenditure

         Weight of criteria                      DCR ratio on total construction cost (%)
Items    Fab Team  User    Top Mgt.  Avg.       Original  Reviewed  Can be improved
A        0.165     0.232   0.200     0.189      11.800    10.700    11.800
 A1      0.165     0.232   0.200     0.189       5.89      4.30      5.89
 A2      0.225     0.188   0.233     0.216       3.54      3.87      3.54
 A3      0.168     0.093   0.133     0.141       2.36      2.58      2.36
 A4      0.283     0.310   0.267     0.288       0.01      0.01      0.01
B        0.225     0.188   0.233     0.216       9.400    12.300     9.400
 B1      0.159     0.179   0.167     0.166       0.03      0.05      0.03
 B2      0.031     0.047   0.062     0.040       5.92      7.77      5.92
 B3      0.045     0.060   0.038     0.048       3.38      4.43      3.38
 B4      0.035     0.034   0.054     0.037       0.05      0.05      0.05
C        0.168     0.093   0.133     0.141       0.200     0.200     0.200
 C1      0.054     0.092   0.046     0.064       0.09      0.01      0.09
 C2      0.043     0.048   0.072     0.048       0.01      0.01      0.01
 C3      0.069     0.069   0.045     0.065       0.15      0.22      0.15
D        0.283     0.310   0.267     0.288      66.100    61.000    66.100
 D1      0.067     0.049   0.054     0.060       1.66      2.18      1.66
 D2      0.047     0.023   0.063     0.042      48.19     35.13     48.19
 D3      0.040     0.024   0.038     0.035      16.24     23.68     16.24
 D4      0.065     0.039   0.051     0.055       0.01      0.01      0.01
 D5      0.064     0.030   0.044     0.051       0.01      0.01      0.01
E        0.159     0.179   0.167     0.166      12.500    15.700    12.500
 E1      0.074     0.046   0.044     0.061       6.02      8.78      6.02
 E2      0.040     0.031   0.036     0.037       4.02      5.86      4.02
 E3      0.067     0.077   0.053     0.068       0.01      0.01      0.01
 E4      0.047     0.074   0.062     0.057       0.01      0.01      0.01
 E5      0.056     0.082   0.071     0.065       1.20      0.17      1.20
 E6      0.025     0.040   0.035     0.030       1.26      0.92      1.26
DCR ratio in total cost of new wafer fab         4.7%      3.2%      2.4%

* DCR ratio = DCR expenditure ÷ Total cost of new wafer fab × 100.
4.2.2 Analysis of Performance Indices
The study reveals that the early stage of fab operation can take more time to reach working efficiency. In particular, cost control and coordination can be a tough job for the project team while it focuses on keeping quality and schedule in balance. The results also show that "Strategic Consideration" takes the major share of total DCR cost at 66.1%. The second is "Package Schedule Issue" with 12.5%, and the third is "Design Change" with 11.8%; together they account for over 90% of total DCR costs. This means that improvement in these three phases can reasonably reduce DCR spending (Lo et al., 2002b). Through these DCR review activities, the original DCR budgeting ratio of 4.7% can be brought down to 2.4%-3.2% (and $5~6 million can be saved with management improvement), which provides a reference for future planning and capital budgeting of new fab projects. To reach the expected DCR ratio, we suggest the following feasible actions for building a new wafer fab with higher efficiency.
• Situation improvement: change the construction process and keep the design flexible;
• Procedures review: specify the requirements from risk management and the insurance company before the construction team is formed; review purchase procedures;
• Define specifications: formulate the standard of the operation area; set up a fabrication standard of design specification;
• Training: establish an internal qualification system for design consultants, architects, and site vendors covering qualification for working at the construction site, regulation understanding, familiarity with the related safety regulations, and capacity calculation;
• Benchmarking: benchmark specification and design issues against existing fabs (local or overseas); benchmark with competitors (through local/foreign vendors).
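The cumulative (ABC-style) ranking behind the "over 90%" observation can be reproduced from the phase-level "Original" column of Table 2; a minimal sketch:

```python
# Phase-level DCR cost shares (% of total DCR cost, "Original" column of Table 2)
shares = {"Strategic Consideration": 66.1,
          "Package Schedule Issue": 12.5,
          "Design Change": 11.8,
          "Requirement Change": 9.4,
          "Construction Change": 0.2}

ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
cum = 0.0
for phase, pct in ranked:
    cum += pct
    print(f"{phase:25s} {pct:5.1f}%  cumulative {cum:5.1f}%")
# The top three phases accumulate 90.4% of the total DCR cost,
# matching the "over 90%" figure quoted in the text.
```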
5 Conclusions
The DCR hierarchical assessment model is based on real-case measures, with experts joining the overall evaluation. This research employs a practical case and assesses scores to achieve overall evaluations. The procedure is based on a database of previous DCR occurrences and combines the ABC and AHP methods. The results show that the multi-attribute evaluation process in this research is feasible and applicable. Knowledge and experience gained during previous projects are the sources of information for building the database used in the future. The diagnosis procedures provide a reference for future planning and capital budgeting of new fabs. The top three phases, which accumulate over 90% of total DCR costs, establish the foundation for considerable improvement. The original ratio of DCR cost to total construction cost was found to be 4.7%. After the reviewing procedure this figure was lowered to 3.0%. Our study suggests that careful analysis supported by the MCDM and AHP methods can reasonably reduce this ratio down to 2.4%. In terms of cost this means potential savings of about $10 million (from the original $20 million down to $9.6 million). An acceptable cost of DCR is around 2.4%~3.2% of the total budget, and $5~6 million can be saved with management improvement. The model proves to be highly flexible and yields insights into how DCR works in practice.
References
[1] Crawford, L.: Senior management perceptions of project management competence. International Journal of Project Management 23, 7-16 (2005)
[2] Fulmer, W.E.: Shaping the Adaptive Organization: Landscapes, Learning, and Leadership in Volatile Times. AMACOM, New York (2000)
[3] Hwang, C.L., Yoon, K.: Multiple Attribute Decision Making: Methods and Applications. Springer, New York (1981)
[4] Kawakita, J.: The KJ Method - A Scientific Approach to Problem Solving. Technical report, Kawakita Research Institute, Tokyo (1975)
[5] Keeney, R.L., Raiffa, H.: Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley, New York (1976)
[6] Lo, M.C., Chang, C.Y., Tzeng, G.H.: Semiconductor Project Management of Design Change Requirement for New Wafer Fab Construction. In: Proceedings of the International Conference of Asia Pacific Industrial Engineering & Management (APIEMS 2002), Taipei, Taiwan (2002a)
[7] Lo, M.C., Wu, H.C., Lai, C.M., Chiang, S.H.: A Study on Resource Management of Design Change Requirement for New Wafer Fab Construction. In: Semiconductor Manufacturing Technology Workshop (SMTW 2002), Hsin-Chu, Taiwan (2002b)
[8] Saaty, T.L.: A Scaling Method for Priorities in Hierarchical Structures. Journal of Mathematical Psychology 15(2), 234-281 (1977)
[9] Saaty, T.L.: The Analytic Hierarchy Process: Planning, Priority Setting. McGraw-Hill, New York (1980)
[10] Tang, M.T., Tzeng, G.H., Wang, S.W.: A Hierarchy Fuzzy MCDM Method for Studying Electronic Marketing Strategies in the Information Service Industry. Journal of International Information Management 8(1), 1-22 (1999)
[11] Teng, J.Y., Tzeng, G.H.: Fuzzy Multicriteria Ranking of Urban Transportation Investment Alternatives. Transportation Planning and Technology 20(1), 15-31 (1996)
[12] Tsaur, S.H., Tzeng, G.H., Wang, G.C.: The Application of AHP and Fuzzy MCDM on the Evaluation Study of Tourist Risk. Annals of Tourism Research 24(4), 796-812 (1997)
[13] Tzeng, G.H.: A Study on the PATTERN Method for the Decision Process in the Public System. Japan Journal of Behaviormetrics 4(2), 29-44 (1977)
[14] Tzeng, G.H., Shiah, T.A., Teng, J.Y.: A Multiobjective Decision Making Approach to Energy Supply Mix Decisions in Taiwan. Energy Sources 16(3), 301-316 (1994)
Machine Failure Diagnosis Model Applied with a Fuzzy Inference Approach Lily Lin and Huey-Ming Lee
Abstract. This study presents a fuzzy failure diagnosis method to support the development of failure diagnosis systems. Fuzzy evaluation is used to handle problems in which the failures and the symptoms involve uncertainty. We develop a machine failure diagnosis model using both statistics and the fuzzy compositional rule of inference. We use a statistical confidence interval instead of a point estimate and fuzzify the confidence interval into triangular fuzzy numbers. We then apply the centroid method to the estimated failure rate in the fuzzy sense to obtain the machine failure degree.
Keywords: Fuzzy inference, Failure diagnosis, Membership grade.
1 Introduction
The manufacturing technology of machines has gradually matured, and handling machine alarm states in real time is very important. When a machine unit raises alarms, manufacturers have to spend a lot of time locating where the machine has broken down. If an accurate machine failure diagnosis model can be built, decision makers may obtain manufacturing information in real time. The topic merits discussion and study. According to life analysis in reliability theory, certain diagnosis rules can be used to diagnose machine faults. On this basis, considering the indefiniteness of machine working states, the accurate diagnosis rule is extended to a fuzzy diagnosis rule using basic concepts and methods of fuzzy mathematics, and the formulas of fault probability under different conditions are deduced.

Lily Lin
Department of International Business, China University of Technology, 56, Sec. 3, Hsing-Lung Road, Taipei (116), Taiwan
e-mail: [email protected]

Huey-Ming Lee
Department of Information Management, Chinese Culture University, 55, Hwa-Kung Road, Yang-Ming-San, Taipei (11114), Taiwan
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 185-190. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
2 Preliminaries
First, assume that a machine's unusual statuses are $S_1, S_2, \ldots, S_p$, represented as the set $S = \{S_1, S_2, \ldots, S_p\}$, and that its possible fault causes are $F_1, F_2, \ldots, F_q$, with $F = \{F_1, F_2, \ldots, F_q\}$ the set of fault causes. Past statistical data are used to build a fuzzy relation matrix between unusual statuses and fault causes. Moreover, assume there are $r$ machines, forming the set $M = \{M_1, M_2, \ldots, M_r\}$. We apply the fuzzy compositional rule of inference to obtain a fuzzy relation matrix between machines and fault causes, and then perform the failure diagnosis for each machine.
3 The Proposed Algorithm
We present the fuzzy failure diagnosis method as follows.

Step 1: Let $\tilde{A}_u = (a_{u1}, a_{u2}, \ldots, a_{up})$, $u = 1, 2, \ldots, r$, with $0 \le a_{uj} \le 1$, $j = 1, 2, \ldots, p$, where $a_{uj}$ denotes the degree to which machine $M_u$ exhibits unusual status $S_j$; each $\tilde{A}_u$ is decided by the decision maker. The fuzzy relation matrix $\tilde{H}$ between $M$ and $S$ is then

$$\tilde{H} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rp} \end{bmatrix} \qquad (1)$$
Step 2: The decision maker analyzes the past statistical data in failure-time-series order and divides all the machines (total number $N$) into $n$ groups appropriately. Suppose unusual statuses occur in the $g$-th group of machines; for each $t \in \{1, 2, \ldots, p\}$ and $v \in \{1, 2, \ldots, q\}$, let $K_{gtv}$ be the number of machines in group $g$, $g = 1, 2, \ldots, n$, with unusual status $S_t$ and failure cause $F_v$. Define

$$K_{tv} = \sum_{g=1}^{n} K_{gtv}, \qquad K_t = \sum_{v=1}^{q} K_{tv}, \qquad \gamma_{tv} = \frac{K_{tv}}{K_t}, \qquad t = 1, 2, \ldots, p,\; v = 1, 2, \ldots, q.$$

Then $0 \le \gamma_{tv} \le 1$ and $\sum_{v=1}^{q} \gamma_{tv} = 1$ for each $t \in \{1, 2, \ldots, p\}$.

According to the Central Limit Theorem, the probability distribution of $\gamma_{tv}$ is approximately normal; the $(1-\alpha) \times 100\%$ confidence interval based on the normal distribution is therefore

$$\left[\, \gamma_{tv} - Z(\alpha_{tv1}) \sqrt{\frac{\gamma_{tv}(1-\gamma_{tv})}{K_{tv}}},\;\; \gamma_{tv} + Z(\alpha_{tv2}) \sqrt{\frac{\gamma_{tv}(1-\gamma_{tv})}{K_{tv}}} \,\right] \qquad (2)$$

where $0 < \alpha_{tvk} < 1$, $k = 1, 2$, $0 < \alpha < 1$, and $\alpha_{tv1} + \alpha_{tv2} = \alpha$. If $Z$ is a random variable with the standard normal distribution $N(0,1)$, then $Z(\alpha_{tvk})$ satisfies $P(Z \ge Z(\alpha_{tvk})) = \alpha_{tvk}$, $k = 1, 2$.
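Step 2 and the fuzzification that follows it can be sketched numerically. The counts below are hypothetical, the symmetric split $\alpha_{tv1} = \alpha_{tv2} = \alpha/2$ is assumed, and, following the paper's equation (2), the standard error uses $K_{tv}$ under the square root.

```python
from statistics import NormalDist

def fuzzy_rate(k_tv, k_t, alpha=0.05):
    """Triangular fuzzy number (left, center, right) for the failure rate
    gamma_tv = K_tv / K_t, fuzzified from its normal-approximation
    confidence interval; symmetric split alpha/2 on each side assumed."""
    gamma = k_tv / k_t
    z = NormalDist().inv_cdf(1 - alpha / 2)            # Z(alpha/2), e.g. 1.96
    half = z * (gamma * (1 - gamma) / k_tv) ** 0.5     # half-width per eq. (2)
    return (gamma - half, gamma, gamma + half)

# Hypothetical counts: 40 of 100 failures with status S_t traced to cause F_v
left, center, right = fuzzy_rate(k_tv=40, k_t=100)
print(round(left, 3), center, round(right, 3))
```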
To reflect the data variation of each group, we use the interval in equation (2) rather than a point estimate: the probability of the deviation $\gamma_{tv} - P_{tv}$ between the point estimate $\gamma_{tv}$ and the true rate $P_{tv}$ cannot be known, so we adopt the $(1-\alpha) \times 100\%$ confidence interval of equation (2), which is an interval instead of a point. However, the traditional statistical method is not directly applicable to the machine failure diagnosis problem. In this study we therefore transfer the statistical interval into a fuzzy number interval, obtaining the triangular fuzzy number

$$\tilde{\gamma}_{tv} = \left( \gamma_{tv} - Z(\alpha_{tv1}) \sqrt{\frac{\gamma_{tv}(1-\gamma_{tv})}{K_{tv}}},\; \gamma_{tv},\; \gamma_{tv} + Z(\alpha_{tv2}) \sqrt{\frac{\gamma_{tv}(1-\gamma_{tv})}{K_{tv}}} \right), \qquad (3)$$

$t = 1, 2, \ldots, p$, $v = 1, 2, \ldots, q$. The fuzzy relation matrix between $F$ and $S$ is then

$$\tilde{\Re} = \begin{pmatrix} \tilde{\gamma}_{11} & \tilde{\gamma}_{12} & \cdots & \tilde{\gamma}_{1q} \\ \tilde{\gamma}_{21} & \tilde{\gamma}_{22} & \cdots & \tilde{\gamma}_{2q} \\ \vdots & \vdots & & \vdots \\ \tilde{\gamma}_{p1} & \tilde{\gamma}_{p2} & \cdots & \tilde{\gamma}_{pq} \end{pmatrix} \qquad (4)$$
Step 3 (fuzzy compositional rule of inference): Based on equations (1) and (4), the fuzzy relation matrix between $F$ and $M$ is

$$\tilde{B} = \tilde{H} \circ \tilde{\Re} = \begin{pmatrix} \tilde{b}_{11} & \tilde{b}_{12} & \cdots & \tilde{b}_{1q} \\ \tilde{b}_{21} & \tilde{b}_{22} & \cdots & \tilde{b}_{2q} \\ \vdots & \vdots & & \vdots \\ \tilde{b}_{r1} & \tilde{b}_{r2} & \cdots & \tilde{b}_{rq} \end{pmatrix} \qquad (5)$$

with

$$\tilde{b}_{tv} = (a_{t1}\tilde{\gamma}_{1v})(+)(a_{t2}\tilde{\gamma}_{2v})(+)\cdots(+)(a_{tp}\tilde{\gamma}_{pv}), \qquad (6)$$

$t = 1, 2, \ldots, r$, $v = 1, 2, \ldots, q$, and $a_{tv} \ge 0$ for all $t, v$. By fuzzy arithmetic on the triangular fuzzy numbers,

$$\tilde{b}_{tv} = \left( \sum_{k=1}^{p} a_{tk} \gamma_{kv}^{(1)},\; \sum_{k=1}^{p} a_{tk} \gamma_{kv},\; \sum_{k=1}^{p} a_{tk} \gamma_{kv}^{(2)} \right), \qquad (7)$$

where $\gamma_{kv}^{(1)} = \gamma_{kv} - Z(\alpha_{kv1}) \sqrt{\gamma_{kv}(1-\gamma_{kv})/K_{kv}}$ and $\gamma_{kv}^{(2)} = \gamma_{kv} + Z(\alpha_{kv2}) \sqrt{\gamma_{kv}(1-\gamma_{kv})/K_{kv}}$.

4 Conclusion
Finally, we apply the centroid method to defuzzify equation (7) and obtain the membership grade of each fault cause $F_j$. It represents the degree to which a machine exhibits fault cause $F_j$.
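Steps 1-3 and the centroid defuzzification can be sketched end-to-end. The matrices H and R below are hypothetical illustrations (2 machines, 3 statuses, 2 fault causes), and the centroid of a triangular fuzzy number (l, m, r) is taken as (l + m + r)/3.

```python
# Hypothetical data: H is machines x statuses (eq. 1); R is statuses x faults
# (eq. 4), each R entry a triangular fuzzy number (left, center, right).
H = [[0.8, 0.3, 0.1],
     [0.2, 0.9, 0.4]]
R = [[(0.5, 0.6, 0.7), (0.2, 0.3, 0.4)],
     [(0.1, 0.2, 0.3), (0.6, 0.7, 0.8)],
     [(0.3, 0.4, 0.5), (0.1, 0.2, 0.3)]]

def compose(H, R):
    """Equation (7): b_tv = sum_k a_tk * gamma_kv, componentwise on (l, m, r)."""
    n_mach, p = len(H), len(H[0])
    q = len(R[0])
    return [[tuple(sum(H[t][k] * R[k][v][c] for k in range(p)) for c in range(3))
             for v in range(q)] for t in range(n_mach)]

def centroid(tri):
    """Centroid defuzzification of a triangular fuzzy number (l, m, r)."""
    return sum(tri) / 3.0

B = compose(H, R)
grades = [[round(centroid(b), 3) for b in row] for row in B]
print(grades)   # → [[0.58, 0.47], [0.46, 0.77]]
```

Here machine 1 is diagnosed mainly with fault cause 1 (grade 0.58) and machine 2 with fault cause 2 (grade 0.77), i.e. the largest membership grade in each row indicates the most likely fault cause.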
Acknowledgments. This work was supported in part by the National Science Council of Taiwan, R.O.C., under grant No. 98-2410-H-163-005-MY2.
References
[1] Kaufmann, A., Gupta, M.M.: Introduction to Fuzzy Arithmetic: Theory and Applications. Van Nostrand Reinhold, New York (1991)
[2] Gu, B.-f., Wu, J.-f., Liu, B.: Fault Diagnosis of Machine Based on Fuzzy Reliability Theory. International Journal of Plant Engineering and Management (6) (2001)
[3] Zimmermann, H.-J.: Fuzzy Set Theory and Its Applications. Kluwer Academic Publishers, Boston (1991)
[4] Huallpa, B.N., Nobrega, E., Von Zuben, F.J.: Fault Detection in Dynamic Systems Based on Fuzzy Diagnosis. In: Fuzzy Systems Proceedings, IEEE World Congress on Computational Intelligence, vol. 2 (1998)
[5] Lee, H.-M.: Applying Fuzzy Set Theory to Evaluate the Rate of Aggregative Risk in Software Development. Fuzzy Sets and Systems 79, 323-336 (1996)
[6] Lee, H.-M., Lin, L.: A New Algorithm for Applying Fuzzy Set Theory to the Facility Site Selection. International Journal of Innovative Computing, Information and Control 5(12), 4953-4960 (2009)
[7] Lee, H.-M., Lin, L.: A Fuzzy Risk Assessment in Software Development Defuzzified by Signed Distance. In: Velásquez, J.D., Ríos, S.A., Howlett, R.J., Jain, L.C. (eds.) KES 2009. LNCS, vol. 5712, pp. 195-202. Springer, Heidelberg (2009)
[8] Lee, H.-M., Shih, T.-S., Su, J.-S., Lin, L.: Fuzzy Decision Making for IJV Performance Based on Statistical Confidence-Interval Estimates. In: Pan, J.-S., Chen, S.-M., Nguyen, N.T. (eds.) ICCCI 2010. LNCS, vol. 6422, pp. 51-60. Springer, Heidelberg (2010)
[9] Buchanan, J.L., Turner, P.R.: Numerical Methods and Analysis. McGraw-Hill, New York (1992)
[10] Yao, J.F.-F., Yao, J.-S.: Fuzzy Decision Making for Medical Diagnosis Based on Fuzzy Number and Compositional Rule of Inference. Fuzzy Sets and Systems 120, 351-366 (2001)
[11] Yao, J.-S., Yu, M.-M.: Decision Making Based on Statistical Data, Signed Distance and Compositional Rule of Inference. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 12(2), 161-190 (2004)
[12] Yao, J.-S., Huang, W.-T., Huang, T.-T.: Fuzzy Flexibility and Product Variety in Lot-Sizing. Journal of Information Science and Engineering 23, 49-70 (2007)
[13] Mathews, J.H.: Numerical Methods for Mathematics, Science, and Engineering. Prentice-Hall International, London (1992)
[14] Lin, L., Lee, H.-M.: Fuzzy Group Assessment for Facility Location Decision. In: Apolloni, B., Howlett, R.J., Jain, L. (eds.) KES 2007, Part III. LNCS (LNAI), vol. 4694, pp. 386-392. Springer, Heidelberg (2007)
[15] Lin, L., Lee, H.-M.: A New Assessment Model for Global Facility Site Selection Based on Fuzzy Set Theory. International Journal of Innovative Computing, Information and Control 4(5), 1141-1150 (2008)
[16] Lin, L., Lee, H.-M.: A Fuzzy Software Quality Assessment Model to Evaluate User Satisfaction. International Journal of Innovative Computing, Information and Control 4(10), 2639-2647 (2008)
[17] Lin, L., Lee, H.-M.: Evaluation of Survey by Linear Order and Symmetric Fuzzy Linguistics Based on the Centroid Method. International Journal of Innovative Computing, Information and Control 5(12), 4945-4952 (2009)
[18] Lin, L., Lee, H.-M.: Fuzzy Assessment Method on Sampling Survey Analysis. Expert Systems with Applications 36, 5955-5961 (2009)
[19] Lin, L., Lee, H.-M.: Group Assessment Based on the Linear Fuzzy Linguistics. International Journal of Innovative Computing, Information and Control 6(1), 263-274 (2010)
[20] Lin, L., Lee, H.-M.: Using Signed Distance for Analyzing Sampling Survey Assessment Answered with Interval Value. ICIC Express Letters 3(4B), 1185-1190 (2009)
[21] Lin, L., Lee, H.-M., Su, J.-S.: Fuzzy Opinion Survey Based on Interval Value. ICIC Express Letters 4(5B), 1997-2001 (2010)
[22] Lin, L., Lee, H.-M.: Fuzzy Assessment for Sampling Survey Defuzzification by Signed Distance Method. Expert Systems with Applications 37(12), 7852-7878 (2010)
[23] Lin, L., Lee, H.-M.: A Fuzzy Assessment for Software Development Risk Rate. ICIC Express Letters 4(2), 319-323 (2010)
[24] Yao, J.-S., Wu, K.: Ranking Fuzzy Numbers Based on Decomposition Principle and Signed Distance. Fuzzy Sets and Systems 116, 275-288 (2000)
[25] Yao, J.-S., Su, J.-S., Shih, T.-S.: Fuzzy System Reliability Analysis Using Triangular Fuzzy Numbers Based on Statistical Data. Journal of Information Science and Engineering 24, 1521-1535 (2008)
[26] Zadeh, L.A.: Fuzzy Sets. Information and Control 8, 338-353 (1965)
[27] Zadeh, L.A.: The Concept of a Linguistic Variable and Its Application to Approximate Reasoning. Information Sciences 8, 199-249 (Part I), 301-357 (Part II) (1975); 9, 43-58 (Part III) (1976)
[28] Zimmermann, H.-J.: Fuzzy Set Theory and Its Applications, 2nd revised edn. Kluwer Academic Publishers, Boston (1991)
Neural Network Model Predictive Control of a Wastewater Treatment Bioprocess Dorin Şendrescu, Emil Petre, Dan Popescu, and Monica Roman
Abstract. This paper deals with the design of a nonlinear model predictive control (NMPC) scheme for regulating the acetate concentration in a biomethanation process - wastewater biodegradation with production of methane gas that takes place inside a continuous stirred tank bioreactor. The NMPC structure is based on a radial basis function neural network used as an on-line approximator to learn the nonlinear characteristics of the process. Minimization of the cost function is realised using the Levenberg-Marquardt numerical optimisation method. Simulation results are given to illustrate the efficiency of the proposed control strategy.
Keywords: Nonlinear systems, Neural networks, Model predictive control, Wastewater treatment bioprocesses.
1 Introduction In recent years, the control of biotechnological processes has been an important problem attracting wide attention. In industry, bioprocesses take place in biological reactors, also called bioreactors. A bioreactor is a tank in which several biological reactions occur simultaneously in a liquid medium [1]. These reactions can be classified into two classes: microbial growth reactions and enzyme-catalysed reactions. Bioreactors can operate in three modes: the continuous mode, the fed-batch mode and the batch mode (see [1]–[3]). For example, a Fed-Batch Bioreactor (FBB) initially contains a small amount of substrates and microorganisms and is progressively filled with the influent substrates. When the FBB is full the content is harvested. By contrast, in a Continuous Stirred Tank Bioreactor (CSTB) the substrates are fed to the bioreactor continuously and an effluent stream is continuously withdrawn such that the culture volume is constant. In practice, bioprocess control is often limited to the regulation of temperature and pH at constant values favourable to microbial growth. There
Dorin Şendrescu · Emil Petre · Dan Popescu · Monica Roman Department of Automatic Control, University of Craiova, A.I. Cuza 13, Craiova, Romania e-mail: {dorins,epetre,dpopescu,monica}@automation.ucv.ro
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 191–200. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
is however no doubt that the control of the biological state variables (biomass, substrates, products) can help to increase bioprocess performance. In order to develop and apply advanced control strategies for these biological variables, it is obviously necessary to obtain a useful dynamical model. Bioprocess modelling is a difficult task; however, using the mass balance of the components inside the bioreactor and obeying a number of modelling rules, a dynamical model of a bioprocess can be obtained. The main engineering motivation in applying control methods to such living processes is to improve operational stability and production efficiency, but the use of modern control for these bioprocesses is still low. Nonlinear model predictive control (NMPC) is needed especially for nonlinear, unsteady processes where a trajectory needs to be followed based on the prediction of a nonlinear model [4]. It is especially useful for processes operating at or near singular points that cannot be captured by linear controllers and where higher order information is needed. Traditional control design involves complicated mathematical analysis and has difficulties in controlling highly nonlinear and time varying plants as well. NMPC uses the nonlinear dynamic model to predict the effect of sequences of control steps on the controlled variables. Recently, there has been considerable interest in the use of neural networks (NNs) for the identification and control of complex dynamical systems [5], [6]. The main advantage of using NNs in control applications is based both on their ability to uniformly approximate arbitrary input-output mappings [8] and on their learning capabilities, which enable the resulting controller to adapt itself to possible variations in the controlled plant dynamics [5]. More precisely, the variations of plant parameters are transposed into modifications of the NN parameters (i.e. the adaptation of the NN weights).
Using feedback linearization and NNs, several NN-based adaptive controllers have been developed for some classes of uncertain, time varying and nonlinear systems [5], [7]. In this paper, the design and analysis of a nonlinear model predictive control strategy for an anaerobic digestion bioprocess with incompletely known dynamics is presented. The design of the model predictive controller is based on a radial basis function NN. The control signals are generated using a NN approximation of the functions representing the uncertain plant dynamics. The derived control method is applied to an anaerobic bioprocess used for wastewater treatment. This bioprocess is characterized by strongly nonlinear, time varying and not exactly known dynamical kinetics. The control objective is to maintain a low pollution level. The paper is organized as follows. Section 2 is devoted to the description and modelling of an anaerobic digestion bioprocess. The nonlinear model predictive control strategy is presented in Section 3. Simulation results presented in Section 4 illustrate the performance of the proposed control algorithm and, finally, Section 5 concludes the paper.
2 Nonlinear Model of the Anaerobic Bioprocess In the conventional MPC controller, a linear predictive model is used because the theory of identification of linear systems is well established. The nonlinear part of the system response is treated as a disturbance. But a linear model, no matter how well it has been structured and tuned, may be acceptable only when the system is working around the operating point. If the system is highly nonlinear, as biotechnological processes are, control based on the prediction from a linear model may result in unacceptable responses: in some cases significant static errors exist, and in other cases oscillation or even instability may occur. Therefore, some kind of nonlinear model should be used to describe the behaviour of a highly nonlinear system. Wastewater treatment plants are difficult to control because of their nonlinear dynamics and their unknown parameters, whose values can change in time. We consider a biomethanation process – wastewater biodegradation with production of methane gas that takes place inside a Continuous Stirred Tank Bioreactor whose reduced model is presented in [1], [2]. It is a two-phase process. In the first phase, the glucose from the wastewater is decomposed into fat volatile acids (acetates, propionic acid), hydrogen and inorganic carbon under the action of the acidogenic bacteria. In the second phase, the ionised hydrogen decomposes the propionic acid CH3CH2COOH into acetates, H2 and carbon dioxide CO2. In the first methanogenic phase, the acetate is transformed into methane and CO2, and finally, in the second methanogenic phase, the methane gas CH4 is obtained from H2 and CO2 [2], [10], [11]. The following simplified reaction scheme is considered:

S_1 \xrightarrow{\Phi_1} X_1 + S_2, \qquad S_2 \xrightarrow{\Phi_2} X_2 + P_1    (1)
where S_1 represents the glucose substrate, S_2 the acetate substrate, X_1 the acidogenic bacteria, X_2 the acetoclastic methanogenic bacteria, and P_1 the product, i.e. the methane gas. The reaction rates are denoted by \Phi_1, \Phi_2. The corresponding dynamical model is
\frac{d}{dt}\begin{bmatrix} X_1 \\ S_1 \\ X_2 \\ S_2 \\ P_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -k_1 & 0 \\ 0 & 1 \\ k_2 & -k_3 \\ 0 & k_4 \end{bmatrix}\begin{bmatrix} \Phi_1 \\ \Phi_2 \end{bmatrix} - D\begin{bmatrix} X_1 \\ S_1 \\ X_2 \\ S_2 \\ P_1 \end{bmatrix} + \begin{bmatrix} 0 \\ D S_{in} \\ 0 \\ 0 \\ 0 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ Q_1 \end{bmatrix}    (2)
Defining the state vector

\xi = [X_1 \;\; S_1 \;\; X_2 \;\; S_2 \;\; P_1]^T = [\xi_1 \;\; \xi_2 \;\; \xi_3 \;\; \xi_4 \;\; \xi_5]^T    (3)

whose components are concentrations in (g/l), and the reaction rate vector

\Phi = \Phi(\xi) = [\Phi_1(\xi) \;\; \Phi_2(\xi)]^T    (4)
the model (2) can be written in a compact form as:
\dot{\xi} = K\Phi(\xi) - D\xi + F - Q    (5)
F is the vector of inflow rates, Q is the vector of gaseous outflow rates, D is the dilution rate and K is the matrix of yield coefficients. The reaction rates for this process are nonlinear functions of the state components and are given by the Monod law

\Phi_1(\xi) = \mu_1^* \frac{S_1 X_1}{K_{M_1} + S_1}    (6)

and the Haldane kinetic model

\Phi_2(\xi) = \mu_2^* \frac{S_2 X_2}{K_{M_2} + S_2 + S_2^2/K_i}    (7)
where K_{M_1}, K_{M_2} are Michaelis–Menten constants, \mu_1^*, \mu_2^* represent specific growth rate coefficients and K_i is the inhibition constant. For the anaerobic bioprocess described by the dynamical model (2) we consider the problem of controlling the output pollution level y, defined as

y = \alpha_1 S_1 + \alpha_2 S_2    (8)

where \alpha_1 and \alpha_2 are known constants, by using an optimal control input computed with the neural network model predictive control scheme presented in the following section.
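As an illustration, model (2) with the Monod law (6) and the Haldane model (7) can be simulated with a simple forward-Euler loop. The yield and kinetic coefficients below are the ones quoted in Section 4; the dilution rate D, the inflow concentration S_in, the initial state and the neglect of the gaseous outflow Q1 are assumptions of this sketch, not values from the paper.

```python
import numpy as np

# Euler simulation of the CSTB model (2); parameter values are those quoted
# in Section 4 [18], while D, S_in and the initial state are hypothetical.
k1, k2, k3, k4 = 3.2, 16.7, 1.035, 1.1935
mu1, mu2 = 0.2, 0.35            # specific growth rate coefficients [1/h]
KM1, KM2, Ki = 0.5, 4.0, 21.0   # Michaelis-Menten / inhibition constants [g/l]
a1, a2 = 1.2, 0.75              # output weights in y = a1*S1 + a2*S2, eq. (8)

def reaction_rates(X1, S1, X2, S2):
    phi1 = mu1 * S1 * X1 / (KM1 + S1)                # Monod law (6)
    phi2 = mu2 * S2 * X2 / (KM2 + S2 + S2**2 / Ki)   # Haldane model (7)
    return phi1, phi2

def euler_step(state, D, S_in, dt):
    X1, S1, X2, S2, P1 = state
    phi1, phi2 = reaction_rates(X1, S1, X2, S2)
    dX1 = phi1 - D * X1
    dS1 = -k1 * phi1 - D * S1 + D * S_in
    dX2 = phi2 - D * X2
    dS2 = k2 * phi1 - k3 * phi2 - D * S2
    dP1 = k4 * phi2 - D * P1     # gaseous outflow Q1 neglected in this sketch
    return state + dt * np.array([dX1, dS1, dX2, dS2, dP1])

state = np.array([2.0, 1.0, 0.5, 1.0, 0.0])   # hypothetical initial state [g/l]
for _ in range(1000):                          # 100 h with dt = 0.1 h
    state = euler_step(state, D=0.05, S_in=10.0, dt=0.1)
y = a1 * state[1] + a2 * state[3]              # pollution level (8)
```

With a constant dilution rate the simulated concentrations settle toward a steady state, which is the open-loop behaviour the NMPC controller of Section 3 acts upon.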
3 Neural Network Model Predictive Control 3.1 Problem Formulation Consider the following discrete-time, time-invariant nonlinear system:
\xi_{k+1} = f(\xi_k, u_k), \qquad y_k = h(\xi_k, u_k)    (9)

with \xi_k the state vector and u_k the control signal (corresponding in our case to the discretisation of system (2)). The objective is to regulate the output signal to a specified setpoint value y_{ref} while satisfying certain input and state constraints:

\xi_{min} \le \xi_k \le \xi_{max}, \qquad u_{min} \le u_k \le u_{max}    (10)

Nonlinear model predictive control (NMPC) treats such a constrained control problem by repeatedly solving the following optimization problem:

\min_u \left( \sum_{i=1}^{N} (y_{ref} - y_{k+i})^T \Psi (y_{ref} - y_{k+i}) + \sum_{i=1}^{N_u} u_{k+i}^T \Omega u_{k+i} \right)    (11)

\text{s.t.} \quad \xi_{k+1} = f(\xi_k, u_k), \quad \xi_{min} \le \xi_k \le \xi_{max}, \quad u_{min} \le u_k \le u_{max}    (12)
where Ψ and Ω are positive semidefinite matrices, N denotes the length of the prediction horizon and Nu the length of the control horizon. From the sequence resulting after the on-line optimization of (11) under nonlinear constraints (12), only the first optimal control is applied as input to the system. At the next sampling instant, the current state is obtained (measured or estimated) and the optimization problem (11), (12) is solved again with this new initial state value, according to the well-known receding horizon principle [12]. The previous general NMPC formulation is applied to anaerobic wastewater treatment process, in order to regulate the pollution level to a reference value yref by manipulating the dilution rate D.
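The receding horizon principle can be sketched on a toy scalar system as follows. The inner optimizer here is a brute-force search over a discretized control sequence, standing in for the numerical optimizer of Section 3.3; the one-step model, bounds, horizons and setpoint are placeholders of this sketch, not the bioprocess model.

```python
import numpy as np

def f_model(xi, u):
    # placeholder one-step model xi_{k+1} = f(xi_k, u_k), not the bioprocess
    return 0.9 * xi + 0.1 * u

def cost(u_seq, xi0, y_ref, N):
    # cost (11) with Psi = 1, Omega = 0; the last move is held beyond Nu
    xi, J = xi0, 0.0
    for i in range(N):
        xi = f_model(xi, u_seq[min(i, len(u_seq) - 1)])
        J += (y_ref - xi) ** 2
    return J

def nmpc_step(xi0, y_ref, N=10, Nu=2, levels=np.linspace(0.0, 1.0, 21)):
    best_u, best_J = 0.0, np.inf
    for u0 in levels:              # enumerate input sequences subject to the
        for u1 in levels:          # bound constraints (12)
            J = cost((u0, u1), xi0, y_ref, N)
            if J < best_J:
                best_u, best_J = u0, J
    return best_u                  # receding horizon: apply only the first move

xi, y_ref = 0.0, 0.8
for _ in range(50):                # closed loop: measure, re-optimize, apply
    xi = f_model(xi, nmpc_step(xi, y_ref))
```

At each sample the whole optimization is redone from the newly measured state, which is exactly the receding horizon mechanism described above.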
3.2 The Neural Network Model The predictive model for a conventional MPC controller is usually a linear model, which is preferred as being more intuitive and requiring less a priori information for its identification. MPC based on linear models is acceptable if the process operates at a single setpoint and the primary use of the controller is the rejection of disturbances. Many chemical processes, including polymer reactors, do not operate at a single setpoint, and linear models are not suitable for nonlinear systems such as biotechnological processes. To solve this problem, neural networks are proposed to obtain the estimated output used by the MPC controller, because neural networks have the ability to map nonlinear relationships between an input and an output set. There have also been many reports on the application of neural networks to bioprocess modelling and identification [6], [8].
In this paper the process model is obtained using a radial basis neural network (RBNN) with adjustable parameters to approximate the reaction rates \Phi_1 and \Phi_2 from model (2). A RBNN is made up of a collection of p > 0 parallel processing units called nodes. The output of the i-th node is defined by a Gaussian function \gamma_i(x) = \exp(-|x - c_i|^2 / \sigma_i^2), where x \in \Re^n is the input to the NN, c_i is the centre of the i-th node, and \sigma_i is its size of influence. The output of a RBNN, y_{NN} = F(x, W), may be calculated as [13]

F(x, W) = \sum_{i=1}^{p} w_i \gamma_i(x) = W^T(t)\Gamma(x),    (13)

where W(t) = [w_1(t) \; w_2(t) \dots w_p(t)]^T is the vector of network weights and \Gamma(x) is a set of radial basis functions defined by \Gamma(x) = [\gamma_1(x) \; \gamma_2(x) \dots \gamma_p(x)]^T. Given a RBNN, it is possible to approximate a wide variety of functions f(x) by making different choices for W. In particular, if there is a sufficient number of nodes within the NN, then there is some W^* such that \sup_{x \in S_x} |F(x, W^*) - f(x)| < \varepsilon, where S_x is a compact set and \varepsilon > 0 is a finite constant, provided that f(x) is continuous [13]. The RBNN is used to estimate the reaction rates \Phi_1 and \Phi_2 (which are considered unknown) using some state measurements.
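A minimal numerical sketch of (13): Gaussian nodes on a one-dimensional grid approximating a Monod-type rate. The grid of centres, the common width and the one-shot least-squares fit of W are assumptions of this example; the paper adapts the weights on-line instead.

```python
import numpy as np

# RBNN of eq. (13): F(x, W) = sum_i w_i * exp(-|x - c_i|^2 / sigma_i^2).
centres = np.linspace(0.0, 1.5, 8)      # node centres c_i (hypothetical grid)
sigma = 0.3                             # common influence width sigma_i

def Gamma(x):
    # vector of radial basis functions [gamma_1(x), ..., gamma_p(x)]
    return np.exp(-((x - centres) ** 2) / sigma ** 2)

def F(x, W):
    return Gamma(x) @ W                 # W^T Gamma(x), eq. (13)

# target: a Monod-type rate phi(S) = 0.2 S / (0.5 + S), cf. eq. (6)
xs = np.linspace(0.0, 1.5, 100)
phi = 0.2 * xs / (0.5 + xs)
G = np.stack([Gamma(x) for x in xs])    # design matrix, one row per sample
W, *_ = np.linalg.lstsq(G, phi, rcond=None)

err = max(abs(F(x, W) - p) for x, p in zip(xs, phi))
```

Even this small network reproduces the smooth rate curve closely, which is the approximation property invoked above.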
3.3 The Control Algorithm Model predictive control is a strategy based on the explicit use of some kind of system model to predict the controlled variables over a certain time horizon, called the prediction horizon. The control strategy can be described as follows [14]:
1) At each sampling time, the value of the controlled variable y(t+k) is predicted over the prediction horizon k = 1, ..., N. This prediction depends on the future values of the control variable u(t+k) within a control horizon k = 1, ..., Nu.
2) A reference trajectory yref(t+k), k = 1, ..., N, is defined, which describes the desired system trajectory over the prediction horizon.
3) The vector of future controls u(t+k) is computed such that an objective function (a function of the errors between the reference trajectory and the predicted output of the model) is minimised.
4) Once the minimisation is achieved, the first optimised control action is applied to the plant and the plant outputs are measured. This measurement of the plant states is used as the initial state of the model to perform the next iteration.
Steps 1 to 4 are repeated at each sampling instant; this is called a receding horizon strategy. The MPC-based control strategy is characterized by the scheme represented in Fig. 1.
[Fig. 1 block diagram: the reference and the predicted output of the nonlinear model enter the NMPC block; its optimised input drives the nonlinear bioprocess, whose controlled output is fed back.]
Fig. 1 NMPC control scheme
When a solution of the nonlinear least squares (NLS) minimization problem cannot be obtained analytically, the NLS estimates must be computed using numerical methods. To optimize a nonlinear function, an iterative algorithm starts from some initial value of the argument of that function and then repeatedly calculates the next value according to a particular rule until an optimum is approximately reached. Among the many methods of numerical optimization, the Levenberg-Marquardt (LM) algorithm was chosen to solve the optimisation problem. The LM algorithm is an iterative technique that locates the minimum of a multivariate function expressed as a sum of squares of non-linear real-valued functions [15], [16]. It has become a standard technique for non-linear least-squares problems [17], widely adopted in a broad spectrum of disciplines. LM can be thought of as a combination of steepest descent and the Gauss-Newton method. When the current solution is far from the correct one, the algorithm behaves like a steepest descent method; when the current solution is close to the correct one, it becomes a Gauss-Newton method.
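A compact illustration of the LM iteration for min over theta of the sum of squared residuals: at each step (J^T J + lambda I) d = -J^T r is solved, and lambda is decreased when the step reduces the cost (Gauss-Newton-like behaviour) or increased otherwise (steepest-descent-like behaviour). The fitted Monod curve and every numerical value below are illustrative only, not taken from the paper.

```python
import numpy as np

# Levenberg-Marquardt on a two-parameter Monod-curve fit (noise-free data).
def residuals(theta, S, y):
    mu, KM = theta
    return y - mu * S / (KM + S)

def jacobian(theta, S, y):
    mu, KM = theta
    J = np.empty((len(S), 2))
    J[:, 0] = -S / (KM + S)              # d r / d mu
    J[:, 1] = mu * S / (KM + S) ** 2     # d r / d KM
    return J

def levenberg_marquardt(theta, S, y, lam=1e-2, iters=50):
    for _ in range(iters):
        r, J = residuals(theta, S, y), jacobian(theta, S, y)
        # damped normal equations: large lam ~ steepest descent,
        # small lam ~ Gauss-Newton
        d = np.linalg.solve(J.T @ J + lam * np.eye(2), -J.T @ r)
        if np.sum(residuals(theta + d, S, y) ** 2) < np.sum(r ** 2):
            theta, lam = theta + d, lam / 3   # accept step, trust model more
        else:
            lam *= 3                          # reject step, damp harder
    return theta

S = np.linspace(0.1, 5.0, 40)
y = 0.2 * S / (0.5 + S)                  # data generated with mu* = 0.2, KM = 0.5
theta = levenberg_marquardt(np.array([1.0, 1.0]), S, y)
```

The accept/reject rule on the damping factor is what lets the same loop behave as gradient descent far from the optimum and as Gauss-Newton near it.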
4 Simulation Results In this section we apply the designed nonlinear model predictive control to the anaerobic digestion bioprocess presented in Section 2. In order to control the output pollution level y, as control input we chose the dilution rate, u = D. The main control objective is to maintain the output y at a specified low pollution level y_d \in \Re. We analyze the realistic case where the structure of the system of differential equations (2) is known, while the specific reaction rates \Phi_1 and \Phi_2 (Eqs. (6) and (7)) are completely unknown and must be estimated. Using the RBNN from subsection 3.2, one constructs on-line estimates of \Phi_1 and \Phi_2. The performance of the nonlinear predictive controller presented in subsection 3.3 has been tested through extensive simulations using the process model (2). The values of the yield and kinetic coefficients are [18]: k_1 = 3.2, k_2 = 16.7, k_3 = 1.035, k_4 = 1.1935, k_5 = 1.5, k_6 = 3, k_7 = 0.113, \mu_1^* = 0.2 h^{-1}, K_{M_1} = 0.5 g/l, \mu_2^* = 0.35 h^{-1}, K_{M_2} = 4 g/l, K_i = 21 g/l, and the values \alpha_1 = 1.2, \alpha_2 = 0.75. It must be
noted that for the reaction rate estimation a full RBNN with deviation \sigma_i = 0.05 was used. The centres c_i of the radial basis functions are placed in the nodes of a mesh obtained by discretization of the states X_1 \in [1, 12] g/l, X_2 \in [0.4, 0.65] g/l, S_1 \in [0.1, 1.4] g/l and S_2 \in [0.3, 1.6] g/l with dX_i = dS_i = 0.2 g/l, i = 1, 2. The simulation results, obtained with a sample period T_s = 6 min, are presented in Figs. 2–5. In Fig. 2 the controlled output trajectory is presented and in Fig. 3 the nonlinear model predictive control action (dilution rate D evolution) is depicted. The functions \Phi_1 and \Phi_2 provided by the RBNN are depicted versus the "real" functions in Fig. 4 and Fig. 5. From these figures it can be seen that the behaviour of the control system with the NMPC controller is very good, although the process dynamics are incompletely known. The control action has an oscillatory behaviour, but these oscillations are relatively slow and of small magnitude.

Fig. 2 The controlled output evolution (reference (1) and controlled output (2)); concentration [g/l] versus time [h].
Fig. 3 The nonlinear model predictive control action (dilution rate D); [1/h] versus time [h].
Fig. 4 The real reaction rate \Phi_1 (1) versus the function provided by the RBNN (2); [g/l h] versus time [h].
Fig. 5 The real reaction rate \Phi_2 (1) versus the function provided by the RBNN (2); [g/l h] versus time [h].
5 Conclusion In this paper, a nonlinear model predictive control strategy was developed for a wastewater treatment bioprocess. The nonlinear model used by the control algorithm was obtained using the analytical description of the biochemical reactions. The unknown reaction rates are estimated using radial basis neural networks. The nonlinear model states are used to calculate the optimal control signal applied to the system. The optimization problem was solved using the iterative Levenberg-Marquardt algorithm. The main goal of feedback control was to maintain a low pollution level in the case of an anaerobic bioprocess with strongly nonlinear and not exactly known dynamical kinetics. The obtained results are quite encouraging from a simulation viewpoint and show good tracking precision. The numerical simulations show that the use of the nonlinear model predictive control strategy leads to a good control performance. Acknowledgments. This work was supported by CNCSIS-UEFISCDI Romania, project number PN II-RU TE 106/2010.
References 1. Bastin, G., Dochain, D.: On-line Estimation and Adaptive Control of Bioreactors. Elsevier, Amsterdam (1990) 2. Bastin, G.: Nonlinear and adaptive control in biotechnology: a tutorial. In: Proc. ECC 1991 Conf., Grenoble, pp. 2001–2012 (1991) 3. Selişteanu, D., Petre, E.: Vibrational control of a class of bioprocesses. Contr. Eng. and App. Inf. 3(1), 39–50 (2001) 4. Camacho, E.F., Bordons, C.: Model Predictive Control, 2nd edn. Springer, Heidelberg (2004) 5. Hayakawa, T., Haddad, W.M., Hovakimyan, N.: Neural network adaptive control for a class of nonlinear uncertain dynamical systems with asymptotic stability guarantees. IEEE Trans. on Neural Networks 19, 80–89 (2008) 6. Petre, E., Selişteanu, D., Şendrescu, D.: Neural Networks Based Adaptive Control for a Class of Time Varying Nonlinear Processes. In: Int. Conf. on Control, Automation and Systems ICCAS 2008, COEX, Seoul, Korea, October 14-17, pp. 1355–1360 (2008) 7. Funahashi, K.: On the approximate realization of continuous mappings by neural networks. Neural Networks 2, 183–192 (1989) 8. Yu, W., Li, X.: Some new results on system identification with dynamic neural networks. IEEE Trans. Neural Networks 12(2), 412–417 (2001) 9. Isidori, A.: Nonlinear Control Systems, 3rd edn. Springer, Berlin (1995) 10. Petre, E.: Nonlinear Control Systems – Applications in Biotechnology, 2nd edn. Universitaria, Craiova (2008) (in Romanian) 11. Dochain, D., Vanrolleghem, P.: Dynamical Modelling and Estimation in Wastewater Treatment Processes. IWA Publishing (2001) 12. Mayne, D.Q., Rawlings, J.B., Rao, C.V., Scokaert, P.O.: Constrained model predictive control: stability and optimality. Automatica 36, 789–814 (2000)
13. Spooner, J.T., Passino, K.M.: Decentralized adaptive control of nonlinear systems using radial basis neural networks. IEEE Trans. on Autom. Control 44(11), 2050–2057 (1999) 14. Eaton, J.W., Rawlings, J.R.: Feedback control of nonlinear processes using online optimization techniques. Computers and Chemical Engineering 14, 469–479 (1990) 15. Wang, Y., Boyd, S.: Fast model predictive control using online optimization. In: Proc. of the 17th World Congress of International Federation of Automatic Control, WCIFAC 2008 (2008) 16. Kouvaritakis, B., Cannon, M.: Nonlinear Predictive Control: Theory and Practice. IEE (2001) 17. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, Heidelberg (1999) 18. Petre, E., Selişteanu, D., Şendrescu, D.: Adaptive control strategies for a class of anaerobic depollution bioprocesses. In: Proc. of Int. Conf. on Automation, Quality and Testing, Robotics, Cluj-Napoca, Romania, Tome II, May 22-25, pp. 159–164 (2008)
Neural Networks Based Adaptive Control of a Fermentation Bioprocess for Lactic Acid Production Emil Petre, Dan Selişteanu, and Dorin Şendrescu
Abstract. This work deals with the design and analysis of nonlinear and neural adaptive control strategies for lactic acid production carried out in continuous stirred tank bioreactors. An indirect adaptive controller, based on a dynamical neural network used as an on-line approximator to learn the time-varying characteristics of the process parameters, is developed and then compared with a classical linearizing controller. The controller design is achieved by using an input-output feedback linearization technique. The effectiveness and performance of both control algorithms are illustrated by numerical simulations for a lactic fermentation bioprocess whose kinetic dynamics are strongly nonlinear, time varying and completely unknown. Keywords: Neural networks, Adaptive control, Lactic acid production.
1 Introduction In the last decades, the control of bioprocesses has been a significant problem attracting wide attention, the main engineering motivation being the improvement of operational stability and production efficiency. It is well known that control design involves complicated mathematical analysis and has difficulties in controlling highly nonlinear and time varying plants. A powerful tool for nonlinear controller design is feedback linearization [1, 2], but its use requires complete knowledge of the process. In practice there are many processes described by highly nonlinear dynamics, for which an accurate model is difficult to develop. Therefore, in recent years great progress has been made in the development of adaptive and robust adaptive controllers, due to their ability to compensate for both parametric uncertainties and process parameter variations. Recently, there has also been considerable interest in the use of neural networks (NNs) for the identification and control of complex dynamical systems [3, 4, 5, 6, 7, 8]. The main advantage of
Emil Petre, Dan Selişteanu, and Dorin Şendrescu Department of Automatic Control, University of Craiova, A.I. Cuza 13, Craiova, Romania e-mail: {epetre,dansel,dorins}@automation.ucv.ro
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 201–212. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
using NNs in control applications is based both on their ability to uniformly approximate arbitrary input-output mappings and on their learning capabilities, which enable the resulting controller to adapt itself to possible variations in the controlled plant dynamics [5]. More precisely, the variations of plant parameters are transposed into modifications of the NN parameters (i.e. the adaptation of the NN weights). Using feedback linearization and NNs, several NN-based adaptive controllers have been developed for some classes of uncertain, time varying and nonlinear systems [4, 5, 7, 9]. In this paper, the design and analysis of nonlinear and NN-based control strategies for a lactic fermentation bioprocess are presented. In fact, by using feedback linearization, the design of a linearizing controller and of an indirect adaptive controller based on a dynamical NN is achieved for a class of square nonlinear systems. The control signals are generated by using a recurrent NN approximation of the functions representing the uncertain or unknown and time varying plant dynamics. Adaptation in this controller requires on-line adjustment of the NN weights. The adaptation law is derived in a manner similar to the classical Lyapunov based model reference adaptive control design, where the stability of the closed loop system in the presence of the adaptation law is ensured. The derived control methods are applied to a fermentation bioprocess for lactic acid production, which is characterized by strongly nonlinear, time varying and completely unknown dynamical kinetics.
2 Process Modelling and Control Problem Lactic acid has traditionally been used in the food industry as an acidulating and/or preserving agent, and in the biochemical industry for cosmetic and textile applications [10, 12]. Recently, lactic acid fermentation has received much more attention because of the increasing demand for new biomaterials such as biodegradable and biocompatible polylactic products. Two major factors limit its biosynthesis, affecting growth and productivity: the nutrient limiting conditions and the inhibitory effect caused by lactic acid accumulation in the culture broth [10]. A reliable model that explicitly integrates nutritional factor effects on both growth and lactic acid production in a batch fermentation process implementing Lb. casei was developed by Ben Youssef et al. [10] and it is described by the following differential equations:
\dot{X} = \mu X - k_d X, \quad \dot{P} = \nu_p X, \quad \dot{S} = -q_s X,    (1)
\mu = \mu_{max}\left(\frac{K_P^{gc}}{K_P^{gc}+P}\right)\left(\frac{S}{K_S^{gc}+S}\right)\left(1-\frac{P}{P_C^{gc}}\right),    (2)
with \mu_{max} the maximum specific growth rate, K_P^{gc} the lactic acid inhibition constant, K_S^{gc} the affinity constant of the growing cells for glucose, and P_C^{gc} the critical lactic acid concentration. The superscript gc denotes the parameters related to growing cells and rc those of the resting cells. The specific lactic acid production and specific consumption rates are given by

\nu_p = \delta\mu + \gamma\left(\frac{S}{K_S^{rc}+S}\right), \quad q_s = \nu_p / Y_{PS},    (3)

where \delta and \gamma are positive constants, K_S^{rc} is the affinity constant of the resting cells for glucose, and Y_{PS} represents the substrate to product conversion yield. The kinetic parameters of this model can be readjusted depending on the medium enrichment factor \alpha. Thus, the nutritional limitation is described by the following hyperbolic type expressions [10]:
\mu_{max} = \frac{\mu_{max}^{*}(\alpha-\alpha_0)}{K_{\alpha\mu}+(\alpha-\alpha_0)}, \quad K_P^{gc} = \frac{K_{P\,max}^{gc}(\alpha-\alpha_0)}{K_{\alpha P}+(\alpha-\alpha_0)}, \quad K_S^{rc} = \frac{K_{S\,max}^{rc}(\alpha-\alpha_0)}{K_{\alpha S}+(\alpha-\alpha_0)},    (4)
with \alpha_0 the minimal nutritional factor necessary for growth and K_{\alpha\mu}, K_{\alpha P} and K_{\alpha S} saturation constants; \mu_{max}^{*}, K_{P\,max}^{gc} and K_{S\,max}^{rc} correspond to the limit value of each parameter. Since in the process of lactic acid production the main cost of raw material comes from the substrate and nutrient requirements, in [10] some possible continuous-flow control strategies that satisfy the economic aspects were investigated. The advantage of a continuous-flow process is that the main product, which is also an inhibitor, is continuously withdrawn from the system. Moreover, according to microbial engineering theory, for a product-inhibited reaction like lactic acid [10] or alcoholic fermentation [11], a multistage system composed of several interconnected continuous stirred tank reactors, in which some variables of the microbial culture (substrate, metabolites) can be kept close to optimal values, may be a good idea. Therefore the model (1)-(4) can be extended to a continuous-flow process carried out in two sequentially connected continuous stirred tank reactors, as in Fig. 1. For this bioreactor, the mathematical model is given by the following set of differential equations, each stage being of the same constant volume V:
First stage:
\dot{X}_1 = (\mu_1 - k_d)X_1 - D_1 X_1
\dot{P}_1 = \nu_{p1} X_1 - D_1 P_1
\dot{S}_1 = -q_{s1} X_1 + D_{11} S_1^{in} - D_1 S_1
\dot{\alpha}_1 = D_{12}\alpha_1^{in} - D_1\alpha_1

Second stage:
\dot{X}_2 = (\mu_2 - k_d)X_2 + D_1 X_1 - (D_1 + D_2)X_2
\dot{P}_2 = \nu_{p2} X_2 + D_1 P_1 - (D_1 + D_2)P_2
\dot{S}_2 = -q_{s2} X_2 + D_1 S_1 + D_2 S_2^{in} - (D_1 + D_2)S_2
\dot{\alpha}_2 = D_1\alpha_1 - (D_1 + D_2)\alpha_2    (5)
with D_1 = D_{11} + D_{12}, and where X_i, S_i and P_i (i = 1, 2) are, respectively, the concentrations of biomass (cells), substrate and lactic acid in each bioreactor. \mu_i, \nu_{pi} and q_{si} (i = 1, 2) correspond, respectively, to the specific growth rate of the cells, the specific rate of lactic acid production and the specific rate of glucose consumption in each bioreactor. D_{11} is the first-stage dilution rate of a feeding solution at an influent glucose concentration S_1^{in}. D_{12} is the first-stage dilution rate of a feeding solution at an influent enrichment factor \alpha_1^{in}. D_2 is the influent dilution rate added at the second stage and S_2^{in} is the corresponding feeding glucose concentration. It can be seen that in the second stage no growth factor feeding tank is included, since this factor was already fed into the first reactor. In the model (5) the mechanism of cell growth, the specific lactic acid production rate and the specific consumption rate are given by [10]:
\mu_i = \mu_{max\,i}\left(\frac{K_{Pi}^{gc}}{K_{Pi}^{gc}+P_i}\right)\left(\frac{S_i}{K_{Si}^{gc}+S_i}\right)\left(1-\frac{P_i}{P_C^{gc}}\right), \quad \nu_{pi} = \delta\mu_i + \beta\left(\frac{S_i}{K_{Si}^{rc}+S_i}\right), \quad q_{si} = \frac{\nu_{pi}}{Y_{PS}}.    (6)

[Fig. 1: two stirred tank reactors in cascade. The first reactor (X_1, S_1, P_1) is fed at dilution rates D_{11} (glucose, S_1^{in}) and D_{12} (enrichment factor, \alpha_1^{in}); its effluent feeds the second reactor (X_2, S_2, P_2), which also receives the feed D_2 (glucose, S_2^{in}).]
Fig. 1 A cascade of two reactors for the lactic acid production
The kinetic parameters of this model may be readjusted depending on the medium enrichment factor \alpha_i as follows [10]:

\mu_{max\,i} = \frac{\mu_{max}^{*}(\alpha_i-\alpha_0)}{K_{\alpha\mu}+(\alpha_i-\alpha_0)}, \quad K_{Pi}^{gc} = \frac{K_{P\,max}^{gc}(\alpha_i-\alpha_0)}{K_{\alpha P}+(\alpha_i-\alpha_0)}, \quad K_{Si}^{rc} = \frac{K_{S\,max}^{rc}(\alpha_i-\alpha_0)}{K_{\alpha S}+(\alpha_i-\alpha_0)}.    (7)
Now, the operating point of the continuous lactic acid fermentation process could be adjusted by acting on at least two control inputs, i.e. the primary and the secondary glucose feeding flow rates D11 and D2 . The number of input variables can be increased by including as a control input the rate of enrichment feeding D12 . As it was already formulated, the control objective consists in adjusting the plant’s load in order to convert the glucose into lactic acid via fermentation, which is directly correlated to the economic aspects of lactic acid production. More exactly, considering that the process model (5)-(7) is incompletely known and its parameters are time varying, the control goal is to maintain the process at some operating points, which correspond to a maximal lactic production rate and a minimal residual glucose concentration. By a process steady-state analysis, it was demonstrated [10] that these desiderata can be satisfied if the operating point is kept around the points S1* = 3 g/l and S 2* = 5 g/l. As control variables we chose the
dilution rates both in the first and in the second reactors D1 and D2 , respectively. In this way we obtain a multivariable control problem with two inputs: D1 and D2 , and two outputs: S1 and S 2 .
3 Design of Control Strategies

Consider the class of multi-input/multi-output square nonlinear dynamical systems (that is, systems with as many inputs as outputs) of the form [7, 8]:

$$\dot{x} = f(x) + \sum_{i=1}^{n} g_i(x)u_i = f(x) + G(x)u; \qquad y = Cx \qquad (8)$$
with the state x ∈ ℜⁿ, the input u ∈ ℜⁿ and the output y ∈ ℜⁿ. f : ℜⁿ → ℜⁿ is an unknown smooth function and G is a matrix whose columns are the unknown smooth functions g_i; note that f and g_i contain parametric uncertainties which are not necessarily linearly parameterizable. C is an n × n constant matrix. For the processes (8) the control objective is to make the output y track a specified trajectory y_ref. The problem is very difficult or even impossible to solve if the functions f and g_i are assumed to be unknown. Therefore, in order to model the nonlinear system (8), dynamical NNs are used. Dynamical neural networks are recurrent, fully interconnected nets containing dynamical elements in their neurons. They can be described by the following system of coupled first-order differential equations [7, 8]:

$$\dot{\hat{x}}_i = a_i\hat{x}_i + b_i\sum_{j=1}^{n} w_{ij}\,\varphi(\hat{x}_j) + b_i w_{i,n+1}\,\psi(\hat{x}_i)u_i, \quad i = 1,\ldots,n \qquad (9)$$
or, compactly,

$$\dot{\hat{x}} = A\hat{x} + BW\Phi(\hat{x}) + BW_{n+1}\Psi(\hat{x})u; \qquad y_N = C\hat{x} \qquad (10)$$
with the state x̂ ∈ ℜⁿ, the input u ∈ ℜⁿ, the output y_N ∈ ℜⁿ, W an n × n matrix of adjustable synaptic weights, A an n × n diagonal matrix with negative eigenvalues a_i, B an n × n diagonal matrix of scalar elements b_i, and W_{n+1} an n × n diagonal matrix of adjustable synaptic weights: W_{n+1} = diag{w_{1,n+1}, …, w_{n,n+1}}. Φ(x̂) is an n-dimensional vector and Ψ(x̂) is an n × n diagonal matrix, whose elements are the activation functions φ(x̂_i) and ψ(x̂_i) respectively, usually represented by sigmoids of the form:
$$\varphi(\hat{x}_i) = \frac{m_1}{1+e^{-\delta_1 \hat{x}_i}}, \qquad \psi(\hat{x}_i) = \frac{m_2}{1+e^{-\delta_2 \hat{x}_i}} + \beta_i, \quad i = 1,\ldots,n,$$
where m k and δ k , k = 1, 2 are constants, and β i > 0 are constants that shift the sigmoids, such that ψ ( xˆi ) > 0 for all i = 1, ..., n . Next, by using the feedback linearization technique, two nonlinear controllers for the system (8) are presented: a linearizing feedback controller, and a nonlinear
E. Petre, D. Selişteanu, and D. Şendrescu
206
adaptive controller using dynamical neural networks. Firstly, the linearizing feedback controller is considered; this is an ideal case, in which maximum prior knowledge concerning the process is available. We suppose that the functions f and G in (8) are completely known, the relative degree of the differential equations in (8) is equal to 1, and all states are measurable on-line. Assume that we wish to have the following first-order linear stable closed-loop (process + controller) behaviour:

$$(\dot{y}_{ref} - \dot{y}) + \Lambda(y_{ref} - y) = 0 \qquad (11)$$
with Λ = diag{λ_i}, λ_i > 0, i = 1,…,n. Then, by combining (8) and (11), one obtains the following multivariable decoupling linearizing feedback control law:

$$u = \left(CG(x)\right)^{-1}\left(-Cf(x) + \nu\right) \qquad (12)$$
with (CG(x)) assumed invertible, which applied to the process (8) results in ẏ = ν, where ν is the new input vector designed as ν = ẏ_ref + Λ(y_ref − y). The control law (12) leads to the linear error model ė_t = −Λe_t, where e_t = y_ref − y is the tracking error. For λ_i > 0, the error model has an exponentially stable point at e_t = 0. Because such prior knowledge concerning the process is not realistic, a more realistic case will now be analyzed, in which the model (8) is practically unknown, that is, the functions f and G are completely unknown and time varying. To solve the control problem, a NN based adaptive controller will be used. The dynamical NN (10) is used as a model of the process for the control design. Assume that the unknown process (8) can be completely described by a dynamical NN plus a modelling error term ω(x, u). In other words, there exist weight values W* and W*_{n+1} such that (8) can be written as:

$$\dot{x} = Ax + BW^{*}\Phi(x) + BW^{*}_{n+1}\Psi(x)u + \omega(x,u); \qquad y = Cx. \qquad (13)$$
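As a quick illustration of the ideal decoupling law (12), the sketch below applies u = (CG(x))⁻¹(−Cf(x) + ν) to a toy two-state square system. The functions f and G, the matrix C, and the gain matrix Λ are all invented for the example; only the structure of the law comes from the text.

```python
import numpy as np

# Toy square system x_dot = f(x) + G(x) u, y = C x (all values invented)
f = lambda x: np.array([-x[0] + x[0] * x[1], -2.0 * x[1]])
G = lambda x: np.array([[1.0 + x[1] ** 2, 0.0], [0.5, 2.0]])
C = np.eye(2)
Lam = np.diag([1.0, 2.0])                      # Lambda with lambda_i > 0

def linearizing_control(x, y_ref, y_ref_dot):
    """u = (C G(x))^-1 (-C f(x) + nu), with nu = y_ref_dot + Lam (y_ref - y)."""
    nu = y_ref_dot + Lam @ (y_ref - C @ x)
    return np.linalg.solve(C @ G(x), -C @ f(x) + nu)

# With this u the closed loop satisfies y_dot = nu, i.e. e_t_dot = -Lam e_t
x = np.array([0.5, -0.2])
y_ref, y_ref_dot = np.array([1.0, 0.0]), np.zeros(2)
u = linearizing_control(x, y_ref, y_ref_dot)
y_dot = C @ (f(x) + G(x) @ u)                  # closed-loop output derivative
nu = y_ref_dot + Lam @ (y_ref - C @ x)
```

Since C = I here, the solve cancels f and G exactly, so y_dot equals ν to machine precision.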
It is clear that the tracking problem can now be analyzed for the system (13) instead of (8). Since W* and W*_{n+1} are unknown, the solution consists in designing a control law u(W, W_{n+1}, x) and appropriate update laws for W and W_{n+1} such that the network model output y tracks a reference trajectory y_ref. The dynamics of the NN model output (13), where the term ω(x, u) is assumed to be 0, can be expressed as:

$$\dot{y} = C\dot{x} = CAx + CBW^{*}\Phi(x) + CBW^{*}_{n+1}\Psi(x)u \qquad (14)$$
Assume that CBW*_{n+1}Ψ(x) is invertible, which implies a relative degree equal to one for the input-output relation (14). Then, the control law (12) is particularized as follows:
$$u = \left(CBW^{*}_{n+1}\Psi(x)\right)^{-1}\left(-CAx - CBW^{*}\Phi(x) + \nu\right) \qquad (15)$$
Neural Networks Based Adaptive Control of a Fermentation Bioprocess
207
where the new input vector ν is defined as ν = ẏ_ref + Λ(y_ref − y), which applied to the model (14) results in a linear stable system with respect to this input, as ẏ = ν. Defining the tracking error between the reference trajectory and the network output (14) as e_t = y_ref − y, the control law (15) leads to a linear error model ė_t = −Λe_t. For λ_i > 0, i = 1,…,n, the error e_t converges to the origin exponentially. Note that the control input (15) is applied both to the plant and to the neural model. Now, we can define the error between the identifier (NN) output and the real system (ideal identifier) output as e_m = y_N − y = C(x̂ − x). Assuming that the identifier states are close to the process states [7, 8], from (10) and (13) we obtain the error equation:

$$\dot{e}_m = CAC^{-1}e_m + CB\tilde{W}\Phi(x) + CB\tilde{W}_{n+1}\Psi(x)u \qquad (16)$$

with W̃ = W − W*, W̃_{n+1} = W_{n+1} − W*_{n+1}. Since the control law (15) contains the unknown weight matrices W* and W*_{n+1}, it becomes an adaptive control law if these weight matrices are substituted by their on-line estimates, calculated by appropriate updating laws. Since we are interested in obtaining stable adaptive control laws, a Lyapunov synthesis method is used. Consider the following Lyapunov function candidate:

$$V = \frac{1}{2}\left(e_m^T P e_m + e_t^T \Lambda^{-1} e_t + tr\{\tilde{W}^T\tilde{W}\} + tr\{\tilde{W}_{n+1}^T\tilde{W}_{n+1}\}\right) \qquad (17)$$
where P > 0 is chosen to satisfy the Lyapunov equation PA + AᵀP = −I. Differentiating (17) along the solution of (16), where C is considered equal to the identity matrix, one finally obtains:

$$\dot{V} = -\frac{1}{2}e_m^T e_m - e_t^T e_t + \Phi^T(x)\tilde{W}^T BPe_m + u^T\Psi^T(x)\tilde{W}_{n+1}BPe_m + tr\{\tilde{W}^T\dot{\tilde{W}}\} + tr\{\tilde{W}_{n+1}^T\dot{\tilde{W}}_{n+1}\} \qquad (18)$$

For tr{W̃ᵀ Ẇ̃} = −Φᵀ(x)W̃ᵀBPe_m and tr{W̃ᵀ_{n+1} Ẇ̃_{n+1}} = −uᵀΨᵀ(x)W̃_{n+1}BPe_m, (18) becomes:

$$\dot{V} = -\frac{1}{2}e_m^T e_m - e_t^T e_t = -\frac{1}{2}\|e_m\|^2 - \|e_t\|^2 \le 0 \qquad (19)$$
and consequently, for the network weights, the following updating laws are obtained:

$$\dot{w}_{ij} = -b_i p_i\,\varphi(x_j)\,e_{mi}, \quad i,j = 1,\ldots,n; \qquad \dot{w}_{i,n+1} = -b_i p_i\,\psi(x_i)\,u_i\,e_{mi}, \quad i = 1,\ldots,n \qquad (20)$$

Theorem 1. Consider the control law (15), and the tracking and model errors defined above. The updating laws (20) guarantee the following properties [7, 8]:
i) lim_{t→∞} e_m(t) = 0, lim_{t→∞} e_t(t) = 0; ii) lim_{t→∞} Ẇ̃(t) = 0, lim_{t→∞} Ẇ̃_{n+1}(t) = 0.
Proof. Since V̇ in (19) is negative semidefinite, we have V ∈ L∞, which implies e_t, e_m, W̃, W̃_{n+1} ∈ L∞. Furthermore, x̂ = x + C⁻¹e_m is also bounded. Since V is a non-increasing function of time and bounded from below, the limit lim_{t→∞} V(t) = V(∞) exists. By integrating V̇ from 0 to ∞ we obtain

$$\int_0^{\infty}\left(\frac{1}{2}\|e_m\|^2 + \|e_t\|^2\right)dt = V(0) - V(\infty) < \infty$$

which implies e_t, e_m ∈ L₂. By definition, φ(x_i) and ψ(x_i) are bounded for all x and, by assumption, all inputs to the NN, the reference y_ref and its time derivative are also bounded. Hence, from (15) we have that u is bounded, and from ė_t = −Λe_t and (16) we conclude that ė_t, ė_m ∈ L∞. Since e_t, e_m ∈ L₂ ∩ L∞ and ė_t, ė_m ∈ L∞, using Barbalat's Lemma [13], one obtains lim_{t→∞} e_t(t) = 0 and lim_{t→∞} e_m(t) = 0. Using now the boundedness of u, Φ(x), Ψ(x) and the convergence of e_m(t) to 0, we have that Ẇ̃ and Ẇ̃_{n+1} also converge to 0. However, we cannot conclude anything about the convergence of the weights to their optimal values. In order to guarantee this convergence, u, Φ(x), Ψ(x) need to satisfy a persistency of excitation condition [8].
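The adaptation laws (20), together with the sigmoid activations defined earlier, can be sketched as a simple Euler update. The dimensions, gains and signal values below are arbitrary illustrative placeholders, not the paper's tuning.

```python
import numpy as np

n = 2
m1, m2, d1, d2, beta = 1.0, 1.0, 0.5, 0.5, 0.2   # sigmoid constants (illustrative)
phi = lambda x: m1 / (1.0 + np.exp(-d1 * x))
psi = lambda x: m2 / (1.0 + np.exp(-d2 * x)) + beta   # beta > 0 keeps psi > 0

b = np.array([0.1, 0.1])            # diagonal of B
p = np.array([2.5, 2.5])            # diagonal of P
W = 0.5 * np.ones((n, n))           # synaptic weights
w_last = 0.5 * np.ones(n)           # diagonal of W_{n+1}

def update_weights(W, w_last, x, u, e_m, dt):
    """Euler step of (20):
    w_ij_dot = -b_i p_i phi(x_j) e_mi ;  w_{i,n+1}_dot = -b_i p_i psi(x_i) u_i e_mi."""
    dW = -np.outer(b * p * e_m, phi(x))
    dw = -b * p * psi(x) * u * e_m
    return W + dt * dW, w_last + dt * dw

# With zero model error the weights stay frozen, as (20) requires
W2, w2 = update_weights(W, w_last, x=np.array([0.3, -0.1]),
                        u=np.array([0.05, 0.02]), e_m=np.zeros(n), dt=0.01)
```

The outer product reproduces the index pattern of (20): row i of dW is scaled by b_i p_i e_mi, column j by φ(x_j).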
4 Control Strategies for Lactic Acid Production

Firstly, we consider the ideal case where maximum prior knowledge concerning the process (kinetics, yield coefficients and state variables) is available, and the relative degree of the differential equations in the process model is equal to 1. Assume that for the two interconnected reactors we wish to have the following first-order linear stable closed-loop (process + controller) behaviour:

$$\frac{d}{dt}\begin{bmatrix} S_1^* - S_1 \\ S_2^* - S_2 \end{bmatrix} + \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix} S_1^* - S_1 \\ S_2^* - S_2 \end{bmatrix} = 0, \quad \lambda_1, \lambda_2 > 0, \qquad (21)$$
where S1* and S2* are the desired values of S1 and S2. Since the dynamics of S1 and S2 in the process model (5) have relative degree equal to 1, they can be considered as an input-output model and rewritten in the following form:

$$\frac{d}{dt}\begin{bmatrix} S_1 \\ S_2 \end{bmatrix} = \begin{bmatrix} -q_{s1}X_1 \\ -q_{s2}X_2 \end{bmatrix} + \begin{bmatrix} -D_{12}S_1^{in} \\ 0 \end{bmatrix} + \begin{bmatrix} S_1^{in} - S_1 & 0 \\ S_1 - S_2 & S_2^{in} - S_2 \end{bmatrix}\begin{bmatrix} D_1 \\ D_2 \end{bmatrix}. \qquad (22)$$

Then from (21) and (22) one obtains the following exact multivariable decoupling linearizing feedback control law [12]:

$$\begin{bmatrix} D_1 \\ D_2 \end{bmatrix} = \begin{bmatrix} S_1^{in} - S_1 & 0 \\ S_1 - S_2 & S_2^{in} - S_2 \end{bmatrix}^{-1}\left\{\begin{bmatrix} \dot{S}_1^* \\ \dot{S}_2^* \end{bmatrix} + \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix} S_1^* - S_1 \\ S_2^* - S_2 \end{bmatrix} + \begin{bmatrix} q_{s1}X_1 \\ q_{s2}X_2 \end{bmatrix} + \begin{bmatrix} D_{12}S_1^{in} \\ 0 \end{bmatrix}\right\} \qquad (23)$$
where the decoupling matrix in (23) remains invertible as long as S1 < S1in and S2 < S2in (conditions satisfied in normal operation of the two reactors). The control law (23) applied to the process (22) leads to the two linear error models ė₁ = −λ₁e₁, ė₂ = −λ₂e₂, where e₁ = S1* − S1 and e₂ = S2* − S2 represent the tracking errors, which for λ₁, λ₂ > 0 have an exponentially stable point at the origin. Since the prior knowledge concerning the process previously assumed is not realistic, we now analyze a more realistic case, in which the process dynamics are incompletely known and time varying. We assume that the reaction rates q_{s1}X1 and q_{s2}X2 are completely unknown and can be expressed as:

$$q_{s1}X_1 = \rho_1, \qquad q_{s2}X_2 = \rho_2, \qquad (24)$$
where ρ₁ and ρ₂ are considered two unknown and time-varying parameters. In this case, the control law (23) becomes:

$$\begin{bmatrix} D_1 \\ D_2 \end{bmatrix} = \begin{bmatrix} S_1^{in} - S_1 & 0 \\ S_1 - S_2 & S_2^{in} - S_2 \end{bmatrix}^{-1}\left\{\begin{bmatrix} \dot{S}_1^* \\ \dot{S}_2^* \end{bmatrix} + \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix} S_1^* - S_1 \\ S_2^* - S_2 \end{bmatrix} + \begin{bmatrix} \rho_1 \\ \rho_2 \end{bmatrix} + \begin{bmatrix} D_{12}S_1^{in} \\ 0 \end{bmatrix}\right\}. \qquad (25)$$
Since the control law (25) contains the unknown parameters ρ1 and ρ 2 , these will be substituted by their on-line estimates ρˆ1 and ρˆ 2 calculated by using a dynamical neural network (10), whose structure for this case is particularized as follows:
$$\dot{\hat{\rho}}_i(t) = a_i\hat{\rho}_i + b_i\sum_{j=1}^{n} w_{ij}\,\varphi(\hat{\rho}_j) + b_i w_{i,n+1}\,\psi(\hat{\rho}_i)D_i; \quad i, j = 1, 2. \qquad (26)$$
So, D1 and D2 in (25) are computed using the estimates ρ̂₁(t) and ρ̂₂(t) in place of ρ₁ and ρ₂. The parameters w_ij and w_{i,n+1} are adjusted using the adaptation laws (20).
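The closed loop of (25)-(26) can be sketched in a few lines: an Euler step of the recurrent estimator followed by the 2×2 solve of the control law. All numerical values below (weights, states, feed concentrations) are illustrative placeholders, the setpoint derivatives are taken as zero, and in the full scheme the weights would also evolve by (20).

```python
import numpy as np

# --- recurrent NN estimator (26), Euler-discretized ---
a = np.array([-12.0, -12.0])          # diagonal of A
b = np.array([0.01, 0.01])            # diagonal of B
phi = lambda z: 180.0 / (1.0 + np.exp(-0.1 * z))        # m1 = 180, delta1 = 0.1
psi = lambda z: 180.0 / (1.0 + np.exp(-0.1 * z)) + 0.2  # m2 = 180, beta = 0.2

def estimator_step(rho_hat, W, w3, D, dt):
    """rho_i_dot = a_i rho_i + b_i sum_j w_ij phi(rho_j) + b_i w_{i,3} psi(rho_i) D_i"""
    drho = a * rho_hat + b * (W @ phi(rho_hat)) + b * w3 * psi(rho_hat) * D
    return rho_hat + dt * drho

# --- adaptive control law (25) ---
lam = np.array([0.55, 0.85])          # lambda_1, lambda_2 (values of Section 5)
D12, S_star = 0.002, np.array([3.0, 5.0])

def dilution_rates(S, S_in, rho_hat):
    M = np.array([[S_in[0] - S[0], 0.0],
                  [S[0] - S[1], S_in[1] - S[1]]])     # decoupling matrix
    rhs = lam * (S_star - S) + rho_hat + np.array([D12 * S_in[0], 0.0])
    return np.linalg.solve(M, rhs)                    # [D1, D2]

rho = np.array([1.0, 1.0])
W, w3 = 0.5 * np.ones((2, 2)), 0.5 * np.ones(2)
S, S_in = np.array([3.1, 5.2]), np.array([50.0, 200.0])
for _ in range(100):                                  # 1 h with dt = 0.01 h
    D = dilution_rates(S, S_in, rho)
    rho = estimator_step(rho, W, w3, D, dt=0.01)
```

The decoupling matrix stays invertible while S1 < S1in and S2 < S2in, mirroring the invertibility condition stated for (23).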
5 Simulation Results and Comments

The performance of the designed neural adaptive controller (25), (26) has been tested, by comparison with the exactly linearizing controller (23) (which yields the best response and can be used as a benchmark), through extensive simulation experiments carried out using the process model (5)-(7) under identical and realistic conditions. The values of the kinetic parameters used in the simulations are [11]: μ⁰_max = 0.45 h⁻¹, K_S^gc = 0.5 g/l, K_{S max}^rc = 12 g/l, K_{P max}^gc = 15 g/l, δ = 3.5, γ = 0.9 h⁻¹, α₀ = 0.02 g/l, K_α^μ = 0.2 g/l, K_α^P = 1.1 g/l, K_α^S = 4 g/l, P_C^gc = 95 g/l, Y_PS = 0.98 g/g, k_d = 0.02 h⁻¹, D₁₂ = 0.002 h⁻¹, S_{1,0}^in = 50 g/l, S_{2,0}^in = 200 g/l, α_{1,0}^in = 6 g/l.
The system's behaviour was analyzed assuming that the influent glucose concentrations in the two feeding substrates act as perturbations of the form S₁in(t) = S_{1,0}^in·(1 + 0.25 sin(πt/25)) and S₂in(t) = S_{2,0}^in·(1 − 0.25 cos(πt/50)), and that the kinetic coefficient μ_max is time varying as μ_max(t) = μ⁰_max(1 + 0.25 sin(πt/40)). The behaviour of the closed-loop system using the NN adaptive controller, by comparison with the exactly linearizing law, is presented in Fig. 2. The graphics in the left figure show the time evolution of the two controlled variables S₁ and S₂, and the graphics in the right figure correspond to the control inputs D₁ and D₂. In order to test the behaviour of the indirect adaptive controlled system in more realistic circumstances, we considered that the measurements of both controlled variables (S₁ and S₂) are corrupted with an additive white noise with zero average (5% of their nominal values). The simulation results in this case, conducted under the same conditions, are presented in Fig. 3. The behaviour of the controlled variables and of the control inputs is comparable with the results obtained in the noise-free simulation. The time evolution of the estimates of the unknown functions (24) provided by the recurrent NN estimator is presented in Fig. 4, for both simulation cases.
[Fig. 2 Simulation results – neural adaptive control (2) versus exactly linearizing control (1): controlled outputs S1, S2 (g/l) with their setpoints S1*, S2*, and control inputs D1, D2 (h⁻¹), over 0–800 h.]
[Fig. 3 Simulation results – neural adaptive control with noisy data (2) versus linearizing control (1): controlled outputs S1, S2 (g/l) and control inputs D1, D2 (h⁻¹), over 0–800 h.]
[Fig. 4 Estimates of unknown functions: (1) - without, and (2) - with noisy measurements: estimated parameters ρ1, ρ2 (g/l·h) over 0–800 h.]
It can be noticed from Fig. 4 that the time evolution of the estimates for noisy measurements of S1 and S2 is similar to the time profiles in the noise-free case. From the graphics in Fig. 2 and Fig. 3 it can be seen that the behaviour of the overall system with the indirect adaptive controller, even though this controller uses much less a priori information, is good, being very close to the behaviour of the closed-loop system in the ideal case when the process model is completely known. Note also the regulation properties and the ability of the controller to maintain the controlled outputs S1 and S2 very close to their desired values, despite the high load variations of S1in and S2in, the time variation of the process parameters and the influence of noisy measurements.

The gains of the control laws (23) and (25) are λ₁ = 0.55, λ₂ = 0.85. For the NN adaptive controller the initial values of the weights are set to 0.5 and the design parameters were chosen as: m₁ = 180, m₂ = 180, δ₁ = δ₂ = 0.1, β₁ = β₂ = 0.2, a₁ = a₂ = −12, b₁ = b₂ = 0.01, p₁ = p₂ = 2.5. It must be noted that a preliminary tuning of the NN controller is not necessary.

It can be concluded that when the process nonlinearities are not completely known and the bioprocess dynamics are time varying, the NN adaptive controllers are viable alternatives. A remaining problem for future work is the controller design when the real plant is of higher order than assumed, or in the presence of other unmodelled dynamics.
6 Conclusion

An indirect NN adaptive control strategy for a nonlinear system whose dynamics are incompletely known and time varying was presented. The controller design is based on the input-output linearizing technique. The unknown controller functions are approximated using a dynamical neural network. The form of the controller and the neural controller adaptation laws were derived from a Lyapunov stability analysis. It was demonstrated that, under certain conditions, all the controller parameters remain bounded and the plant output tracks the output of a linear reference model. The simulation results showed that the performance of the proposed adaptive controller is very good.
Acknowledgments. This work was supported by CNCSIS–UEFISCSU, Romania, project number PNII–IDEI 548/2008.
References
1. Isidori, A.: Nonlinear Control Systems, 3rd edn. Springer, Berlin (1995)
2. Sastry, S., Isidori, A.: Adaptive control of linearizable systems. IEEE Trans. Autom. Control 34(11), 1123–1131 (1989)
3. Diao, Y., Passino, K.M.: Stable adaptive control of feedback linearizable time-varying nonlinear systems with application to fault-tolerant engine control. Int. J. Control 77(17), 1463–1480 (2004)
4. Fidan, B., Zhang, Y., Ioannou, P.A.: Adaptive control of a class of slowly time varying systems with modelling uncertainties. IEEE Trans. Autom. Control 50, 915–920 (2005)
5. Hayakawa, T., Haddad, W.M., Hovakimyan, N.: Neural network adaptive control for a class of nonlinear uncertain dynamical systems with asymptotic stability guarantees. IEEE Trans. Neural Netw. 19, 80–89 (2008)
6. McLain, R.B., Henson, M.A., Pottmann, M.: Direct adaptive control of partially known non-linear systems. IEEE Trans. Neural Netw. 10(3), 714–721 (1999)
7. Petre, E., Selişteanu, D., Şendrescu, D., Ionete, C.: Neural networks-based adaptive control for a class of nonlinear bioprocesses. Neural Comput. & Applic. 19(2), 169–178 (2010)
8. Rovithakis, G.A., Christodoulou, M.A.: Direct adaptive regulation of unknown nonlinear dynamical systems via dynamic neural networks. IEEE Trans. Syst. Man, Cybern. 25, 1578–1594 (1995)
9. Petre, E., Selişteanu, D., Şendrescu, D.: Neural networks based adaptive control for a class of time varying nonlinear processes. In: Proc. Int. Conf. Control, Autom. and Systems ICCAS 2008, Seoul, Korea, pp. 1355–1360 (2008)
10. Ben Youssef, C., Guillou, V., Olmos-Dichara, A.: Modelling and adaptive control strategy in a lactic fermentation process. Control Eng. Practice 8, 1297–1307 (2000)
11. Dahhou, B., Roux, G., Cheruy, A.: Linear and non-linear adaptive control of alcoholic fermentation process: experimental results. Int. J. Adapt. Control and Signal Process. 7, 213–223 (1993)
12. Petre, E., Selişteanu, D., Şendrescu, D.: An indirect adaptive control strategy for a lactic fermentation bioprocess. In: Proc. IEEE Int. Conf. Autom., Quality and Testing, Robotics, Cluj-Napoca, Romania, May 28-30, pp. 175–180 (2010)
13. Sastry, S., Bodson, M.: Adaptive Control: Stability, Convergence and Robustness. Prentice-Hall International Inc., Englewood Cliffs (1989)
New Evaluation Method for Imperfect Alternative Matrix

Toshimasa Ozaki*, Kanna Miwa, Akihiro Itoh, Mei-Chen Lo, Eizo Kinoshita, and Gwo-Hshiung Tzeng
Abstract. Two methods exist for presuming the missing values of an imperfect alternative matrix: the Harker method and the Nishizawa method. However, it is often difficult to determine which method is appropriate. This paper focuses on the decision-making process of the Analytical Network Process (ANP) by examining the evaluation matrix as an imperfect matrix. The proposed method is composed of the alternative matrix and a criterion matrix based on the inverse of the alternative matrix, and presumes the missing values of a four by four matrix from the eigenvector. The same estimates obtained by the Harker method are stably obtained by artificially providing information in the imperfect matrix. Furthermore, the essence of the decision-making is considered through these examination processes.

Keywords: Decision analysis, AHP, ANP, Harker method, Imperfect matrix.
Toshimasa Ozaki · Kanna Miwa · Akihiro Itoh: Faculty of Commerce, Nagoyagakuin University, Nagoya, Japan
Mei-Chen Lo: Department of Business Management, National United University, Miaoli, Taiwan
Eizo Kinoshita: Faculty of Urban Science, Meijo University, Japan
Mei-Chen Lo · Gwo-Hshiung Tzeng: Institute of Project Management, Kainan University, Taoyuan, Taiwan
Mei-Chen Lo · Gwo-Hshiung Tzeng: Institute of Management of Technology, Chiao Tung University, Hsinchu, Taiwan
* Corresponding author.
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 213–222. springerlink.com © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

In the Analytical Hierarchy Process (AHP), it is assumed that the evaluation matrix has no missing values. However, an imperfect matrix inevitably arises when a pair comparison is difficult. Harker (1987) and Nishizawa (2005)
proposed methods of presuming the missing values of such matrices. Although the values can be presumed by both the Harker method and the Nishizawa method, it is difficult to decide which method is appropriate when the values of the C.I. (Consistency Index) defined by Saaty differ. The authors (Ozaki et al., 2009a, 2009b, 2010a, 2010b) found a method of solving the simple dilemma in an imperfect matrix with missing values by using the ANP, and have proposed applying this method to the evaluation matrix of the AHP. However, although the missing values can be presumed by other methods, they cannot always be presumed by this proposed method because of instability of the solution. This paper proposes a way to address the faults of this method by resolving the decision-making problem of the ANP and the imperfect alternative matrix.
2 Proposed Method and the Problem

2.1 Priority of the Alternative Matrix with the Missing Values

Sugiura et al. (2005) set up the example of an alternative matrix with missing values shown in Table 1, and prioritized the evaluation matrix.

Table 1 Example that cannot be compared

         | Evaluation | Evaluation | Evaluation
Kodama   | a1         | b1         | -
Hikari   | a2         | -          | c1
Nozomi   | -          | b2         | c2
On the other hand, the authors (Ozaki et al., 2009a, 2010a, 2010b) defined the criterion matrix W by taking the inverse of the alternative matrix U in which the positions of the missing values were replaced with zeros, and prioritized the three alternatives "Kodama", "Hikari", and "Nozomi" using the ANP:

$$U = \begin{bmatrix} a_1 & b_1 & 0 \\ a_2 & 0 & c_1 \\ 0 & b_2 & c_2 \end{bmatrix} \qquad (1)$$

$$W = \frac{1}{a_2 b_1 c_2 + a_1 b_2 c_1}\begin{bmatrix} b_2 c_1 & b_1 c_2 & 0 \\ a_2 c_2 & 0 & a_1 c_1 \\ 0 & a_1 b_2 & a_2 b_1 \end{bmatrix} \qquad (2)$$
It can be shown that the eigenvector of the product UW (the agreement matrix) is the same as the eigenvector obtained by Sugiura et al. This evaluation method by the ANP suggests that the missing values can be presumed by treating the imperfect alternative matrix as a decision-making problem.
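A numerical sketch of the construction (1)-(2), under one reading of the method: W is the inverse of U (missing entries set to zero), with the entries at the same missing positions zeroed again. The entry values a1, …, c2 below are arbitrary.

```python
import numpy as np

a1, a2, b1, b2, c1, c2 = 2.0, 3.0, 1.5, 0.5, 4.0, 0.8   # arbitrary positive values
U = np.array([[a1, b1, 0.0],
              [a2, 0.0, c1],
              [0.0, b2, c2]])

W = np.linalg.inv(U)
W[U == 0.0] = 0.0            # zero the entries at the missing positions

# Closed form (2), for comparison
den = a2 * b1 * c2 + a1 * b2 * c1
W_formula = np.array([[b2 * c1, b1 * c2, 0.0],
                      [a2 * c2, 0.0, a1 * c1],
                      [0.0, a1 * b2, a2 * b1]]) / den

# Priorities: principal eigenvector of the agreement matrix UW
vals, vecs = np.linalg.eig(U @ W)
w = vecs[:, np.argmax(np.real(vals))].real
w = np.abs(w) / np.abs(w).sum()
```

For any positive entries, the zeroed inverse coincides with the closed form (2), which is what the test checks.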
2.2 Proposal of the ABIA Method and Its Problem

Now, we name the above method the ABIA method (ANP Based on Inverse Alternative), and consider an imperfect matrix U with missing elements. Some examples are shown in Table 2, where the missing values are arbitrarily placed. The table shows the differences between the estimates obtained by our method and those obtained by the Harker and Nishizawa methods; we simply apply our proposed method to compare it with those methods. As the table shows, the missing values can always be presumed by the Harker method, whereas in some cases they cannot be presumed by the ABIA method or the Nishizawa method.
Example U
Missing values Our's Harker's Nishizawa's Example U
Missing values Our's Harker's Nishizawa's
1 1/2 1/5 1/4 a
2 1 1/ a 1/3
2 5 a 1 1/4
4 3 4 1
1 1/2 1/5 1/4 a
2 1 1/2 1/ a
3 5 2 1 1/4
4 a 4 1
1 1/2 1/5 1/4 a
1.369
4
1.190
3.468
1.088
1.369
4
1.095
a 1 1/2 1/3 b
Solution none 3.135
12.992
Solution none
5 5 2 1 1/4
b 3 4 1
1 1/2 1/5 1/ b a
5 2 1 1/ a
4 3 a 1
a c 1 1/4 c
b 3 4 1
1.095
4 1 1/ a 1/5 1/ b a
2 1 1/2 1/3
2 1 1/ a 1/3 b
6 5 b 1 1/4
a 3 4 1
1 1/2 1/ a 1/ b a
2 1 1/ b 1/3 b
10.929
1.369
1.500
3.709
1.214
10.961
1.370
1.501
6.005
0.750
10.954
1.369
Solution none
Nevertheless, the ABIA method does not give stable estimates, nor the same solution as the Harker method, for every imperfect evaluation matrix. Therefore, we focus on an imperfect alternative matrix of four by four, and resolve the problems of the ABIA method.
3 Proposal of the P-ABIA Method

In the ABIA method, the agreement point of the two matrices is given by the eigenvector of the ANP, which is composed of the imperfect alternative matrix and the criterion matrix based on the inverse alternative matrix. However, an unanticipated problem can occur in the ABIA method because matrices with missing values have not been treated in the ANP. We therefore propose a revised method, named the P-ABIA method (Power Method of ANP Based on Inverse Alternative), in this section.
As described above, the more missing values there are, the more zeros appear among the elements of the agreement matrix, and the principal eigenvector becomes difficult to calculate mathematically, even with the Harker method. In addition, the P-ABIA method is judged by the C.I. Therefore, the values presumed by the P-ABIA method are compared with those obtained by the Harker method in the following numerical examples.
3.1 Numerical Example 1

The missing values to be presumed are shown in Eq. (3); they are presumed to be a = 3.315 and b = 12.99 by the Harker method (see Example 4 in Table 2). Furthermore, the maximum eigenvalue of the alternative matrix with these values substituted is 4.083, and the value of C.I. is 0.028.
$$U = \begin{bmatrix} 1 & a & 5 & b \\ 1/a & 1 & 2 & 3 \\ 1/5 & 1/2 & 1 & 4 \\ 1/b & 1/3 & 1/4 & 1 \end{bmatrix} \qquad (3)$$
Because the eigenvector of the agreement matrix UW cannot be determined by the ABIA method, this case is an example in which the missing values cannot be presumed:
0 − 12 ⎤ 0 0 ⎥ ⎥ 1 − 12 / 5⎥ ⎥ 0 1 ⎦
(4)
We can derive det(UW − λI) = (1 − λ)⁴ as the characteristic polynomial of Eq. (4), so the only eigenvalue is λ = 1, with multiplicity four. However, according to the theorem of Gershgorin, from the first and third rows of the agreement matrix (4) the eigenvalue would exist in closed discs of radius 8 and 7/5 centred on one in the complex plane, and this existence is denied because of the multiplicity. Furthermore, no eigenvalue can arise from the second and fourth rows of (4), because their radii are zero. This is the reason the eigenvector of Eq. (4) cannot be calculated. The missing values are presumed by the agreement between the imperfect matrix U and the criterion matrix W in the ABIA method, so the agreement becomes difficult to obtain when, as here, zeros increase among the elements of the agreement matrix UW. That is, agreement via UW becomes possible if there is some information on a or b in the alternative matrix (3). We therefore propose the P-ABIA method, in which some information x is artificially provided for a missing value of the ABIA method, with the information x treated as uncertain. There are two cases in this example: a = x and b = x.
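The Gershgorin argument can be checked mechanically (a generic sketch, not tied to the particular radii quoted above): every eigenvalue of a matrix lies in at least one disc centred at a diagonal entry, with radius equal to the absolute off-diagonal row sum.

```python
import numpy as np

def gershgorin_discs(A):
    """Return (centre, radius) pairs: row i gives a disc centred at A[i, i]
    with radius equal to the sum of the absolute off-diagonal entries."""
    A = np.asarray(A, dtype=float)
    radii = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
    return list(zip(np.diag(A), radii))

# Agreement matrix (4): every diagonal entry equals 1, and the second and
# fourth rows are unit rows, so their discs have radius zero.
UW = np.array([[1.0, 4.0, 0.0, -12.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.2, 0.8, 1.0, -2.4],
               [0.0, 0.0, 0.0, 1.0]])
discs = gershgorin_discs(UW)
```

The zero-radius rows pin any eigenvalue touching them to exactly 1, which is the degenerate situation the text describes.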
(a) a = x

Because only one missing value remains after artificially providing the information x in this case, the agreement matrix UW is as follows:

$$UW = \begin{bmatrix} 1 & 0 & 0 & \frac{60x}{3x+20} \\ \frac{3}{3x+20} & 1 & 0 & \frac{60}{3x+20} \\ \frac{4}{3x+20} & 0 & 1 & \frac{60x}{5(3x+20)} \\ \frac{1}{3x+20} & 0 & 0 & 1 \end{bmatrix} \qquad (5)$$
Because the components of the eigenvector z of this agreement matrix UW can be shown analytically to be z₁ = 1, z₂ = (√(15x) + 10)/(10x), and z₄ = 1/(2√(15x)) when the eigenvalue is 1 + 2√(15x)/(3x + 20), the missing value b is obtained as b = z₁/z₄ = 2√(15x). Therefore, the missing value b can be rewritten as 2√(15x) when assuming a = x. We substitute these values into Eq. (3), denote the eigenvector by x, and search for the value of x that minimizes the maximum eigenvalue of the evaluation matrix. The maximum eigenvalue λ is given by the following equation when the first row of Ux = λx is expanded:

$$\lambda = 1 + \frac{2}{\sqrt[4]{240}} + \frac{\sqrt[4]{6}}{\sqrt[4]{10}\,\sqrt{15}}\,\sqrt[8]{x^{3}} + \frac{5\,\sqrt[4]{2/5}}{\sqrt[4]{10}\,\sqrt{15}}\,\sqrt[8]{x^{-3}}. \qquad (6)$$

From dλ/dx = 0 we obtain x = 3.466807; the presumed values become a = 3.4668 and b = 14.4225, and the validity of these values can be judged from C.I. = 0.027.

(b) b = x

Because the missing value b is artificially provided with the information x in this case, the eigenvector z of the agreement matrix UW can be shown to be z₁ = 1, z₂ = √(6/(5x)):
5x 2 x + 15 1 x 2 x + 15 5 2 x + 15
⎤ 0 0⎥ ⎥ 0 0⎥ . ⎥ 1 0⎥ ⎥ ⎥ 0 1⎥ ⎦
(7)
Therefore, the missing value a can be rewritten as a = z₁/z₂ = √(5x/6) when assuming b = x. We substitute a and b into Eq. (3), and search for the x
that minimizes the maximum eigenvalue of Eq. (3). The maximum eigenvalue λ is given by the following equation when the first row of Ux = λx is expanded:

$$\lambda = 2 + \sqrt{5}\,\sqrt[4]{2\sqrt{6/5}}\;\sqrt[8]{x^{-3}} + \frac{1}{\sqrt[4]{60\sqrt{5/6}}}\;\sqrt[8]{x^{3}}. \qquad (8)$$
Because dλ/dx = 0 gives x = 14.4225, the presumed values again become a = 3.4668 and b = 14.4225. Though the estimates of a and b obtained by the P-ABIA method are higher than those obtained by the Harker method, the value of C.I. is just a little lower than that obtained by the Harker method.
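Both stationarity conditions above have closed forms: for λ = λ₀ + c₁x^{3/8} + c₂x^{−3/8}, dλ/dx = 0 gives x = (c₂/c₁)^{4/3}. The little check below uses the coefficients as read off Eqs. (6) and (8) (the radical expressions there are a reconstruction from the garbled source) and reproduces the reported estimates:

```python
import math

# Case a = x, Eq. (6): lambda = 1 + 2/240**0.25 + c1*x**(3/8) + c2*x**(-3/8)
c1 = 6 ** 0.25 / (10 ** 0.25 * math.sqrt(15))
c2 = 5 * (2 / 5) ** 0.25 / (10 ** 0.25 * math.sqrt(15))
x_a = (c2 / c1) ** (4 / 3)            # minimiser of lambda(x)
b_a = 2 * math.sqrt(15 * x_a)         # b = 2*sqrt(15x)

# Case b = x, Eq. (8): lambda = 2 + d1*x**(-3/8) + d2*x**(3/8)
d1 = math.sqrt(5) * (2 * math.sqrt(6 / 5)) ** 0.25
d2 = 1 / (60 * math.sqrt(5 / 6)) ** 0.25
x_b = (d1 / d2) ** (4 / 3)

print(round(x_a, 3), round(b_a, 3), round(x_b, 3))   # 3.467 14.422 14.422
```

Both cases land on the same pair (a, b) ≈ (3.4668, 14.4225), matching the values quoted in the text.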
3.2 Numerical Example 2

In this case there are three missing values (a, b, and c). We show the imperfect alternative matrix U on the left, the criterion matrix W in the center, and the agreement matrix on the right (the missing entries a, b, c are set to zero when forming W and UW):

$$U = \begin{bmatrix} 1 & 2 & a & b \\ 1/2 & 1 & c & 3 \\ 1/a & 1/c & 1 & 4 \\ 1/b & 1/3 & 1/4 & 1 \end{bmatrix},\quad W = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 \\ 0 & 1/3 & 0 & 0 \end{bmatrix},\quad UW = \begin{bmatrix} 1 & 0 & 0 & 6 \\ 1/2 & 1 & 0 & 3 \\ 0 & 4/3 & 1 & 0 \\ 0 & 1/3 & 1/4 & 1 \end{bmatrix} \qquad (9)$$
The missing values a, b, and c become 1.5, 6, and 0.75 by the Harker method, and the maximum eigenvalue is 4. However, they become 1.5, 3.7308, and 1.214 by the ABIA method, with maximum eigenvalue 4.09. The estimates obtained by the ABIA method differ from the values obtained by the Harker method, and the value of C.I. is inferior. Though there are many zeros among the elements of the agreement matrix UW, as in Numerical Example 1, in this case the eigenvalue and eigenvector exist in the closed discs in the complex plane centred on one, according to the theorem of Gershgorin. Then, the following three cases are examined in the P-ABIA method:
Case 2 x⎤ x 2 ⎡ 1 ⎢1 / 2 1 3⎥⎥ y , Ub = ⎢ 4⎥ ⎢1 / x 1 / y 1 ⎥ ⎢ 1⎦ ⎣ 0 1/ 3 1/ 4
Case 3 2 x y ⎤ (10) ⎡ 1 0⎤ ⎢1 / 2 1 ⎥ 0 3 ⎥⎥ 3⎥ , Uc = ⎢ ⎢1 / x 0 1 4⎥ 4⎥ ⎥ ⎢ ⎥ 1 / y 1 / 3 1 / 4 1⎦ . 1⎦ ⎣
(a) Case 1

The value a is obtained as z₁/z₃ = √(xy/2) from the eigenvector z of the agreement matrix U_aW_a when assuming b = x and c = y. The missing values a, b, and c are substituted into Case 1 of Eq. (10), and the x and y that minimize the maximum eigenvalue of Case 1 are searched for. The eigenvector of Case 1 is assumed to be x,
and then the first and third rows of Ux = λx are considered. Therefore, 2⁵3²y = x³ is obtained from the first row as ∂λ/∂x = 0, and 3²x = 2⁷y³ is obtained from the third row as ∂λ/∂y = 0. The values x = 6 and y = 3/4 are obtained from these equations, and the remaining missing value, a = 3/2, is obtained from x and y.

(b) Case 2

The value b is obtained as z₁/z₄ = 2√(6x) from the eigenvector z of the agreement matrix U_bW_b when assuming a = x and c = y. The missing values a, b, and c are substituted into Case 2 of Eq. (10), and the x and y that minimize the maximum eigenvalue of Case 2 are determined. The eigenvector of Case 2 is assumed to be x, and the first and fourth rows of Ux = λx are considered. Therefore, x³ = 6y² is obtained from the first row as ∂λ/∂x = 0, and 2⁵xy² = 3³ is obtained from the fourth row as ∂λ/∂y = 0. The values x = 3/2 and y = 3/4 are obtained from these equations, and the remaining missing value, b = 6, is obtained from x.

(c) Case 3

The value c is obtained as z₂/z₄ = √(6x)/4 from the eigenvector z of the agreement matrix U_cW_c when assuming a = x and b = y. The eigenvector of Case 3 is assumed to be x, and the second and third rows of Ux = λx are considered. Then, xy² = 2·3³ and 3y² = 2⁵x³ are obtained from the second and third rows as ∂λ/∂x = 0 and ∂λ/∂y = 0. Therefore, the values x = 1.5 and y = 6 are obtained from these equations, and the remaining missing value, c = 3/4, is obtained from x and y.

The maximum eigenvalue of the alternative matrix with a, b, and c substituted is 4, and the C.I. is minimized. The estimates of a, b, and c obtained by the ABIA method are thus corrected by the P-ABIA method.
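The three stationarity systems can be verified numerically: each pair of relations (as reconstructed here from the garbled source) is satisfied by the quoted solutions and reproduces the remaining value, matching the Harker estimates a = 3/2, b = 6, c = 3/4. Pure arithmetic:

```python
import math

# Case 1: b = x, c = y;  x^3 = 2^5 3^2 y  and  3^2 x = 2^7 y^3
x, y = 6.0, 0.75
assert math.isclose(x ** 3, 2 ** 5 * 3 ** 2 * y)
assert math.isclose(9 * x, 2 ** 7 * y ** 3)
a = math.sqrt(x * y / 2)                    # a = 3/2

# Case 2: a = x, c = y;  x^3 = 6 y^2  and  2^5 x y^2 = 3^3
x, y = 1.5, 0.75
assert math.isclose(x ** 3, 6 * y ** 2)
assert math.isclose(2 ** 5 * x * y ** 2, 3 ** 3)
b = 2 * math.sqrt(6 * x)                    # b = 6

# Case 3: a = x, b = y;  x y^2 = 2 * 3^3  and  3 y^2 = 2^5 x^3
x, y = 1.5, 6.0
assert math.isclose(x * y ** 2, 2 * 3 ** 3)
assert math.isclose(3 * y ** 2, 2 ** 5 * x ** 3)
c = math.sqrt(6 * x) / 4                    # c = 3/4
```

All three cases agree on the same consistent completion (a, b, c) = (1.5, 6, 0.75).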
4 Discussion of the Proposed Method
(1) Imperfect alternative matrix with one loss in the rectangle
We examine the P-ABIA method for the imperfect alternative matrix U with only one missing value ui,j (i < j) in the rectangle. Here, the missing element is assumed to be zero because it cannot be decided. Now, the missing value ui,j is replaced with a.
       ⎡   1      a12     a13     a14  ⎤
       ⎢ 1/a12     1       0      a24  ⎥
U  =   ⎢ 1/a13     0       1      a34  ⎥        (11)
       ⎣ 1/a14   1/a24   1/a34     1   ⎦
220
T. Ozaki et al.
In the ABIA method, a = zi/zj = √(aik·ail·ajk^-1·ajl^-1) (i < j, with k and l the two remaining indices) is presumed, and the maximum eigenvalue λ of the completed matrix becomes
λ = 2 + (a^2·aik^-1·ail^-1·ajk·ajl)^(1/4) + (a^-2·aik·ail·ajk^-1·ajl^-1)^(1/4)        (12)
Because the missing value a that fulfills dλ/da = 0 becomes √(aik·ail·ajk^-1·ajl^-1), the presumed value is the one for which the maximum eigenvalue of the alternative matrix with no missing values is minimized. Therefore, the validity of the presumed value can be judged with the C.I. of the alternative matrix with no missing values. In an imperfect alternative matrix whose rectangle contains a single missing value, the same value as in the Harker method is presumed.
(2) Validity of the P-ABIA method
The examples that exhibit the fault are Eq. (4) and Eq. (9), while those that yield the same eigenvectors as the Harker method are Eq. (5) and Eq. (7). If the missing value in the ith column and the jth row is assumed to be ui,j, then, when the agreement matrices UW of the two groups are compared, many zeros arise in the elements wi,j of the former. In the latter, all elements of the ith column and the jth row of the agreement matrix are filled with values other than zero, and a clear difference is seen in the form of the matrix UW. Because only one loss is assumed in the rectangle of the imperfect alternative matrix, the eigenvalue exists by the Gershgorin theorem in Eq. (5) and Eq. (7). That is, the eigenvector of UW is obtained stably using the P-ABIA method. Although an agreement point between U and W is found in the ABIA method, a normal evaluation cannot be performed when the zeros among the elements of the agreement matrix increase; the negotiation between the two sides broke down in Numerical Example 1, while in Numerical Example 2 the agreement was conceded entirely to the evaluation matrix U. However, Numerical Example 1 and Numerical Example 2 show that there is no reasonable agreement if an appropriate criterion matrix W cannot be obtained. In the P-ABIA method, the number of losses in the rectangle is reduced to one by artificially supplying information on the missing values.
Therefore, reasonable decision-making becomes possible by using the P-ABIA method based on the ABIA method.
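The claim in (1), namely that choosing a to minimize the maximum eigenvalue reproduces the geometric-mean estimate √(aik·ail·ajk^-1·ajl^-1), can be illustrated numerically. The sketch below is not from the paper; it assumes NumPy and uses an illustrative consistent weight vector:

```python
import numpy as np

def lam_max(U):
    return max(np.linalg.eigvals(U).real)

# Reciprocal matrix with one missing pair u23/u32; the rest is consistent
# (the weights w are an illustrative choice, not from the paper).
w = np.array([6.0, 3.0, 4.0, 1.0])
U = np.outer(w, 1.0 / w)

i, j, k, l = 1, 2, 0, 3          # missing element u23 -> a (0-based indices)
a_star = np.sqrt(U[i, k] * U[i, l] / (U[j, k] * U[j, l]))  # geometric mean

def complete(a):
    V = U.copy(); V[i, j] = a; V[j, i] = 1.0 / a
    return V

# Scan candidate values of a and keep the one minimizing the eigenvalue.
grid = np.linspace(0.1, 5.0, 491)
best = min(grid, key=lambda a: lam_max(complete(a)))
print(round(a_star, 3), round(best, 3))  # both ~0.75
```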
(3) Eigenvector in the P-ABIA method
When one value is missing in the rectangle of the alternative matrix, the solution in the P-ABIA method is presumed from the eigenvector z of the agreement matrix UW. Nevertheless, the solution is related to only two columns in Eq. (10). When the element ui,j of U is assumed to be zero, the element wi,j of W becomes zero as well. Considering the elements of UW, the elements outside the ith and jth rows and columns are elements of the identity matrix. Assuming i < j, there are six possible missing values of U: u1,2, u1,3, u1,4, u2,3, u2,4, and u3,4. For u1,2, u1,3, and u1,4, the elements other than those in the first and jth rows of UW become elements of the identity matrix. The relation between the eigenvector and the maximum eigenvalue is UWz = λz, and this equation is expanded at the ith and jth rows. Then two equations are obtained: zi + uwi,j·zj = λzi and uwj,i·zi + zj = λzj. Dividing the first equation by the second and taking the square root (i.e., the geometric mean), √(uwi,j / uwj,i) = zi/zj is obtained. Therefore, we do not need to calculate the eigenvector of the agreement matrix in the P-ABIA method. When the aspect of decision-making is considered, the alternative matrix and the criterion matrix each have the two areas shown in Fig. 1.
Fig. 1 Schematic of agreement area between U and W
The area where the alternative matrix and the criterion matrix overlap is where they agree, and the remaining area indicates where they disagree. When the element ui,j of U is the missing value, the elements uwi,k≠j and uwj,k≠i of the agreement matrix UW become zero. Therefore, these elements can be excluded from the examination. The influence of the missing value on the agreement point is concentrated in the areas where the two matrices disagree, which can be described as follows: the elements 1 and uwi,j in the ith row, and the elements uwj,i and 1 in the jth row, of the agreement matrix. Both uwi,j and uwj,i seem to indicate the agreement points.
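The shortcut zi/zj = √(uwi,j / uwj,i) can be checked on the stated structure of UW (an identity matrix except for the (i, j) and (j, i) entries). A small numerical sketch, assuming NumPy and arbitrary illustrative entries:

```python
import numpy as np

# Agreement matrix UW when u_{i,j} is missing: identity except the
# (i,j) and (j,i) entries (p and q are illustrative values).
p, q = 2.0, 0.5
i, j = 0, 2
M = np.eye(4)
M[i, j], M[j, i] = p, q

vals, vecs = np.linalg.eig(M)
z = vecs[:, np.argmax(vals.real)].real

# The ratio z_i/z_j equals sqrt(uw_ij / uw_ji), so the full eigenvector
# computation is unnecessary in the P-ABIA method.
print(round(z[i] / z[j], 6), round(np.sqrt(p / q), 6))  # 2.0 2.0
```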
5 Conclusions
The present study focuses on the decision-making problem posed by the ANP by considering the ABIA method. The following points are clarified by the study. (1) The solution of the agreement matrix UW, which is the fault of the ABIA method, is stabilized by providing some information. (2) To specify that information, we impose the condition of minimizing the maximum eigenvalue of the matrix; this condition is originally possessed by the Harker method. (3) Although the eigenvector is used in the Harker method and the ABIA method, the P-ABIA method does not need the eigenvector and requires only matrix calculation. We hope this idea presents a different angle for pursuing decision-making.
References
Harker, P.T., Vargas, L.G.: Incomplete pairwise comparisons in the analytic hierarchy process. Mathematical Modeling 9, 838–848 (1987)
Nishizawa, K.: Estimation of unknown comparisons in incomplete AHP. Report of the Research Institute of Industrial Technology 7, 1–9 (2005)
Ozaki, T., Sugiura, S., Kinoshita, E.: Dissolution of the dilemma or circulation problem using the Analytic Network Process. In: Proceedings of the Tenth International Symposium on the Analytic Hierarchy Process, Pittsburgh, PA (2009)
Ozaki, T., Miwa, K.: An approximation process of missing value in imperfect evaluation matrix. Journal of Japanese Symposium on the Analytic Hierarchy Process 3, 115–121 (2009)
Ozaki, T., Lo, M.-C., Kinoshita, E., Tzeng, G.-H.: Decision-making by "Minor ANP" and classification of the types. In: Phillips-Wren, G., Jain, L.C., Nakamatsu, K., Howlett, R.J. (eds.) IDT 2010. Smart Innovation, Systems and Technologies, vol. 4, pp. 101–111. Springer, Heidelberg (2010)
Ozaki, T., Miwa, K.: An approximation process of missing value in imperfect evaluation matrix. Annual Research Journal of Nagoya Gakuin 47, 67–81 (2010)
Sugiura, S., Kinoshita, E.: A dissolution of circular logic with concurrent convergence method. Research of Urban Informatics 10(1), 115–121 (2005)
Piecewise Surface Regression Modeling in Intelligent Decision Guidance System Juan Luo and Alexander Brodsky
Abstract. An intelligent decision guidance system composed of data collection, learning, optimization, and prediction is proposed in this paper. Regression learning ability is incorporated on top of a traditional relational database management system. The Expectation Maximization Multi-Step Piecewise Surface Regression Learning (EMMPSR) algorithm is proposed to solve the piecewise surface regression problem. The algorithm proves to outperform several currently used regression learning packages. Optimization and prediction are integrated into the system based on the learning outcome.
1 Introduction
An increasing number of applications require predicting the behavior of a complex system and making decisions that move the system towards desirable outcomes, such as finding the best course of action in emergencies or creating public policies aimed at the most positive outcomes. In such applications, predictions and decisions must be made in the presence of large amounts of dynamically collected data and learned uncertainty models. Large amounts of data are usually saved in a database system, e.g., a Relational Database Management System (RDBMS) such as Oracle. The Structured Query Language (SQL) is intuitive and broadly used. However, it does not support decision optimization and statistical learning, which are often necessary for building decision-guidance applications.
Juan Luo Department of Computer Science, George Mason University, Fairfax, VA 22030 e-mail:
[email protected] Alexander Brodsky Department of Computer Science, George Mason University, Fairfax, VA 22030 e-mail:
[email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 223–235. c Springer-Verlag Berlin Heidelberg 2011 springerlink.com
Relational databases have traditionally taken the view that the data they store is a set of discrete observations. This is clearly reasonable when storing individual facts, such as the salary of employees or the description of a product. However, when representing time- or space-varying data, such as a series of temperature observations or a history of stock prices over time, a set of discrete points is often neither the most intuitive nor the most compact representation. For researchers in many fields, such as biology [1] and finance [2], a common first step in understanding a set of data points is to model those points as a collection of curves, typically generated using some form of regression (curve fitting). Regression, a form of modeling, helps smooth over errors and gaps in raw data points (which may be noisy or have missing values), yields a compact and more accurate representation of those points as a few parameters, and provides insight into the data by revealing trends and outliers. Regression analysis attempts to build a model based on the relationship of several independent variables and a dependent variable [3]. Given as input to regression learning are a parametric functional form, e.g., f(x1, x2, x3) = p1x1 + p2x2 + p3x3, and a set of training examples, e.g., tuples of the form (x1, x2, x3, f), where f is an experimental observation of the function value for the input (x1, x2, x3). Intuitively, the problem of regression analysis is to find the unknown parameters, e.g., p1, p2, p3, which best approximate the training set. For example, the national housing price can be modeled as a function of such determinants as the age of the house, the floor area of the house, neighborhood attributes, and location attributes. This functional form may have unknown parameters, each reflecting the relationship between the house price and a particular attribute of the house.
In realistic situations, a single parametric functional form, e.g., f(x1, x2, x3) = p1x1 + p2x2 + p3x3, may not be able to express the relationship between the dependent and explanatory variables when the relationship changes according to the value interval in which the explanatory variable resides. For example, housing prices show different behavior in response to the age of the house or the floor area of the house based on geographical location. So, instead of the conventional stationary model, for example, f(x) = p0 + p1x in the case of linear regression with a single explanatory variable, the piecewise linear regression model can be expressed as

            ⎧ f1(p1, x)    x < b1
f(p, x) =   ⎨ f2(p2, x)    b1 ≤ x < b2        (1)
            ⎪ ...
            ⎩ fk(pk, x)    b(k−1) ≤ x

In the PWR expression in equation 1, for different value intervals of the explanatory variable, a specific functional form fi represents a "segment" of the line in the overall problem. The problem is called piecewise surface regression when the number of explanatory variables is more than one. The algorithm to tackle this specific type of regression learning is one focus of our paper. A decision-guidance management system (DGMS) that supports a closed loop of data acquisition, learning, prediction and decision optimization was proposed in [4]. Mathematical and Constraint Programming (MP and CP), used for decision
optimization, i.e., finding values for control variables that maximize or minimize an objective within given constraints, are involved in the decision-guidance management system. The syntax of decision optimization in DGMS is adopted in our paper to find the best solution, given the outcome of piecewise regression learning. The contribution of the paper is summarized as follows:
• We extend the RDBMS with piecewise regression learning ability. The functional forms are represented as database tables. The regression learning process is implemented as stored procedures.
• We propose an EM-based Multi-Step Piecewise Surface Regression Learning algorithm (EMMPSR) to solve the piecewise surface regression problem. The multiple steps involved are clustering, local regression, classification, and regression learning for each individual surface.
• We describe a case study of the decision optimization process based on the learning outcome of the EMMPSR algorithm.
The rest of the paper is organized as follows. In section 2, we briefly discuss state-of-the-art intelligent decision systems and related work on piecewise regression learning. In section 3, we give a motivating example, which serves as the running example for the rest of the paper. We present the extended model of regression learning in the standard relational database using the running example in section 4. In section 5, the comprehensive EMMPSR algorithm is described. We present our experimental results in section 6. The decision optimization process is described in section 7, and the paper is concluded in section 8.
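The piecewise model of equation 1 can be evaluated with a few lines of code. The sketch below (with illustrative breakpoints and parameters, not taken from the paper) selects the segment from the breakpoints and applies that segment's linear function:

```python
from bisect import bisect_right

def piecewise_predict(x, breakpoints, params):
    """Evaluate the piecewise linear model of equation 1.

    breakpoints: sorted [b1, ..., b_{k-1}] splitting the x-axis into k
    intervals; params: one (intercept, slope) pair per segment.
    The toy values below are illustrative, not from the paper.
    """
    seg = bisect_right(breakpoints, x)       # segment index for x
    p0, p1 = params[seg]
    return p0 + p1 * x

breaks = [0.0, 2.0]                          # k = 3 segments
params = [(1.0, -1.0), (1.0, 2.0), (7.0, -1.0)]
print(piecewise_predict(-1.0, breaks, params))  # 2.0
print(piecewise_predict(1.0, breaks, params))   # 3.0
print(piecewise_predict(3.0, breaks, params))   # 4.0
```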
2 Related Research
Mathematical and scientific packages like MATLAB [9] and R [10] do support creating regression models. However, these tools lack support for declarative or relational queries. Queries typically need to be implemented as custom scripts in MATLAB, or in languages like Perl. A related concern is that tools like MATLAB do not provide a seamless way to interact with data already stored in a DBMS. Data from a relational table needs to be manually imported into MATLAB in order to fit a regression model to it. Once a model has been fit to the data, it can be used to make predictions or compute the interpolated value of a function at specific points from within MATLAB, but this code lives in custom scripts, which do not provide any of the benefits of storing the data within the RDBMS. Existing commercial DBMSs provide some support for fitting models in the form of modeling tools and add-ons for applications. For example, IBM's Intelligent Miner [11] supports creating models using PMML (Predictive Model Markup Language). Models are viewed as standalone black boxes with specialized interfaces for fitting and visualization. A typical use of PMML involving regression is to first fit a set of points to functions using an external tool, load those functions into the database, and then use the functions to predict the value of some other set of points by plugging them into the functions, typically using a stored procedure. However, the piecewise surface regression problem is not supported by these extended add-ons.
Constraint query languages, proposed in the context of querying geometric regions and formalized by [5], represent and query infinite regions as systems of constraints. There have been prototype implementations of constraint database systems for solving spatial queries and interpolating spatial data [6], [7]. Our methodology is simpler and specifically restricted to regression models, whereas constraint databases have focused mainly on linear constraints to keep query processing tractable. The focus of our work is on efficient query processing for regression models, while work on constraint query languages and databases has traditionally focused on the complexity of supporting large numbers of constraints (e.g., for linear programming applications). A decision guidance query language (DGQL) framework [8] was proposed as the implementation of a decision guidance management system; it uses SQL-like syntax but allows optimization and learning. It annotates existing queries in SQL to precisely express the optimization semantics, and then translates the annotated queries into an equivalent mathematical programming (MP) formulation that can be solved efficiently. Although the regression learning problem can be formulated in DGQL, the piecewise feature of the functional forms makes the corresponding reduced MP models very expensive and inefficient to solve. That is why we propose a different piecewise regression learning methodology but still adopt the optimization syntax and semantics of DGQL to express the decision optimization process in our case study.
3 A Motivating Example To make our discussions more concrete, consider an example of decision making to support a logistic transportation network, in which two different products are to be shipped from several origins (suppliers) to several destinations at a minimum cost. What is to be decided in this network is the exact amount of each product to be transported from each supplier to each destination, at the same time minimizing the total cost of transportation network. Figure 1 illustrates the network. Transportation
Fig. 1 A transportation network
problem is a classical example of a mathematical optimization problem in the Operations Research area. The unit cost of shipment from each supplier to each destination is usually given in advance and is fixed as well. In this case, the problem can easily be formulated as a mathematical constraint problem and solved by any mathematical solver like CPLEX [12]. Four populated tables have been created in the database instance for the transportation network. The Supplier table contains the information related to suppliers. The Destination table contains information related to destinations. The Shipping Rate table lists the shipping rate for each supplier-destination pair and each product. Table 4, the Transportation Amount table, is the table to be solved in the decision problem. The amount of each product to be shipped is marked with a special type of attribute, 'TBD', which means this attribute is to be decided by optimization. Table 1 Supplier
SID  SNAME  SPAMT1  SPAMT2
s1   DET    60      90
s2   LIN    80      80
...  ...    ...     ...

Table 2 Destination
DID  DNAME  DPAMT1  DPAMT2
d1   ABC    40      30
d2   XYZ    50      70
...  ...    ...     ...

Table 3 Shipping Rate
SID  DID  UNIT RATE1  UNIT RATE2
s1   d1   5           5
s2   d2   2.5         2.5
s3   d3   6           6
s4   d4   5.5         5.5
...  ...  ...         ...

Table 4 Transportation Amount
SID  DID  TAMT1  TAMT2
s1   d1   TBD    TBD
s2   d2   TBD    TBD
s3   d3   TBD    TBD
s4   d4   TBD    TBD
...  ...  ...    ...
More realistically, however, the unit cost of shipment may not always be fixed but may change according to the amounts of both product 1 and product 2 to be shipped. The most favorable unit cost may be available for only a limited number of units; shipments beyond this limit pay higher rates. As an example, three cost rate levels are specified for each supplier-destination pair. Correspondingly, the total cost of shipments along each pair increases with the amount shipped in a piecewise-linear style.
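The multi-level rate schedule described above can be written as a small piecewise-linear cost function. The limits and rates below are illustrative, not taken from the paper:

```python
def shipping_cost(amount, limits=(10.0, 20.0), rates=(2.0, 3.0, 5.0)):
    """Total cost with three rate levels: the first limits[0] units at
    rates[0], units up to limits[1] at rates[1], and the rest at rates[2].
    Limits and rates are illustrative, not taken from the paper."""
    cost, prev = 0.0, 0.0
    for limit, rate in zip(limits, rates):
        if amount <= limit:
            return cost + (amount - prev) * rate
        cost += (limit - prev) * rate
        prev = limit
    return cost + (amount - prev) * rates[-1]

print(shipping_cost(5))    # 10.0  (all units at the cheapest rate)
print(shipping_cost(15))   # 35.0  (10*2 + 5*3)
print(shipping_cost(25))   # 75.0  (10*2 + 10*3 + 5*5)
```

The total cost is continuous in the shipped amount but its slope jumps at each limit, which is exactly the piecewise-linear behavior the paper's learning component must recover.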
4 Predictive Modeling in RDBMS
Piecewise regression learning is proposed as an extension to the RDBMS (see Figure 2). Given a historical data table as input, it outputs a model that predicts future values of a designated target column based on the designated
Fig. 2 Piecewise regression extension to RDBMS
explanatory columns in the table. The predictive model is generated by issuing a call to the predefined stored function in SQL, PWLearning. A table called FUNCTION NAME COLLECTION has been created in advance to contain predictive models for the piecewise function pools. The schema of this table has four attributes: FUNC ID (primary key), FUNC TABLE, BOUNDARY TABLE and CASE FUNC TABLE. FUNC TABLE represents the functional form for each piecewise surface; its attributes are the coefficients of the function. BOUNDARY TABLE represents the boundary constraints for each surface; its attributes are the coefficients of the boundary surfaces. CASE FUNC TABLE represents the connection between each surface and its boundary constraints. Each surface usually has more than one boundary constraint. Instances of these tables are listed at the end of section 7. The stored function PWLearning takes a few parameters as inputs, i.e., the name of the historical data table / view, the number of piecewise surfaces involved, and the dimension of the explanatory variables. The calling of PWLearning
• firstly inserts a new row / predictive model description into the FUNCTION NAME COLLECTION table. The row inserted by PWLearning is composed of a function identifier and values for the attributes FUNC TABLE, BOUNDARY TABLE and CASE FUNC TABLE.
• secondly generates the predictive model and stores the model information in three tables named by the values of the most recently inserted row.
• thirdly returns the assigned function identifier representing the predictive model.
The predictive model can then be applied within an SQL query that invokes another stored function, PREDICT(FUNC ID, X1, X2, ..., XN). This function takes the function identifier FUNC ID and the values of the explanatory variables as inputs. It returns the target value back to the SQL query.
The calling of PREDICT
• firstly queries the FUNCTION NAME COLLECTION table by FUNC ID and returns the names of the three tables: FUNC TABLE, BOUNDARY TABLE and CASE FUNC TABLE.
• secondly queries the BOUNDARY TABLE to see which rows of the table are satisfied by the values of the explanatory variables (X1, ..., XN).
• thirdly joins the BOUNDARY TABLE and CASE FUNC TABLE based on the query result of the second step and selects the identifier of the piecewise surface to which the values of the explanatory variables belong.
• finally queries the function table identified by FUNC ID. The target value is calculated using both the coefficients in the table and the values of the explanatory variables, and is returned to the calling statement.
In the transportation network, the historical cost set is prepared as a view (Table 5). For each supplier-destination pair, the functional form of the cost can be expressed by a piecewise linear function with two explanatory variables, PAMOUNT1 and PAMOUNT2 (the amounts of product 1 and product 2 shipped between each supplier-destination pair). The target variable is the COST, which is noisy or has missing values.
Table 5 Historical cost table
SID  DID  PAMOUNT1  PAMOUNT2  COST
s1   d1   10        10        21.98
s2   d2   10        30        56.98
s3   d3   22        25        94.36
s4   d4   38        9         88.67
...  ...  ...       ...       ...
There are 12 supplier-destination pairs in total in the transportation network. Correspondingly, twelve rows are inserted into the FUNCTION NAME COLLECTION table.

Table 6 FUNCTION NAME COLLECTION
FUNC ID  FUNC TABLE  BOUNDARY TABLE  CASE FUNC TABLE
S1D1     FUNC S1D1   BOUNDARY S1D1   CASE FUNC S1D1
S1D2     FUNC S1D2   BOUNDARY S1D2   CASE FUNC S1D2
...      ...         ...             ...
S3D4     FUNC S3D4   BOUNDARY S3D4   CASE FUNC S3D4
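The table-driven lookup performed by PREDICT can be mimicked in memory. The following Python stand-in (the table contents and the flag semantics are illustrative assumptions, not the paper's stored-procedure code) scans the case table, tests each surface's boundary constraints, and evaluates the matching linear function:

```python
# In-memory stand-ins for FUNC_TABLE, BOUNDARY_TABLE, CASE_FUNC_TABLE.
FUNC = {"P1": (3.0, 4.0, 2.0), "P2": (-5.0, -6.0, 6.0)}     # c0, c1, c2
BOUNDARY = {"B1": (0.0, 0.5, 0.29)}                          # a0, a1, a2
# surface -> [(boundary id, flag)]; flag 1 assumed "< a0", -1 ">= a0"
CASE = {"P1": [("B1", -1)], "P2": [("B1", 1)]}

def predict(x1, x2):
    """Mimic the stored function PREDICT: find the surface whose boundary
    constraints are all satisfied, then evaluate its linear function."""
    for pid, constraints in CASE.items():
        ok = True
        for bid, flag in constraints:
            a0, a1, a2 = BOUNDARY[bid]
            val = a1 * x1 + a2 * x2
            ok &= (val < a0) if flag == 1 else (val >= a0)
        if ok:
            c0, c1, c2 = FUNC[pid]
            return c0 + c1 * x1 + c2 * x2
    raise ValueError("no surface matches")

print(predict(1.0, 1.0))   # 0.79 >= 0 -> P1: 3 + 4 + 2 = 9.0
print(predict(-1.0, 0.0))  # -0.5 < 0 -> P2: -5 + 6 + 0 = 1.0
```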
5 Piecewise Surface Regression Learning Algorithm
5.1 Formal Definition
Regression analysis attempts to build a model based on the relationship of several independent (explanatory) variables and a dependent variable [3]. Let x1, ..., xn be
independent variables and y be the dependent variable, all ranging over R. The latter is a random variable defined over the underlying distribution of sample tuples in I^n = R × R × ... × R. Suppose the learning set contains m tuples. Let us denote such a tuple as xh = (xh1, ..., xhn) for h = 1, ..., m. The collection of data, c = (xh, yh) for h = 1, ..., m, represents the available training data to estimate the values of the random variable y = f(xh, β) + N for h = 1, ..., m, where N is random noise. We assume that N is distributed as a Gaussian with mean 0 and variance σ such that E(y) = E(f(xh, β) + N) = E(f(xh, β)) = f(xh, β), where E is the expected value. The standard least squares method is used to find the coefficients β of f that minimize σ. Applications can be found that lie on the borderline between classification and regression; these occur when the input space X can be subdivided into disjoint regions Xi characterized by different behaviors of the function f to be reconstructed. One of the simplest situations of this kind is piecewise surface regression: in this case X is a polyhedron in the n-dimensional space R^n and {Xi}, i = 1, ..., k, is a polyhedral partition of X, i.e., Xi ∩ Xj = ∅ for every i ≠ j and ∪(i=1..k) Xi = X. The target of a piecewise surface regression problem is to reconstruct an unknown function f : X → R having a linear behavior in each region Xi:

f*(x) = fi(x, βi)   if x ∈ Xi,  i = 1, ..., k        (2)

when only a training set D containing m samples (xh, yh), h = 1, ..., m, is available. The output yh gives a noisy evaluation of f(xh), with xh ∈ X; the region Xi to which xh belongs is not given in advance. The parameter sets βi for i = 1, 2, ..., k characterize the function set fi, and their estimate is one target of the piecewise surface regression problem. The regions Xi are polyhedral, i.e., they are defined by a set of li linear inequalities, which can be written in the following form:

Ai (1, x)ᵀ ≥ 0        (3)

where Ai is a matrix with li rows and n + 1 columns; its estimate is another target of the learning process for every i = 1, 2, ..., k. According to (2) and (3), the target of the learning problem is actually two-fold: to generate both the regions Xi and the parameter sets βi for the unknown function set fi, utilizing the information contained in the training set.
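The region-membership condition of Eq. (3) amounts to checking that every row of Ai is non-negative on the augmented vector (1, x). A minimal sketch with an illustrative region (assuming NumPy):

```python
import numpy as np

def in_region(A, x):
    """Check the polyhedral-region condition of Eq. (3): A @ (1, x) >= 0,
    where A has l_i rows and n + 1 columns (the example A is illustrative)."""
    return bool(np.all(A @ np.concatenate(([1.0], x)) >= 0))

# Region {x in R^2 : x1 >= 0 and x1 + x2 <= 1} -> rows [0,1,0], [1,-1,-1]
A = np.array([[0.0, 1.0, 0.0],
              [1.0, -1.0, -1.0]])
print(in_region(A, np.array([0.2, 0.3])))   # True
print(in_region(A, np.array([0.8, 0.5])))   # False
```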
5.2 Algorithm
The idea of PWLearning is described in Algorithm 1. The Expectation Maximization (EM) algorithm [13] has been adapted in our algorithm. The partition of the input space is obtained by applying a double-fold k-means clustering algorithm that incorporates the value of the target variable. After the clustering of polyhedral regions, a multi-category SVM library [14] is called to find the boundary matrix Ai in (3) for every polyhedral region. Each region is represented by a boundary matrix Ai.
For each polyhedral region, a surface regression model is easily learned by robustfit [15]. Similarly to the EM algorithm, an iteration process is involved in our approach: first, the local models are trained according to the resulting clusters. Then the data points in every cluster are re-assigned to the local model with the best predictive performance. The local models are updated again based on the newly created clusters of polyhedral regions. The iteration process is repeated until a termination criterion is reached.

Algorithm 1. The EM-based Multi-step Piecewise Surface Regression Learning
Input: data set D with size m, number of clusters k
Output: local models fi and cluster boundary matrices Ai for i = 1, ..., k
1 (Local regression) foreach h = 1, ..., m do
  1.1 Build the local dataset Eh containing the sample (xh, yh) and the pairs (x, y) ∈ S given by the e − 1 closest neighbors x to xh.
  1.2 Perform a linear regression to obtain the feature vector vh (with dimension n + 1) of a linear unit fitting the samples in Eh.
2 (Clustering) Perform the clustering process in the feature vector space.
  2.1 Run regular k-means on the feature vector space R^(n+1) with assigned feature vector center set CV to subdivide the set of feature vectors vh into k groups Ui.
  2.2 Build a new training set D′ containing the m pairs (xh, ih), Uih being the cluster including vh.
3 repeat
  (Classification) Run multi-category classification on the training set D′ to compute the cluster boundary set Ai for every region Xi.
4   (Regression) For every i = 1, ..., k, run a linear regression on the samples (x, y) ∈ D with x ∈ Xi. The parameter set βi returned represents the ith surface function fi.
5   Update the cluster index of each data point, and hence the training set D′, according to the minimal predictive error among the surface functions fi for i = 1, ..., k.
  until the maximum number of iterations is reached or no cluster index is reassigned
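The reassign-and-refit loop at the heart of Algorithm 1 can be sketched in one dimension with k = 2. This is a simplified illustration only: it uses least-squares line fits instead of robustfit, a crude median split instead of the double-fold k-means on local feature vectors, and no SVM boundary step:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.where(x < 0, 2.0 * x, -1.0 * x)        # two noiseless linear pieces

def fit(xs, ys):                               # least-squares line
    return np.polyfit(xs, ys, 1)               # (slope, intercept)

# EM-style loop: assign each point to the model with the smallest
# residual, then refit each model on its cluster.
labels = (x > np.median(x)).astype(int)        # crude initial clustering
for _ in range(10):
    models = [fit(x[labels == i], y[labels == i]) for i in range(2)]
    err = np.stack([np.abs(np.polyval(m, x) - y) for m in models])
    new = err.argmin(axis=0)
    if np.array_equal(new, labels):
        break
    labels = new

slopes = sorted(round(m[0], 2) for m in models)
print(slopes)   # close to the true slopes -1 and 2
```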
6 Experimental Results
To evaluate our EM-based multi-step piecewise surface regression algorithm EMMPSR, we generate synthetic high-dimensional data which is piecewise-defined. We compare the performance of EMMPSR with those of M5P (weka.classifier.trees) [16], classregtree (MATLAB statistical toolbox) [9], and MultilayerPerceptron (a three-layer neural network) (weka.classifier.functions) [16] on three sets of synthetic data.
The data sets are generated using three different piecewise models. Each model has linear boundaries between regions and linear functions within each region. Models 1 and 2 each have three regions and two independent variables. Model 3 has five regions and nine independent variables. Data in each model are generated with additive Gaussian noise with zero mean and 0.1 variance. We generated 300 sample points for model 1, 900 data points for model 2, and 1500 data points for model 3. The second data set is generated from the piecewise functions:

            ⎧  3 + 4x1 + 2x2    if 0.5x1 + 0.29x2 ≥ 0 and x2 ≥ 0
f(x1, x2) = ⎨ −5 − 6x1 + 6x2    if 0.5x1 + 0.29x2 < 0 and 0.5x1 − 0.29x2 < 0        (4)
            ⎩ −2 + 4x1 − 2x2    if 0.5x1 − 0.29x2 ≥ 0 and x2 < 0

This target function is depicted in Equation (4), and the data points are plotted in Figure 3. In total, 900 samples are drawn uniformly from I^2 = [−1, 1] × [−1, 1], and y is determined as y = f*(x1, x2) + ε, where ε ∼ N(0, 0.1). In this setting, the target value needs to be combined with the explanatory variables to determine the appropriate cluster prototypes.
Fig. 3 Synthetic Data Set Generated in Model 2
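The model-2 data set can be regenerated directly from Equation (4). A sketch, assuming NumPy (the seed is arbitrary, and the noise is taken as N(0, 0.1) as in the text):

```python
import numpy as np

def f_model2(x1, x2):
    """Target function of Equation (4): three linear pieces over [-1,1]^2."""
    if 0.5 * x1 + 0.29 * x2 >= 0 and x2 >= 0:
        return 3 + 4 * x1 + 2 * x2
    if 0.5 * x1 + 0.29 * x2 < 0 and 0.5 * x1 - 0.29 * x2 < 0:
        return -5 - 6 * x1 + 6 * x2
    return -2 + 4 * x1 - 2 * x2        # remaining region of Equation (4)

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(900, 2))
y = np.array([f_model2(a, b) for a, b in X]) + rng.normal(0, np.sqrt(0.1), 900)

print(X.shape, y.shape)   # (900, 2) (900,)
```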
The following function estimate is yielded by the EMMPSR algorithm:

            ⎧  3.0067 + 3.9940x1 + 1.9977x2    if 0.5x1 + 0.32x2 ≥ 0.005 and x2 ≥ 0
f(x1, x2) = ⎨ −5.0217 − 6.0201x1 + 6.0056x2    if 0.5x1 + 0.32x2 < 0.005 and 0.5x1 − 0.31x2 < 0.01        (5)
            ⎩ −2.0035 + 3.9793x1 − 2.0330x2    if 0.5x1 − 0.31x2 ≥ 0.01 and x2 < 0

As noted, the generated model is a good approximation of the unknown function to be learned in Equation 4. Five-fold cross validation is adopted to evaluate the learning performance by randomly dividing the data set into 5 equal parts. Each part is held out in turn, and the remaining four parts are used to train the learning method. The root mean squared error (RMSE) [17] is calculated on the unseen data. The results are
summarized in Table 7. It is obvious that EMMPSR outperforms the other methods with respect to RMSE.

Table 7 RMSE values for performance comparison experiments on synthetic data sets
Model    M5P      MultilayerPerceptron  Classregtree  EMMPSR
Model1   1.0925   3.0657                2.8899        0.3759
Model2   0.7599   1.8773                0.4995        0.2538
Model3   37.6910  47.8030               33.3755       30.8755
Another metric to be compared among the different methods is the average number of rules generated by each model for a data set. In EMMPSR it is the number of regions, while in M5P and Classregtree it is the number of rules generated during the process of building the tree. EMMPSR uses only a fraction of the rules generated by M5P and Classregtree. The corresponding tables are generated for the predictive model of Equation 5. FUNC TABLE represents the three piecewise functional forms with their coefficients. BOUNDARY TABLE represents the boundary matrix with its coefficients as well. CASE FUNC TABLE joins the FUNC TABLE and BOUNDARY TABLE together.

Table 8 FUNC TABLE
PID  C0       C1       C2
P1   3.0067   3.9940   1.9977
P2   -5.0217  -6.0201  6.0056
P3   -2.0035  3.9793   -2.0330

Table 9 BOUNDARY TABLE
BID  A0     A1   A2
B1   0.005  0.5  0.32
B2   0.01   0.5  -0.31
B3   0      0    -1

Table 10 CASE FUNC TABLE
CASE  BID  LESS THAN FLAG
P1    B1   -1
P1    B3   -1
P2    B1   1
P2    B2   1
P3    B2   -1
P3    B3   1
7 Decision Optimization
The optimization process for the transportation network is described in the following query sequence written in SQL.
CREATE VIEW supply-transport-from AS SELECT s.SID, s.SPAMT1 as total-supply1, s.SPAMT2 as total-supply2,
234
J. Luo and A. Brodsky
       sum(t.TAMT1) as total-transp1, sum(t.TAMT2) as total-transp2
FROM Supplier s, Transportation Amount t
WHERE s.SID = t.SID
GROUP BY s.SID
CHECK total-supply1 = total-transp1, total-supply2 = total-transp2

CREATE VIEW destination-transport-to AS
SELECT d.DID, d.DPAMT1 as total-dest1, d.DPAMT2 as total-dest2,
       sum(t.TAMT1) as total-transp1, sum(t.TAMT2) as total-transp2
FROM Destination d, Transportation Amount t
WHERE d.DID = t.DID
GROUP BY d.DID
CHECK total-dest1 = total-transp1, total-dest2 = total-transp2

CREATE VIEW TOTAL-COST AS
SELECT SID, DID, sum(PREDICT(SID + DID, TAMT1, TAMT2))
FROM Transportation Amount t

MINIMIZE TOTAL-COST

The queries follow the syntax of SQL with two exceptions. First, the attributes TAMT1 and TAMT2 in Table 4, Transportation Amount, are marked with the special annotation 'TBD'. Second, the objective of the decision optimization is given as 'MINIMIZE TOTAL-COST'. In DGQL, a decision optimization problem is written as a regular data problem, i.e., a sequence of relational views and accompanying integrity constraints, together with annotations of which database table columns need to be decided by the system (i.e., the variables) and toward what goal (i.e., the optimization objective). Here, existing queries in the reporting software can be used directly. Essentially, DGQL allows users to write an optimization problem as if writing a reporting query in a forward manner. In other words, DGQL has SQL-like syntax, yet it uses mathematical programming algorithms to solve the optimization problems invoked by action statements such as MINIMIZE or MAXIMIZE.
The above two exceptions in the SQL queries can be represented by DGQL queries:

SELECT dgql.augment('Transportation Amount', 'TAMT1', null, null);
SELECT dgql.augment('Transportation Amount', 'TAMT2', null, null);
SELECT * FROM dgql.minimize('TOTAL-COST');

In the optimization process for the transportation network, the total cost is calculated in the view TOTAL-COST by summing the cost of each supplier-destination pair. The transportation cost between Supplier 'SID' and Destination 'DID' is calculated and returned by the stored function PREDICT(SID + DID, TAMT1, TAMT2). The functional identifier FUNC ID is specified by the concatenation of 'SID' and 'DID'. The amounts TAMT1 for product 1 and TAMT2 for product 2 are the explanatory variables.
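Conceptually, DGQL turns the annotated 'TBD' columns into decision variables and hands the views to a solver. A toy illustration of that idea (entirely hypothetical: a made-up convex cost stands in for PREDICT, and a brute-force scan stands in for the mathematical-programming backend):

```python
def cost(tamt1, tamt2):
    # Stand-in for the PREDICT stored function of one supplier-destination
    # pair (a made-up convex cost, not the learned piecewise model).
    return tamt1 ** 2 + tamt2 ** 2

def minimize_total_cost(demand1, demand2):
    """Brute-force the cheapest split of two product demands between two
    routes; a real DGQL backend would call a mathematical-programming solver."""
    best = None
    for a1 in range(demand1 + 1):
        for a2 in range(demand2 + 1):
            total = cost(a1, a2) + cost(demand1 - a1, demand2 - a2)
            if best is None or total < best[0]:
                best = (total, a1, a2)
    return best

print(minimize_total_cost(10, 5))  # → (63, 5, 2)
```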
8 Conclusion

An intelligent decision system is proposed in this paper, involving a loop of data collection, learning, optimization and prediction. The EMMPSR algorithm is designed to solve the piecewise surface regression problem in the learning stage. Experimental results show that EMMPSR outperforms several widely used regression tools in terms of RMSE and simplicity of the functional forms. The piecewise surface regression is extended as a predictive model for the RDBMS. The decision optimization, built on the outcome of the learning stage, shows how to find the optimal solution while satisfying the constraints. Future research topics include the selection of the main features to be used for regression, and how to choose the initial number of clusters given as input to the EMMPSR algorithm.
References

1. Hunter, J., McIntosh, N.: Knowledge-Based Event Detection in Complex Time Series Data. In: Proceedings of the Joint European Conference on Artificial Intelligence in Medicine and Medical Decision Making, pp. 271–280 (1999)
2. Fox, J.: Applied Regression Analysis, Linear Models, and Related Methods (1997)
3. Draper, N., Smith, H.: Applied Regression Analysis. Wiley Series in Probability and Statistics (1998)
4. Brodsky, A., Wang, X.: Decision-Guidance Management Systems (DGMS): Seamless Integration. In: The 41st Hawaii International Conference on System Sciences (HICSS-41 2008), pp. 7–10 (2008)
5. Kanellakis, P., Kuper, G., Revesz, P.: Constraint Query Languages. In: Symposium on Principles of Database Systems, pp. 299–313 (1990)
6. Revesz, P.: Constraint databases: A survey. In: Semantics in Databases, pp. 209–246 (1995)
7. Revesz, P., Chen, R.: The MLPQ/GIS Constraint Database System. In: SIGMOD Conference on Management of Data (2000)
8. Brodsky, A., Egge, N., Wang, X.: Reusing Relational Queries for Intuitive Decision Optimization. In: 44th Hawaii International Conference on System Sciences, pp. 1–9 (2011)
9. Matlab, http://www.mathworks.com/products/matlab
10. The R Project for Statistical Computing, http://www.r-project.org/
11. IBM: IBM DB2 Intelligent Miner, http://www-386.ibm.com/software/data/iminer
12. AMPL, http://www.ampl.com
13. Dempster, A.P., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 1–38 (1977)
14. Chang, C., Lin, C.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm
15. Huber, P., Ronchetti, E.: Robust Statistics. Wiley, New York (1981)
16. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.: The WEKA Data Mining Software: An Update. SIGKDD Explorations (2009)
17. Alpaydin, E.: Introduction to Machine Learning. MIT Press, Cambridge (2004)
Premises of an Agent-Based Model Integrating Emotional Response to Risk in Decision-Making

Ioana Florina Popovici
Abstract. The classical definition of risk implies attaching various probabilities to events considered risky. People affected by risk form different perspectives on the phenomenon due to their individual emotional responses. Measuring the real dimension of risk in the global economy implies taking into account all the different perspectives of the individuals who bear the consequences of risk. Most importantly, the consequences of a risky event are borne at different scales according to the agents' own emotional response to risk. Intuitively, even the probabilities used for estimating risk imply measuring it at different scales, because individuals perceive risk differently at the emotional level. All in all, the dimension of risk varies according to the scales used for measurement. A global view of the real dimension of risk implies taking into account the many scales at which risk is borne differently at the level of the agents' perception.

Keywords: Decision-making, agent-based modeling, emotional behavior, fractals.
1 Introduction

The economic model of the market is defined by economic theory as a group of people who trade goods or services at a certain price derived from various transactions. Therefore, the economy can be modeled, according to modern theories of complexity [18], as a group of agents that interact repeatedly, leading to a certain result or state of the model. Agents organize themselves in groups or clusters according to shared preferences derived from similar strategies. The notion of a project implies a cluster of agents acting towards a common goal. This in turn implies a complex structure of society and the economy in general. The model of the economy made up of agents is similar to the fractal shape known as the "Sierpinski triangle" [10]. According to this, the model of the global economy reveals the complex structure of a fractal shape. Looking from a top-down perspective at the micro-level of an enterprise, one can see at a smaller scale the notion of a project that clusters individual agents. What are the implications for estimating

Ioana Florina Popovici
University of Babeş-Bolyai, Cluj
e-mail:
[email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 237–245.
springerlink.com © Springer-Verlag Berlin Heidelberg 2011
risk in a fractal model of the economy? The answer lies in the propagation of an agent's local behavior towards risk during the transition from local to global in the economy. This transition takes place through repeated cooperative games between agents acting locally at the level of a project run by an enterprise, which transfers local behavior to the global economy through an iterated process [4, 14]. The paper presents the intuitive premises for building an agent-based model of the decision-making process of financing an investment project, taking into account the emotional response to the context of financial risk perceived in starting a project. Decision is the act which turns strategy into reality during the interactions of agents in the economy. Who is the agent faced with the decision of financing a project? It is the project manager responsible for the results achieved by the project. The decision-making process is based on the strategic use of information coming from the local environment. When facing a decision, agents attach weights to the emotional response to risk and to the shape of the reward. They decide at the margin of value between various alternatives in the local environment [2, 15]. But where do agents get the value from? The answer lies in the perceived value of the reward obtained by the agent from implementing a certain investment project. A certain reward coming from a project is valuable to an agent as long as it has utility for that agent [11]. Utility is a notion defined according to an agent's strategy, built upon the level of risk the agent is willing to take and subordinated to an objective. Agents' objectives regarding the reward from implementing a certain project lie in the monetary surplus or social benefit obtained, according to the individual characteristics of the agent. The project's financial structure implies certain risks belonging to the type of financing source used.
The phenomenon of risk of financial imbalance in the project has direct implications for the financial balance between revenue and spending during implementation. The equilibrium of the project's budget is to be studied dynamically, because the synchronization between receipts and payments depends on their sequence in time. The agent's decision about which source of financing is optimal for the project is guided by criteria of risk and reward. The agent's behavioral response to these two factors is what guides the entire decision-making process.
2 Application of Agent-Based Modeling to Emotional Decision-Making – Framework and Assumptions

The model is based on the assumption that agents interact in repeated cooperative games characterized by Nash equilibrium [12]; see Table 1. They adopt strategies according to their emotional mood towards risk and the reward gained. Some target profit-making in transactions with other agents. These are what economic theory calls "homo oeconomicus". They adopt a risk-averse emotional behavior towards events that may reduce the desired profit. Another class of agents is what game theory calls "homo ludens" [3]. They prefer to assume certain risks in transactions with other partners, as they play "for the sake of the game". The third category of agents is indifferent to risk and constantly changes its strategy depending on
Premises of an Agent-Based Model Integrating Emotional Response
239
the context and objective followed [1]. Whether it is a financial loss or a gain, this category of agents constantly moves from the category of "homo ludens" to that of "homo oeconomicus" and vice versa, according to its response to context and following a socially responsible attitude. This form of adaptation to the environment is called a "genetic algorithm" [13]. The concept describes the pattern of successive changes in the strategic behavior of the third category of agents, called "homo social-responsible".

Table 1 Categories of agents in the economy

Type of Agent               Risk          Reward
"HOMO OECONOMICUS"          Aversion      Gain (monetary)
"HOMO LUDENS"               Appetite      Gain/loss (monetary)
"HOMO SOCIAL-RESPONSIBLE"   Indifferent   Gain (socially worthy)
The subject of the model described here is the mechanism by which agents select sources for funding the projects they manage. The agent makes a comparative analysis of the financing sources and then decides which option will be used to finance a particular project, based on the selection criteria of risk and reward. The model is built starting from the principle of the budgetary constraint of each individual, entity or project in the economy. The budget constraint formula can be used as a tool in the analysis supporting the decision-making of an agent who has to identify the optimal combination of financing sources to support a particular project within the limits of cost sustainability given the revenues it generates. The agent is represented by the manager or business owner who must choose the optimal funding source to finance a certain project. The funding sources used must be repaid from the project budget, which is limited by the capacity of the investment to generate revenue. The types of funding used to realize a specific investment project are: self-financing from revenues generated by the project, loan, private investor financing, public-private partnership, and non-refundable grant. These forms of financing can be combined to ensure a financially sustainable project and minimize the risk of imbalances in the project's budget.

1. The budget constraint function for an agent using a single financing alternative for the project is [19]:

$$S_0 (r_c + 1) = S_v \tag{1}$$

where:
- $S_0$ is the initial amount invested (accessed through an internal or external source of financing a project),
- $S_v$ is the reimbursed amount, composed of the cost and the principal of the loan,
- $r_c$ is a cost rate calculated as the ratio of the total cost of financing (commissions, interest, fees or other costs), specific to each type of financing, to the amount borrowed. This rate standardizes the cost of each type of funding.
2. The budget constraint function for an agent using two financing alternatives for the project:

$$S_x (r_x + 1) + S_y (r_y + 1) = S_v, \qquad S_x + S_y = S_0 \tag{2}$$
where x and y are indices that reflect the type of funding. The mechanism is similar when the agent faces the decision of financing a project using three or four alternatives, depending on the offer of the financial market. This rule reflects the local strategy of an agent who is faced with the decision of financing a project with the most suitable combination of capital.
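The two budget-constraint rules can be sketched as plain functions (illustrative Python; the symbol names follow Equations 1 and 2, the numeric values are invented):

```python
def reimbursed_single(s0, rc):
    """Eq. (1): the amount to repay (principal plus cost) for one source."""
    return s0 * (rc + 1)

def reimbursed_double(sx, rx, sy, ry):
    """Eq. (2): the repayment when the initial amount S0 = Sx + Sy is split
    between two sources with cost rates rx and ry."""
    return sx * (rx + 1) + sy * (ry + 1)

s0 = 100_000.0
print(reimbursed_single(s0, 0.08))   # 100000 * 1.08 = 108000
sv = reimbursed_double(60_000.0, 0.08, 40_000.0, 0.05)
print(sv)                            # 60000*1.08 + 40000*1.05 = 106800
# the split must exhaust the initial amount: Sx + Sy == S0
assert 60_000.0 + 40_000.0 == s0
```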
3 Dynamic Analysis on Estimating the Probability of Risk Using Fractals

What does risk mean to any of us? The notion reveals its meaning according to the perspective of each individual it refers to. Is the notion of risk partially subjective? Does it connect to the perception of the individual? Perhaps part of the impact depends on the subjective evaluation of the individual according to the emotional response. How can risk be defined? Is it the effect or outcome suffered by the subject who deals with risk? Or can it also be identified in the interaction of forces that, by their simultaneous or cumulated action, can lead to a situation called "risk"? Is risk just a negative notion, or can it also be a positive one? It can lead to a more favorable post-risk situation if the change is perceived by the subject as an opportunity for redefining identity. Some factors are perceived as having a negative value, so that the resulting deviation is not welcomed by the subject; other factors can lead to positive deviations which can bring added value to an activity. Further on, this paper treats the notion of risk as having negative value, as revealed by the agents' emotional response to it. Risk is defined according to classical theory as the probability of occurrence of a certain deviation from the course of achieving a goal. The deviation is determined by the factors that interact. Risk is the probability that a failure or disharmony occurs and blocks the system from moving on. The occurrence of an event that causes a change requiring a redefinition of the way forward is part of the phenomenon of risk. Classical methods of measuring risk calculate probabilities of occurrence of certain risks in order to avoid them or minimize their consequences. Recent theories, such as cumulative prospect theory, present an alternative for quantifying risk by evaluating the results of economic transactions through the concept of marginal value [6].
The subject will select the option that provides the highest emotional reward among all the available alternatives. According to cumulative prospect theory, individuals attach a different emotional response to a negative value (loss) than to a positive one (gain). Therefore the decision takes into account the context of the decision, that is, the nature of the final outcome (loss or gain). Another difference among existing theories concerning the value the subject grants to options relates to the shape of this function. According to the theory of marginal utility, the function is linear, in comparison
with the value function described in cumulative prospect theory, which takes the form of a fractal "power law" [17]. The methodology of quantifying risk refers to estimating the perceived emotional probability at the individual level through a mathematical formula, using induction and elements of fractals [9]. The risk function R(x) [19] integrates the emotional perception of risk at the individual level and is as follows:

$$R(x) = \begin{cases} \left( -\dfrac{1}{k} \cdot x \right)^{\beta}, & x < 0,\ \beta > 0 \\[4pt] \left( \dfrac{1}{k} \cdot x \right)^{\alpha}, & x \ge 0,\ \alpha > 0 \end{cases} \tag{3}$$
where:
- $1/k$ is a parameter specific to the agent's emotional mood; it depends on the objective pursued by the type of agent. In the model, k is the amount borrowed ($S_0$).
- $\alpha, \beta$ are the exponents describing a fractal [8] non-linear evolution of human-subjective phenomena.
- x refers to a unit of the result, which can be a loss (f(x) < 0), a gain (f(x) > 0) or the break-even point.

The risk function R(x) is defined as having fractal characteristics [18] due to the human-subjective perception of risk by each individual agent, caused by information asymmetry and the psychological factors of human perception [7]. The function F(x), representing the decision-making process, describes the emotional response to risk of an agent from a global perspective. The decision function F(x) depends on the context (loss or gain) and refers to the value (positive or negative) of the emotional reward (monetary or social) perceived by the agent [7]. It is given by the formula:

$$F(x) = \begin{cases} 1 & \Leftrightarrow x \ge 0 \wedge R(x) \ge 1 \\ -1 & \Leftrightarrow x \ge 0 \wedge R(x) < 1 \\ 1 & \Leftrightarrow x < 0 \wedge R(x) \ge 1 \\ -1 & \Leftrightarrow x < 0 \wedge R(x) < 1 \end{cases} \tag{4}$$
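A minimal sketch of Equations 3 and 4 (illustrative Python; the parameter values below are invented, and k is taken as the borrowed amount S0 as stated in the model):

```python
def risk(x, k, alpha, beta):
    """Eq. (3): 'power-law' risk function; 1/k scales the outcome x
    (a loss if x < 0, a gain if x >= 0), and beta/alpha shape the two branches."""
    if x < 0:
        return (-x / k) ** beta
    return (x / k) ** alpha

def decide(x, k, alpha, beta):
    """Eq. (4): the decision signal is +1 when R(x) >= 1 and -1 otherwise,
    in both the loss and the gain context."""
    return 1 if risk(x, k, alpha, beta) >= 1 else -1

# k plays the role of the borrowed amount S0; all numbers are invented.
print(decide(1500.0, 1000.0, 0.88, 0.88))   # R = 1.5**0.88 > 1  → 1
print(decide(-500.0, 1000.0, 0.88, 0.88))   # R = 0.5**0.88 < 1  → -1
```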
4 Results of the Empirical Analysis of the Model

The model of decision-making in an agent-based economy is described through an example of a decision-making process for financing projects implemented by agents acting in the name of the companies they represent. The agents' decision is based on the criteria of risk and reward that accompany the transaction. Choosing the best financing source for a project is similar to buying a certain good or service on the market. Risk is defined in terms of the financial sustainability of a project using a certain type of financing source. On the other hand, reward is defined according to
the agent's characteristics or the objective followed. In this sense the model is built upon three types of agents. Each type follows a specific objective and has its own emotional response to risk and reward. Table 2 presents a comparative analysis [16] of the four case studies carried out by testing an investment project under various forms of financing; the results reveal a ranking of the four combinations of project financing based on the criterion of risk.

Table 2 Sources of financing a project and associated risk of financial imbalance

No.   Case study         Type of risk
1     PRIVATE INVESTOR   Medium-high risk
2     GRANT              Medium risk
3     P.P.P.             Low risk
4     CREDIT             Low-medium risk
The decision-making process of financing a project in the model takes place according to the type of agent and its emotional response to the risk and reward from implementing it. Thus, the agent "homo ludens" will choose to finance his investment project through a private investor's financial contribution, because it offers him the highest degree of emotional reward according to his 'mood', due to the highest risk score compared with the other options (see Table 3). This agent shows an increased appetite for risk during transactions with other agents. The second type of agent, "homo oeconomicus", will choose the financing option of P.P.P. (public-private partnership) because it corresponds to his behavior described in the model, namely his aversion to risk. His selected option is characterized by the lowest level of emotional reward and risk. The third type of agent, "homo social-responsible", can select any of the four combinations of financing. Such an agent is not designed to get emotionally involved in pursuing the maximization of monetary gain and is indifferent to risk. His decision-making process is guided by objectives such as social benefit or environmental protection, and the financial gain is subordinated to those stated beforehand.

Table 3 Emotional decision-making of an agent
Type of Agent               Risk          Reward        Emotional Decision
"HOMO OECONOMICUS"          Aversion      Gain          P.P.P.
"HOMO LUDENS"               Appetite      Gain/loss     PRIVATE INVESTOR
"HOMO SOCIAL-RESPONSIBLE"   Indifferent   Social gain   CREDIT/GRANT
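The selection logic of Tables 2 and 3 can be sketched as a simple lookup (an illustrative encoding; the numeric risk scores below are invented — only their ordering, taken from Table 2, matters):

```python
# Risk scores derived from the ranking in Table 2 (higher = riskier).
RISK = {"PRIVATE INVESTOR": 3, "GRANT": 2, "CREDIT": 1.5, "P.P.P.": 1}

def choose_financing(agent_type):
    """homo ludens seeks the riskiest option, homo oeconomicus the least
    risky; homo social-responsible is indifferent to risk, so every option
    remains a candidate."""
    if agent_type == "homo ludens":
        return max(RISK, key=RISK.get)
    if agent_type == "homo oeconomicus":
        return min(RISK, key=RISK.get)
    return sorted(RISK)  # indifferent: all options remain candidates

print(choose_financing("homo ludens"))        # → PRIVATE INVESTOR
print(choose_financing("homo oeconomicus"))   # → P.P.P.
```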
5 Conclusions and Future Work

The conclusion that can be drawn at this stage is that it is possible to integrate the emotional mood of an agent into his actions during transactions with other agents. This can be done through a value function used by the agent when deciding which strategy to follow, by choosing an option from the multiple alternatives found in the local environment. The model describes how the emotional mood of the agent regarding the risk and reward from an activity influences the agent's decision regarding the choice of a financing source for a certain project. The present research shows that the agent's decision on financing a project is not based solely on notions like utility or profit maximization, but also on the emotional attitude or 'mood' towards risk, which is connected to the reward obtained from the project. Another important aspect of the agent's choice is the nature of the reward, which is not just monetary but can also have social or environmental content. One innovative aspect of this research is the integration of emotions into decision-making, quantified in the local behavioral strategy of the agent. This is integrated into the model through the use of fractal theory principles in the agent's decision function regarding risk and reward. The present work lays a brick in the field of agent-based modeling and fractals used for the study of the emotional decision-making process in the financing of a project. Risk is the result of the cumulative action of several factors. Current risk management intends to diversify risk among all the participants in the market. This splits risk among several players in order to minimize the risk per individual. The logic is similar to that of economies of scale: the total cost is divided among a large number of participants so that the cost per unit of production is minimized.
However, the process of risk diversification has created a network of risks in the financial market, connecting all the participants into one big entity through the network effect created. Any small change in this large and interconnected mechanism of risk turns into a big movement at the global level through the network effect. Secondly, the risk-sharing mechanism unites the players into a complicated network. Every "share" of risk is perceived differently by individuals, at various levels. This creates different perspectives on the dimension of the phenomenon of risk. Consider that every level has its own network characterized by different scales inside the level. All in all, there are different dimensions of the phenomenon of risk on a horizontal, vertical or diagonal level, and each of these dimensions has different scales inside the level. Therefore, the internal structure of the phenomenon of risk sharing in the financial market is characterized by a fractal structure. The scaling relation between the various components of risk is similar to that of a fractal. This throws a different light on risk probability because of the fractal dimension of the phenomenon. This view assumes that the risk phenomenon does not have a uniformly distributed structure across its overall dimension; it is made up of pieces of "risk" shared at the individual level at different scales, which show the same characteristics as the whole. Further research will extend the application of the model to estimating the risk of financial crisis in the global economy due to the increased access to finance from
financial markets by companies. Players on the market are represented by agents, managers of projects or owners of companies, who use various financing sources, assuming high financial risks in the context of the uncertainty and instability of the global economy.
References

1. Bloomquist, K.: A comparison of agent-based models of income tax evasion. Internal Revenue Service, 8–15 (2004)
2. Boloş, M., Popovici, I.: Ultramodernity in risk theory. In: Annals of the International Conference on Financial Trends in the Global Economy, Universitatea Babeş-Bolyai, Cluj-Napoca, FSEGA, November 13-14, pp. 3–10 (2009)
3. Bătrâncea, L.: Teoria jocurilor, comportament economic. In: Risoprint (ed.) Experimente, Cluj-Napoca, pp. 163–194 (2009)
4. Camerer, C., Ho, T.-H., Chong, J.K.: Behavioral game theory: Thinking, building and teaching. Research paper, NSF grant, 24 (2001)
5. Gellert, W.K., Hellwich, K.: Mică Enciclopedie Matematică. In: Silvia, C. (ed.), traducere de Postelnicu, V., Tehnică, Bucureşti, pp. 234–250 (1980)
6. Kahneman, D., Tversky, A.: The framing of decisions and the psychology of choice. Science, New Series 211(4481), 453–458 (1981)
7. Kahneman, D., Tversky, A.: Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty 5, 297–323 (1992)
8. Lapidus, M., van Frankenhuijsen, M.: Fractal Geometry, Complex Dimensions and Zeta Functions: Geometry and Spectra of Fractal Strings, pp. 41–45. Springer Science+Business Media, LLC (2006)
9. Liebovitch, L.: Fractals and Chaos Simplified for the Life Sciences. Center for Complex Systems, Florida Atlantic University. Oxford University Press, New York, Oxford (1998)
10. Mandelbrot, B.: The Fractal Geometry of Nature, updated and augmented. W.H. Freeman and Company, New York (1983)
11. McFadden, D.: The new science of pleasure: consumer behavior and the measurement of well-being. Frisch Lecture, Econometric Society World Congress, London, August 20, pp. 3–7. University of California, Berkeley (2005)
12. Rasmusen, E.: Games and Information: An Introduction to Game Theory, 3rd edn., pp. 10–28. Basil Blackwell (2000)
13. Scarlat, E.: Agenţi şi modelarea bazată pe agenţi în economie. ASE Bucureşti, 15–20 (2005)
14. Scarlat, E., Maries, I.: Simulating collective intelligence of the communities of practice using agent-based methods. In: Jędrzejowicz, P., Nguyen, N.T., Howlett, R.J., Jain, L.C. (eds.) KES-AMSTA 2010. LNCS, vol. 6070, pp. 305–314. Springer, Heidelberg (2010)
15. Scarlat, E., Maracine, V.: Agent-based modeling of forgetting and depreciating knowledge in a healthcare knowledge ecosystem. Economic Computation and Economic Cybernetics Studies and Research 41(3-4) (2008)
16. Scarlat, E., Boloş, M., Popovici, I.: Agent-based modeling in decision-making for project financing. Economic Computation and Economic Cybernetics Studies and Research, 5–10 (2011)
17. Taleb, N., Pilpel, A.: Epistemology and risk management. Risk and Regulation Magazine, 3–10 (August 25, 2007)
18. Taleb, N., Mandelbrot, B.: Fat tails, asymmetric knowledge, and decision making: Nassim Nicholas Taleb's essay in honor of Benoit Mandelbrot's 80th birthday. Wilmott Magazine, 2, 16 (2005)
19. Tulai, C., Popovici, I.: Modeling risk using elements of game theory and fractals. In: Annals of the International Conference "Competitivitate şi Stabilitate în Economia bazată pe Cunoaştere", Universitatea din Craiova, May 14-15, vol. 10, pp. 2–7 (2010)
Proposal of Super Pairwise Comparison Matrix

Takao Ohya* and Eizo Kinoshita
Abstract. This paper proposes a super pairwise comparison matrix (SPCM) to express all the pairwise comparisons in the evaluation process of the dominant analytic hierarchy process (AHP) or the multiple dominant AHP (MDAHP) as a single pairwise comparison matrix. In addition, this paper shows, by means of a numerical counterexample, that the evaluation value resulting from the application of the Harker method to a SPCM does not necessarily coincide with the evaluation value resulting from the application of the dominant AHP (DAHP) to the evaluation values obtained from each pairwise comparison matrix by the eigenvalue method.

Keywords: pairwise comparison matrix, dominant AHP, logarithmic least squares method, Harker's method.
1 Introduction

The analytic hierarchy process (AHP) proposed by Saaty [1] enables objective decision making by top-down evaluation based on an overall aim. In actual decision making, a decision maker often has a specific alternative (the regulating alternative) in mind and makes an evaluation on the basis of that alternative. This was modeled in the dominant AHP (DAHP), proposed by Kinoshita and Nakanishi [2]. If there is more than one regulating alternative and the importance of each criterion is inconsistent, the overall evaluation value may differ for each regulating alternative. As a method of integrating the importances in such cases, the concurrent convergence method (CCM) was proposed. Kinoshita and Sekitani [3] showed the convergence of CCM.

Takao Ohya
School of Science and Engineering, Kokushikan University, Tokyo, Japan
Eizo Kinoshita
Faculty of Urban Science, Meijo University, Gifu, Japan
* Corresponding author.

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 247–254.
springerlink.com © Springer-Verlag Berlin Heidelberg 2011
Meanwhile, Ohya and Kinoshita proposed the geometric mean multiple dominant AHP (GMMDAHP), which integrates weights by using a geometric mean based on an error model to obtain an overall evaluation value. Herein, such methods of evaluation with multiple regulating alternatives are generically referred to as the multiple dominant AHP (MDAHP). Section 2 briefly explains DAHP and MDAHP and then proposes a super pairwise comparison matrix (SPCM) to express the pairwise comparisons appearing in the evaluation processes of the dominant AHP and MDAHP as a single pairwise comparison matrix. Section 3 gives a specific numerical example in which the Harker method is applied to a SPCM. With this numerical example, it is shown that the evaluation value resulting from the application of DAHP to the evaluation values obtained from each pairwise comparison matrix by the eigenvalue method does not necessarily coincide with the evaluation value resulting from the application of the Harker method to the SPCM.
2 SPCM In this section, we propose a SPCM to express the pairwise comparisons appearing in the evaluation processes of DAHP and MDAHP as a single pairwise comparison matrix. Section 2.1 outlines DAHP procedure and explicitly states pairwise comparisons. Section 2.2 proposes the SPCM that expresses these pairwise comparisons as a single pairwise comparison matrix.
2.1 Evaluation in DAHP

The true absolute importance of alternative a (a = 1, ..., A) at criterion c (c = 1, ..., C) is v_{ca}. The final purpose of AHP is to obtain the relative value (between alternatives) of the overall evaluation value

    v_a = \sum_{c=1}^{C} v_{ca}

of alternative a. The procedure of DAHP for obtaining an overall evaluation value is as follows:

Step 1: The relative importance u_{ca} = \alpha_c v_{ca} (where \alpha_c is a constant) of alternative a at criterion c is obtained by some method. In this paper, u_{ca} is obtained by applying the pairwise comparison method to the alternatives at criterion c.
Proposal of Super Pairwise Comparison Matrix
Step 2: Alternative d is the regulating alternative. The importance u_{ca} of alternative a at criterion c is normalized by the importance u_{cd} of the regulating alternative d, giving u^d_{ca} = u_{ca} / u_{cd}.

Step 3: With the regulating alternative d as a representative alternative, the importance u_{cd} of criterion c is obtained by applying the pairwise comparison method to the criteria, where u_{cd} is normalized so that \sum_{c=1}^{C} u_{cd} = 1.

Step 4: From the u^d_{ca} and u_{cd} obtained at Steps 2 and 3, the overall evaluation value

    u_a = \sum_{c=1}^{C} u_{cd} u^d_{ca}

of alternative a is obtained. By the normalizations at Steps 2 and 3, u_d = 1; therefore, the overall evaluation value of the regulating alternative d is normalized to 1.
2.2 Proposal of SPCM

The relative comparison value r^{ca}_{c'a'} of the importance v_{ca} of alternative a at criterion c, compared with the importance v_{c'a'} of alternative a' at criterion c', is arranged in a (CA × CA) or (AC × AC) matrix. In a (CA × CA) matrix, the index of the alternative changes first; in an (AC × AC) matrix, the index of the criterion changes first. This is proposed as the SPCM R = (r^{ca}_{c'a'}) or (r^{ac}_{a'c'}).

In an SPCM, symmetric components have a reciprocal relationship, as in ordinary pairwise comparison matrices, and diagonal elements equal 1:

    r^{ca}_{c'a'} r^{c'a'}_{ca} = 1,    (1)

    r^{ca}_{ca} = 1.    (2)
The pairwise comparisons at Step 1 of DAHP consist of the relative comparison values r^{ca}_{ca'} of the importance v_{ca} of alternative a, compared with the importance v_{ca'} of alternative a' at criterion c. The pairwise comparisons at Step 3 of DAHP consist of the relative comparison values r^{cd}_{c'd} of the importance v_{cd} of alternative d at criterion c, compared with the importance v_{c'd} of alternative d at criterion c', where the regulating alternative is d.

Figures 1 and 2 show SPCMs arising from DAHP when there are three alternatives (1-3) and four criteria (I-IV) and the regulating alternative is Alternative 1. In these
figures, * represents pairwise comparison between alternatives at Step 1 and # represents pairwise comparison between criteria at Step 3.
[Figure: a 12 × 12 SPCM indexed I1, I2, I3, II1, ..., IV3 (the alternative index changes first); * marks Step 1 pairwise comparisons between alternatives within each criterion, # marks Step 3 pairwise comparisons between criteria at regulating Alternative 1; all other off-diagonal entries are missing.]

Fig. 1 SPCM by Dominant AHP (CA × CA)
[Figure: the same SPCM indexed I1, II1, III1, IV1, I2, ..., IV3 (the criterion index changes first), with the identical pattern of * and # entries.]

Fig. 2 SPCM by Dominant AHP (AC × AC)
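The fill pattern of Figures 1 and 2 can be generated programmatically; a small sketch (the index convention is an assumption chosen to match the figure captions):

```python
C, A, d = 4, 3, 0   # four criteria, three alternatives, regulating Alternative 1

def dahp_known(c, a, cp, ap):
    """True if the SPCM entry comparing (c, a) with (cp, ap) is observed."""
    if c == cp:                  # Step 1: alternatives compared within one criterion
        return True
    return a == d and ap == d    # Step 3: criteria compared at the regulating alternative

pairs = [(c, a, cp, ap) for c in range(C) for a in range(A)
         for cp in range(C) for ap in range(A)
         if (c, a) != (cp, ap) and dahp_known(c, a, cp, ap)]
star = sum(1 for c, a, cp, ap in pairs if c == cp)   # '*' entries (within criteria)
hash_ = len(pairs) - star                            # '#' entries (between criteria)
print(star, hash_)   # → 24 12
```

The counts agree with the figures: each of the 4 criteria contributes 3 × 2 = 6 within-criterion comparisons, and the 4 criteria are mutually compared at Alternative 1.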
Figures 3 and 4 show SPCMs arising from MDAHP when there are three alternatives (1-3) and four criteria (I-IV) and all alternatives are regulating alternatives. In these figures, * represents a pairwise comparison between alternatives within the same criterion, and # represents a pairwise comparison between criteria of the same alternative.
[Figure: a 12 × 12 SPCM in (CA × CA) ordering; * marks pairwise comparisons between alternatives within each criterion, # marks pairwise comparisons between criteria for every alternative, since all three alternatives are regulating alternatives.]

Fig. 3 SPCM by MDAHP (CA × CA)
[Figure: the same MDAHP SPCM in (AC × AC) ordering.]

Fig. 4 SPCM by MDAHP (AC × AC)
The SPCM of DAHP or MDAHP is an incomplete pairwise comparison matrix. Therefore, methods for incomplete matrices, such as the LLSM based on an error model, or an eigenvalue-based method such as the Harker method or the two-stage method, are applicable to the calculation of evaluation values from an SPCM.
3 Numerical Example

Three alternatives, 1 to 3, and four criteria, I to IV, are assumed, where Alternative 1 is the regulating alternative. As the result of pairwise comparison between alternatives at each criterion c (c = I, ..., IV), the following pairwise comparison matrices R^A_c (c = I, ..., IV) are obtained:
R^A_I = \begin{pmatrix} 1 & 1/3 & 5 \\ 3 & 1 & 3 \\ 1/5 & 1/3 & 1 \end{pmatrix}, \qquad
R^A_{II} = \begin{pmatrix} 1 & 7 & 3 \\ 1/7 & 1 & 1/3 \\ 1/3 & 3 & 1 \end{pmatrix},

R^A_{III} = \begin{pmatrix} 1 & 1/3 & 1/3 \\ 3 & 1 & 1/3 \\ 3 & 3 & 1 \end{pmatrix}, \qquad
R^A_{IV} = \begin{pmatrix} 1 & 3 & 5 \\ 1/3 & 1 & 1 \\ 1/5 & 1 & 1 \end{pmatrix}.
With regulating Alternative 1 as the representative alternative, the importance between criteria was evaluated by pairwise comparison. As a result, the following pairwise comparison matrix R^C_1 is obtained:

R^C_1 = \begin{pmatrix} 1 & 1/3 & 3 & 1/3 \\ 3 & 1 & 3 & 1 \\ 1/3 & 1/3 & 1 & 1/3 \\ 3 & 1 & 3 & 1 \end{pmatrix}.
(1) SPCM + Harker method

In the Harker method, the value of each diagonal element is set to the number of missing entries in its row plus 1, and then evaluation values are obtained by the usual eigenvalue method. Figure 5 shows the SPCM modified by the Harker method, and Table 1 shows the evaluation values obtained from the SPCM in Fig. 5.

Table 1 Evaluation values obtained by the Harker method
               Criterion I  Criterion II  Criterion III  Criterion IV  Overall evaluation value
Alternative 1  0.196        0.352         0.095          0.356         1
Alternative 2  0.370        0.039         0.190          0.087         0.687
Alternative 3  0.074        0.107         0.391          0.072         0.645
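The Harker construction itself is easy to reproduce on any incomplete pairwise comparison matrix. The sketch below applies it to a hypothetical 3 × 3 matrix with one missing pair (the values are illustrative, not taken from the paper): missing entries become 0, each diagonal element is set to 1 plus the number of missing entries in its row, and the principal eigenpair is found by power iteration:

```python
def harker_weights(M, iters=200):
    """M is a reciprocal matrix with None marking missing comparisons.
    Returns (principal eigenvalue, eigenvector scaled so w[0] = 1)."""
    n = len(M)
    # Harker: zero out missing entries, set the diagonal to 1 + #missing in the row
    B = [[(1 + sum(x is None for x in row)) if i == j else (row[j] or 0.0)
          for j in range(n)] for i, row in enumerate(M)]
    w, lam = [1.0] * n, 0.0
    for _ in range(iters):                 # power iteration
        w = [sum(B[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam, w = w[0], [x / w[0] for x in w]
    return lam, w

# Alternatives 1 and 3 were never compared directly (None entries).
M = [[1.0, 2.0, None],
     [0.5, 1.0, 3.0],
     [None, 1 / 3, 1.0]]
lam, w = harker_weights(M)
print(round(lam, 3), [round(x, 3) for x in w])   # → 3.0 [1.0, 0.5, 0.167]
```

Here the two observed comparisons are mutually consistent (2 × 3 = 6), so the completed matrix is consistent and the principal eigenvalue equals n = 3.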
[Figure: the numerical 12 × 12 SPCM of this example with the Harker modification; missing entries are set to 0, and each diagonal element equals the number of missing entries in its row plus 1 (7 in the rows of regulating Alternative 1, 10 in the remaining rows).]

Fig. 5 SPCM by the Harker method
(2) DAHP + the eigenvalue method

By applying the eigenvalue method to the individual pairwise comparison matrices R^A_c (c = I, ..., IV) and R^C_1, the evaluation values at Steps 2 and 3 of DAHP are obtained as follows:

R^A_I (1.000, 1.754, 0.342)^T = 3.295 × (1.000, 1.754, 0.342)^T,
R^A_{II} (1.000, 0.131, 0.362)^T = 3.007 × (1.000, 0.131, 0.362)^T,
R^A_{III} (1.000, 2.080, 4.327)^T = 3.136 × (1.000, 2.080, 4.327)^T,
R^A_{IV} (1.000, 0.281, 0.237)^T = 3.029 × (1.000, 0.281, 0.237)^T,
R^C_1 (0.169, 0.368, 0.096, 0.368)^T = 4.155 × (0.169, 0.368, 0.096, 0.368)^T.
From the above results, the principal eigenvalues of R^A_I, R^A_{II}, R^A_{III}, R^A_{IV}, and R^C_1 are 3.295, 3.007, 3.136, 3.029, and 4.155, respectively, and the corresponding C.I. values are 0.147, 0.004, 0.068, 0.015, and 0.052. Based on these results, Table 2 shows the evaluation value u^1_c u^1_{ca} for each alternative and criterion and the overall evaluation value of each alternative.

Table 2 Evaluation values obtained by the eigenvalue method
               Criterion I  Criterion II  Criterion III  Criterion IV  Overall evaluation value
Alternative 1  0.169        0.368         0.096          0.368         1
Alternative 2  0.296        0.048         0.199          0.103         0.646
Alternative 3  0.058        0.133         0.414          0.087         0.692
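The reported eigenpairs can be checked numerically; for instance, plain power iteration on R^A_I reproduces the figures above:

```python
def principal(M, iters=200):
    """Principal eigenpair of a positive matrix by power iteration;
    the eigenvector is scaled so that its first component is 1."""
    n = len(M)
    v, lam = [1.0] * n, 0.0
    for _ in range(iters):
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam, v = v[0], [x / v[0] for x in v]
    return lam, v

R_I = [[1.0, 1 / 3, 5.0],
       [3.0, 1.0, 3.0],
       [1 / 5, 1 / 3, 1.0]]
lam, v = principal(R_I)
ci = (lam - 3) / (3 - 1)      # consistency index (lambda_max - n) / (n - 1)
print(round(lam, 3), [round(x, 3) for x in v])   # → 3.295 [1.0, 1.754, 0.342]
```

The consistency index follows as C.I. = (λ_max − n)/(n − 1) ≈ 0.147, matching the value reported for R^A_I.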
The evaluation values in Table 2, obtained by applying DAHP to the results of the eigenvalue method on each individual pairwise comparison matrix, do not coincide with the evaluation values in Table 1, obtained by applying the Harker method to the SPCM. This numerical example thus shows that the two approaches do not necessarily yield the same evaluation values.
References

1. Saaty, T.L.: The Analytic Hierarchy Process. McGraw-Hill, New York (1980)
2. Kinoshita, E., Nakanishi, M.: Proposal of new AHP model in light of dominative relationship among alternatives. Journal of the Operations Research Society of Japan 42, 180–198 (1999)
3. Kinoshita, E., Sekitani, K., Shi, J.: Mathematical Properties of Dominant AHP and Concurrent Convergence Method. Journal of the Operations Research Society of Japan 45, 198–213 (2002)
4. Harker, P.T.: Incomplete pairwise comparisons in the Analytic Hierarchy Process. Mathematical Modeling 9, 837–848 (1987)
Reduction of Dimension of the Upper Level Problem in a Bilevel Programming Model, Part 1

Vyacheslav V. Kalashnikov, Stephan Dempe, Gerardo A. Pérez-Valdés, and Nataliya I. Kalashnykova
Abstract. The paper deals with the problem of reducing the dimension of the upper level problem in a bilevel programming model. In order to diminish the number of variables governed by the leader at the upper level, we create a second follower supplied with an objective function coinciding with that of the leader and pass part of the upper level variables to the lower level to be governed by the second follower. The lower level problem is also modified and becomes a Nash equilibrium problem solved by the original and the new followers. We look for conditions that guarantee that the modified and the original bilevel programming problems share at least one optimal solution.

Vyacheslav V. Kalashnikov
ITESM, Campus Monterrey, Monterrey, Mexico
e-mail: [email protected]

Stephan Dempe
TU Bergacademie Freiberg, Freiberg, Germany
e-mail: [email protected]

Gerardo A. Pérez-Valdés
NTNU, Trondheim, Norway
e-mail: [email protected]

Nataliya I. Kalashnykova
UANL, San Nicolás de los Garza, Mexico
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 255–264. springerlink.com © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

Bilevel programming modeling is a new and dynamically developing area of mathematical programming and game theory. For instance, when we study value chains, the general rule usually is: decisions are made by different parties along the chain, and these parties often have different, even opposed, goals. This raises the difficulty of supply chain analysis, because regular optimization techniques
(e.g., linear programming) cannot be readily applied, so that tweaks and reformulations are often needed (cf. [1]). The latter is the case with the natural gas value chain. From extraction at the wellheads to the final consumption points (households, power plants, etc.), natural gas goes through several processes and changes ownership many times.

Bilevel programming is especially relevant in the case of the interaction between a Natural Gas Shipping Company (NGSC) and a Pipeline Operating Company (POC). The first owns the gas from the moment it becomes a consumption-grade fuel (usually at wellhead/refinement complexes, from now on called the extraction points) and sells it to Local Distributing Companies (LDC), who own small, city-size pipelines that serve final customers. Typically, NGSCs neither engage in business with end-users nor actually handle the natural gas physically.

Whenever the volumes extracted by the NGSCs differ from those stipulated in the contracts, we say an imbalance occurs. Since imbalances are inevitable and even necessary in a healthy industry, the POC is allowed to apply control mechanisms in order to avoid and discourage abusive practices (so-called arbitrage) on the part of the NGSCs. One such tool is a cash-out penalization technique applied after a given operative period. Namely, if an NGSC has created imbalances in one or more pool zones, then the POC may proceed to 'move' gas from positive-imbalanced zones to negative-imbalanced ones, up to the point where every pool zone has an imbalance of the same sign, i.e., either all non-negative or all non-positive, thus rebalancing the network. At this point, the POC will either charge the NGSC a higher (than the spot) price for each volume unit of natural gas withdrawn in excess from its facilities, or pay back a lower (than the sale) price if the gas was not extracted. The relevance of prices leads us into the area of stochastic programming rather than a purely deterministic approach.
The formulated bilevel problem is reduced to another bilevel problem with linear constraints at both levels (cf. [2]). However, this reduction involves the introduction of many artificial variables on the one hand, and the generation of a lot of scenarios to apply the essentially stochastic tools on the other. The latter makes the dimension of the upper level problem an unbearable burden even for the most modern and powerful supercomputers. The aim of this paper is a mathematical formalization of the task of reducing the upper level problem's dimension without affecting (if possible!) the optimal solution of the original bilevel programming problem.
2 Main Results

We start with an example. Consider the following bi-level (linear) programming problem (P1):
F(x, y, z) = x − 2y + z → min_{x,y,z}

subject to    (P1)

x + y + z ≥ 15,  0 ≤ x, y, z ≤ 10,  and  z ∈ Ψ(x, y),

where Ψ(x, y) = {z solving the lower level (linear) problem}:

f_2(x, y, z) = 2x − y + z → min_z

subject to

x + y − z ≤ 5,  0 ≤ z ≤ 10.

It is easy to show that problem (P1) has a unique optimal solution (x*, y*, z*) = (0, 10, 5), with F(x*, y*, z*) = −15. By the way, the lower level optimal value is f_2(x*, y*, z*) = −5.

Now, let us construct a modified problem (MP1), which is, strictly speaking, an MPEC (mathematical program with equilibrium constraints):

F(x, y, z) = x − 2y + z → min_{x,y,z}

subject to    (MP1)

x + y + z ≥ 15,  0 ≤ x, y, z ≤ 10,  and  (y, z) ∈ Φ(x),

where Φ(x) = {(y, z) solving the lower level equilibrium problem}: find a Nash equilibrium between two followers:

1) Follower 1 has the problem:

f_1(x, y, z) ≡ F(x, y, z) = x − 2y + z → min_y

subject to

x + y − z ≤ 5,  0 ≤ y ≤ 10;
2) Follower 2 has the problem:

f_2(x, y, z) ≡ 2x − y + z → min_z

subject to

x + y − z ≤ 5,  0 ≤ z ≤ 10.

In other words, in problem (MP1) the leader directly controls only the variable x, whereas the lower level is represented by an equilibrium problem. In the latter, there are two decision makers: the second one is the same follower as in problem (P1); she/he controls the variable z, accepts the value of the leader's variable x as a parameter, and tries to reach a Nash equilibrium with the first follower. The first follower, in turn, also aims at an equilibrium with the second follower by controlling only the variable y and taking the value of the leader's variable x as a parameter. Actually, follower 1 is the same leader: her/his objective function is the projection of the leader's objective function onto the space R^2 of the variables (y, z) for each fixed value of the variable x.

Now it is not difficult to demonstrate that problem (MP1) is also solvable and has exactly the same solution: (x*, y*, z*) = (0, 10, 5) with F(x*, y*, z*) = −15. By the way, the lower level equilibrium problem has the optimal solution y* = y*(x) = 10, z* = z*(x) = min{10, 5 + x} for each value 0 ≤ x ≤ 10 of the leader's upper level variable. Of course, the optimal value x* = 0 provides the minimum value of the upper level objective function F. ■

Now, more generally, consider the following bi-level programming problem: find a vector (x*, y*, z*) ∈ X × Y × Z ⊂ R^{n_1} × R^{n_2} × R^{n_3} such that (P1):

F(x*, y*, z*) = min { F(x, y, z) w.r.t. (x, y) ∈ X × Y, subject to G(x, y, z) ≤ 0, where z ∈ Ψ(x, y) = Argmin { f_2(x, y, z) w.r.t. z ∈ Z, subject to g(x, y, z) ≤ 0 } }.

Here F, f_2 : R^n → R and G : R^n → R^{m_1}, g : R^n → R^{m_2} are continuous functions and mappings, respectively, with n = n_1 + n_2 + n_3, where n_i, i = 1, 2, 3, and m_j, j = 1, 2, are some fixed natural numbers. In relation to problem (P1), let us define the following auxiliary subset:

Φ = {(x, y) ∈ X × Y : ∃ z ∈ Z such that g(x, y, z) ≤ 0}.    (1)
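Returning to the introductory example: its lower level is a one-dimensional LP whose solution has the closed form z(x, y) = max(0, x + y − 5), so the claimed optimum of (P1) can be confirmed by a brute-force scan of the leader's box (the grid step below is an arbitrary choice):

```python
def follower_z(x, y):
    # lower level: min 2x - y + z s.t. x + y - z <= 5, 0 <= z <= 10;
    # the objective increases in z, so take the smallest feasible z
    return max(0.0, x + y - 5.0)

best = None
grid = [i * 0.5 for i in range(21)]            # 0, 0.5, ..., 10
for x in grid:
    for y in grid:
        z = follower_z(x, y)
        if z <= 10 and x + y + z >= 15:        # upper level feasibility
            F = x - 2 * y + z
            if best is None or F < best[0]:
                best = (F, x, y, z)
print(best)   # → (-15.0, 0.0, 10.0, 5.0)
```

The scan returns (0, 10, 5) with F = −15, the solution reported for both (P1) and (MP1).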
Now we make the following assumption:

A1. The set Φ_1 ⊆ Φ, defined as the subset of all pairs (x, y) ∈ Φ for which there exists a unique vector z = z(x, y) ∈ Ψ(x, y) satisfying, in addition, the inequality G(x, y, z(x, y)) ≤ 0, is nonempty, convex, and compact. Moreover, suppose that the thus defined function z : Φ_1 → R^{n_3} is continuous with respect to all its variables.

Next, we introduce another bi-level programming problem, which is actually a so-called mathematical program with equilibrium constraints (MPEC): find a vector (x*, y*, z*) ∈ X × Y × Z ⊂ R^{n_1} × R^{n_2} × R^{n_3} solving the problem (MP1):

F(x, y, z) → min over (x, y, z) ∈ X × Y × Z    (2)

subject to

(y, z) ∈ Λ(x),    (3)

where Λ(x) is the collection of generalized Nash equilibrium (GNE) points of the following two-person game. Player 1 selects her strategies from the set Y and minimizes her payoff function f_1(x, y, z) ≡ F(x, y, z) subject to the constraints G(x, y, z) ≤ 0 and g(x, y, z) ≤ 0. Player 2 uses the set of strategies Z and minimizes her payoff function f_2(x, y, z) subject to the constraints G(x, y, z) ≤ 0 and g(x, y, z) ≤ 0.

Remark 1. It is clear that if a vector (ȳ, z̄) ∈ Y × Z solves the lower level equilibrium problem of the MPEC (MP1) for a fixed x ∈ X, then z̄ = z(x, ȳ), with the mapping z = z(x, y) ∈ Ψ(x, y) from assumption A1. Conversely, if a vector ȳ minimizes the function f̄_1(y) ≡ f_1(x, y, z(x, y)) over an appropriate set of vectors y and, in addition, G(x, ȳ, z(x, ȳ)) ≤ 0, then (ȳ, z̄) = (ȳ, z(x, ȳ)) solves the lower level problem in (MP1). ■
We are interested in establishing relationships between the solution sets of problems (P1) and (MP1). First, we can prove the following auxiliary result.

Theorem 1. Under assumption A1, there exists a nonempty convex compact subset D ⊂ X such that for every x ∈ D there is a generalized Nash equilibrium (GNE) solution (y, z) ∈ Y × Z of the lower level equilibrium problem of the MPEC (MP1). ■
In order to establish relationships between the optimal solution sets of problems (P1) and (MP1), we start with a rather restrictive assumption concerning problem (MP1), having in mind to relax it in our second paper.
A2. Assume that the generalized Nash equilibrium (GNE) state y = y(x), whose existence for each x ∈ D has been established in Theorem 1, is determined uniquely.

Remark 2. In assumption A2, it would be redundant to demand the uniqueness of the GNE state z = z(x), because this has already been required implicitly in assumption A1: indeed, if y = y(x) is determined uniquely, then so is z = z(x, y(x)) = z(x). ■
Theorem 2. Under assumptions A1 and A2, problems (P1) and (MP1) are equivalent. ■
Next, we examine certain nonlinear and linear bilevel programs and find out when assumptions A1 and A2 hold in these particular cases. Moreover, we try to relax some of the overly restrictive conditions in these assumptions.
3 Nonlinear Case

First, it is easy to verify that for a problem (P1) with a non-void solution set, assumption A1 always holds if all the components of the mappings G and g are convex (continuous) functions and, in addition, the lower level objective function f_2 = f_2(x, y, z) is strictly convex with respect to z for each fixed pair of values (x, y).

Lemma 3. Under the above-cited conditions, assumption A1 holds. ■
Assumption A2 is much more restrictive than A1: the uniqueness of a generalized Nash equilibrium (GNE) is quite a rare case. In order to deal with assumption A2, we have to suppose additionally that the upper and lower level objective functions are (continuously) differentiable and, moreover, that the combined gradient mapping

(∇_y^T F, ∇_z^T f_2)^T : R^{n_2+n_3} → R^{n_2+n_3}

is strictly monotone for each fixed x ∈ X. In mathematical terms, the latter means that

⟨ (∇_y F(x, y^1, z^1), ∇_z f_2(x, y^1, z^1)) − (∇_y F(x, y^2, z^2), ∇_z f_2(x, y^2, z^2)), (y^1, z^1) − (y^2, z^2) ⟩ > 0

for all (y^1, z^1) ≠ (y^2, z^2) from the (convex) subset Ξ = Ξ(x) defined below:

Ξ = Ξ(x) = {(y, z) ∈ Y × Z : G(x, y, z) ≤ 0 and g(x, y, z) ≤ 0},    (4)
which is assumed to be non-empty for some subset K of X. Then it is well known (cf. [3]) that for each x ∈ K there exists a unique GNE (y(x), z(x)) of the lower level problem in (MP1), which can be found as the (unique) solution of the corresponding variational inequality problem: find a vector (y(x), z(x)) ∈ Ξ(x) such that

(y − y(x))^T ∇_y F(x, y(x), z(x)) + (z − z(x))^T ∇_z f_2(x, y(x), z(x)) ≥ 0    (5)

for all (y, z) ∈ Ξ(x).
4 Linear Case

In the linear case, when all the objective functions and the components of the constraints are linear functions and mappings, respectively, the situation with assumptions A1 and A2 is a bit different. For assumption A1 to hold, it is again enough to impose conditions guaranteeing the existence of a unique solution z = z(x, y) of the lower level LP problem on a certain compact subset of Z. For instance, the classical conditions will do (cf. [4]). As for assumption A2, in the linear case the problem is much more complicated. Indeed, the uniqueness of a generalized Nash equilibrium (GNE) at the lower level of (MP1) is much too restrictive a demand. As was shown by Rosen [5], the uniqueness of a so-called normalized GNE is a rather more realistic assumption. This idea was further developed later by many authors (cf. [6]).

Before we consider the general case, we examine an interesting example (a slightly modified example from [7]), in which one of the upper level variables accepts only integer values. In other words, the problem studied in this example is a so-called mixed-integer bi-level linear programming problem (MIBLP). Let the upper level problem have the objective function

F(x, y, z) = −60x − 10y − 7z → min_{x,y,z}    (6)

subject to

x ∈ X = {0, 1};  y ∈ [0, 100];  z ∈ [0, 100],    (7)

and let the lower level problem be

f_2(x, y, z) = −60y − 8z → min_z    (8)

subject to

g(x, y, z) = \begin{pmatrix} 10 & 2 & 3 \\ 5 & 3 & 0 \\ 5 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} − \begin{pmatrix} 225 \\ 230 \\ 85 \end{pmatrix} ≤ \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.    (9)

We select the mixed-integer bi-level linear program (MIBLP) (6)–(9) as problem (P1). Its modification in comparison with the original example in [7] consists in elevating the lower level variable y (in the original example) up to the upper level in our example. (However, it is curious to notice that the optimal solution of the original example coincides with that of the modified one: (x*, y*, z*) = (1; 75; 21 2/3) in both cases!)
It is easy to verify that assumption A1 holds in this problem: indeed, the lower level problem (8)–(9) has the unique solution

z = z(x, y) = min{ 85 − 5x, 75 − (10/3)x − (2/3)y }

for any pair of feasible values (x, y) ∈ Φ = {(x, y) : x ∈ {0, 1}, 0 ≤ y ≤ 100}, which is in line with the predictions by Mangasarian [4]. However, not all triples (x, y, z(x, y)) satisfy the lower level constraints g(x, y, z(x, y)) ≤ 0, and the feasible subset Φ_1 ⊂ Φ described in assumption A1 becomes here

Φ_1 = {(0, y) : 0 ≤ y ≤ 76 2/3} ∪ {(1, y) : 0 ≤ y ≤ 55},    (10)

with the optimal reaction function

z = z(x, y) = 75 − (2/3)y if x = 0;  z = z(x, y) = 71 2/3 − (2/3)y if x = 1.    (11)

Therefore, assumption A1 would hold completely if the variable x were a continuous one. However, here the subset Φ_1 ⊂ Φ is non-void and composed of two compact and convex parts, and the function z = z(x, y) is continuous with respect to the continuous variable y over each of the connected parts of Φ_1. Next, comparing the optimal values of the upper level objective function F over both connected parts of the feasible set Φ_1, we come to the conclusion that the triple (x*, y*, z*) = (1; 75; 21 2/3) is the optimal solution of problem (P1). Indeed, F(1, y_1*, z_1*) = F(1, 75, 21 2/3) = −1011 2/3 is strictly less than F(0, y_0*, z_0*) = F(0, 76 2/3, 23 8/9) = −933 8/9.
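Because the follower's objective −60y − 8z decreases in z, the optimal reaction z(x, y) is simply the largest z permitted by (9); the closed form used in (11) can be cross-checked in exact arithmetic, for example as follows (the tested y-range is an arbitrary choice inside Φ_1):

```python
from fractions import Fraction as Fr

def reaction(x, y):
    """Closed-form lower level response: f2 = -60y - 8z decreases in z,
    so take the largest z allowed by (9) and the box z <= 100."""
    return min(Fr(85) - 5 * x, Fr(75) - Fr(10, 3) * x - Fr(2, 3) * y, Fr(100))

def feasible(x, y, z):
    return (10 * x + 2 * y + 3 * z <= 225 and 5 * x + 3 * y <= 230
            and 5 * x + z <= 85 and 0 <= z <= 100)

for x in (0, 1):
    for y10 in range(0, 551):                        # y = 0, 0.1, ..., 55
        y = Fr(y10, 10)
        z = reaction(x, y)
        assert feasible(x, y, z)                     # the closed form is feasible ...
        assert not feasible(x, y, z + Fr(1, 1000))   # ... and cannot be increased
print(reaction(1, Fr(75)))   # → 65/3, i.e. 21 2/3 as in the text
```

Each tested point is feasible and no upward step remains feasible, confirming that the minimum of the two binding constraints is indeed the lower level optimum.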
Now consider the modified problem:

F(x, y, z) = −60x − 10y − 7z → min_{x,y,z}    (12)

subject to

x ∈ X = {0, 1};  y ∈ [0, 100];  z ∈ [0, 100],    (13)

and the lower level equilibrium problem

f_1(x, y, z) = −60x − 10y − 7z → min_{0 ≤ y ≤ 100};  f_2(x, y, z) = −60y − 8z → min_{0 ≤ z ≤ 100},

subject to    (14)

G(x, y, z) = \begin{pmatrix} 10 & 2 & 3 \\ 5 & 3 & 0 \\ 5 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} − \begin{pmatrix} 225 \\ 230 \\ 85 \end{pmatrix} ≤ \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
We call problem (12)–(14) the modified problem (MP1). It is easy to see that for each value of x, either x = 0 or x = 1, the lower level problem has a continuous set of GNEs. Namely, if x = 0, then all the GNE points (y, z) = (y(0), z(0)) belong to the straight-line interval described by the equation

2y + 3z = 225, with 0 ≤ y ≤ 76 2/3.    (15)

In a similar manner, another straight-line interval of GNE vectors for x = 1, that is, (y, z) = (y(1), z(1)), can be represented by the linear equation

2y + 3z = 215, with 0 ≤ y ≤ 75.    (16)

As could be expected, the linear upper level objective function F attains its minimum value at the extreme points of the above intervals (15) and (16) corresponding to the greater value of the variable y:

F_0* = F(x_0*, y_0*, z_0*) = F(0, 76 2/3, 23 8/9) = −933 8/9;
F_1* = F(x_1*, y_1*, z_1*) = F(1, 75, 21 2/3) = −1011 2/3.    (17)

As F_1* < F_0*, the global optimal solution of problem (MP1) coincides with that of the original problem (P1): (x*, y*, z*) = (1; 75; 21 2/3), although assumption A2 is clearly invalid in this example. ■
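The ordering F_1* < F_0* that drives the conclusion can be checked in exact arithmetic by evaluating F at the extreme points of the segments (15) and (16):

```python
from fractions import Fraction as Fr

def F(x, y, z):
    return -60 * x - 10 * y - 7 * z

def gne_point(x, y):
    """Point on the GNE segment 2y + 3z = 225 - 10x for a given y."""
    return (x, y, (Fr(225) - 10 * x - 2 * y) / 3)

F0 = F(*gne_point(0, Fr(230, 3)))   # extreme point of (15): x = 0, y = 76 2/3
F1 = F(*gne_point(1, Fr(75)))       # extreme point of (16): x = 1, y = 75
print(F0 == Fr(-8405, 9), F1 < F0)  # → True True
```

The first value reproduces −933 8/9 (= −8405/9) exactly, and the comparison confirms that the x = 1 branch yields the smaller (better) upper level value.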
Acknowledgements The research activity of the first author was financially supported by the R&D Department (Cátedra de Investigación) CAT-174 of the Instituto Tecnológico y de Estudios Superiores de Monterrey (ITESM), Campus Monterrey, and by the SEP-CONACYT project CB-200801-106664, Mexico. The third author was supported by the SEP-CONACYT project CB2009-01-127691.
References

1. Kalashnikov, V., Ríos-Mercado, R.: A natural gas cash-out problem: A bi-level programming framework and a penalty function method. Optimization and Engineering 7(4), 403–420 (2006)
2. Kalashnikov, V., Pérez-Valdés, G., Tomasgard, A., Kalashnykova, N.: Natural gas cash-out problem: Bilevel stochastic optimization approach. European Journal of Operational Research 206(1), 18–33 (2010)
3. Kinderlehrer, D., Stampacchia, G.: An Introduction to Variational Inequalities and Their Applications. Academic Press, New York (1980)
4. Mangasarian, O.: Uniqueness of solution in linear programming. Linear Algebra and Its Applications 25, 151–162 (1979)
5. Rosen, J.: Existence and uniqueness of equilibrium points for concave N-person games. Econometrica 33(3), 520–534 (1965)
6. Nishimura, R., Hayashi, S., Fukushima, M.: Robust Nash equilibria in N-person non-cooperative games: Uniqueness and reformulation. Pacific Journal of Optimization 5(2), 237–259 (2005)
7. Saharidis, G., Ierapetritou, M.: Resolution method for mixed integer bilevel linear problems based on decomposition technique. Journal of Global Optimization 44(1), 29–51 (2009)
Reduction of Dimension of the Upper Level Problem in a Bilevel Programming Model, Part 2

Vyacheslav V. Kalashnikov, Stephan Dempe, Gerardo A. Pérez-Valdés, and Nataliya I. Kalashnykova
Abstract. The paper deals with the problem of reducing the dimension of the upper level problem in a bilevel programming model. In order to diminish the number of variables governed by the leader at the upper level, we create a second follower supplied with an objective function coinciding with that of the leader and pass part of the upper level variables to the lower level to be governed by the second follower. The lower level problem is also modified and becomes a Nash equilibrium problem solved by the original and the new followers. We look for conditions that guarantee that the modified and the original bilevel programming problems share at least one optimal solution.
Vyacheslav V. Kalashnikov
ITESM, Campus Monterrey, Monterrey, Mexico
e-mail: [email protected]

Stephan Dempe
TU Bergacademie Freiberg, Freiberg, Germany
e-mail: [email protected]

Gerardo A. Pérez-Valdés
NTNU, Trondheim, Norway
e-mail: [email protected]

Nataliya I. Kalashnykova
UANL, San Nicolás de los Garza, Mexico
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 265–272. springerlink.com © Springer-Verlag Berlin Heidelberg 2011

5 Normalized Generalized Nash Equilibrium

We continue considering the reduction of dimension of the upper level problem in a bilevel program, studied in Part 1 of this paper. Following the line proposed in Rosen [5], we consider the concept of a normalized generalized Nash equilibrium (NGNE), defined below. First of all, we have to make our assumptions more detailed:

A3. We assume that each component G_j(x, y, z) and g_k(x, y, z) of the mappings G and g, respectively, is convex with respect to the variables (y, z). Moreover, for
each fixed feasible x ∈ X, there exists a point (y^0, z^0) = (y^0(x), z^0(x)) ∈ Y × Z such that G_j(x, y^0(x), z^0(x)) < 0 and g_k(x, y^0(x), z^0(x)) < 0 for every nonlinear constraint G_j(x, y, z) ≤ 0 and g_k(x, y, z) ≤ 0, respectively.
Remark 3. The latter inequalities in assumption A3 give a sufficient (Slater) condition for the satisfaction of the Karush-Kuhn-Tucker (KKT) constraint qualification.

We wish to use the differential form of the necessary and sufficient KKT conditions for a constrained optimum. We therefore make the additional assumption:

A4. All the components G_j(x, y, z) and g_k(x, y, z) of the mappings G and g, respectively, possess continuous first derivatives with respect to y and z for all feasible (x, y, z) ∈ X × Y × Z. We also assume that, at all feasible points, the payoff function f_i(x, y, z) of the i-th player possesses continuous first derivatives with respect to the variables controlled by that player.

In our problem (P2), for the scalar functions f_i(x, y, z), i = 1, 2, we denote by ∇_y f_1(x, y, z) and ∇_z f_2(x, y, z), respectively, the gradients with respect to the players' own variables; thus ∇_y f_1(x, y, z) ∈ R^{n_2} and ∇_z f_2(x, y, z) ∈ R^{n_3}. The KKT conditions equivalent to (3) from Part 1 can now be stated as follows:
G(x, y, z) ≤ 0  and  g(x, y, z) ≤ 0,    (18)

and there exist u = (u_1, u_2) ∈ R_+^{m_1} × R_+^{m_1} and v = (v_1, v_2) ∈ R_+^{m_2} × R_+^{m_2} such that

u_i^T G(x, y, z) = 0,  v_i^T g(x, y, z) = 0,  i = 1, 2,    (19)

and

f_1(x, y, z) ≤ f_1(x, w, z) + u_1^T G(x, w, z) + v_1^T g(x, w, z) for all admissible w;
f_2(x, y, z) ≤ f_2(x, y, s) + u_2^T G(x, y, s) + v_2^T g(x, y, s) for all admissible s.    (20)

Since f_i, i = 1, 2, and the components G_j(x, y, z) and g_k(x, y, z) of the mappings G and g, respectively, are convex and differentiable by assumptions A3 and A4, inequalities (20) are equivalent to

∇_y f_1(x, y, z) + u_1^T ∇_y G(x, y, z) + v_1^T ∇_y g(x, y, z) = 0;
∇_z f_2(x, y, z) + u_2^T ∇_z G(x, y, z) + v_2^T ∇_z g(x, y, z) = 0.    (21)
We shall also use the following relation, which holds as a result of the convexity of the components G j ( x, y,z ) and g k ( x, y,z ) . For every
(y^0, z^0), (y^1, z^1) ∈ Y × Z and at each fixed x ∈ X we have

G_j(x, y^1, z^1) − G_j(x, y^0, z^0) ≥ (y^1 − y^0, z^1 − z^0)^T ∇_(y,z) G_j(x, y^0, z^0)
    = (y^1 − y^0)^T ∇_y G_j(x, y^0, z^0) + (z^1 − z^0)^T ∇_z G_j(x, y^0, z^0);          (22)
g_k(x, y^1, z^1) − g_k(x, y^0, z^0) ≥ (y^1 − y^0, z^1 − z^0)^T ∇_(y,z) g_k(x, y^0, z^0)
    = (y^1 − y^0)^T ∇_y g_k(x, y^0, z^0) + (z^1 − z^0)^T ∇_z g_k(x, y^0, z^0).
A weighted nonnegative sum of the functions fi ,i = 1, 2 , is given by
σ ( x, y, z;r ) = r1 f1 ( x, y, z ) + r2 f 2 ( x, y,z ) , ri ≥ 0 ,
(23)
for each nonnegative vector r ∈ R 2 . For each fixed r, a related mapping p ( x, y,z;r ) of R n2 + n3 into itself is defined in terms of the gradients of the functions fi ,i = 1, 2, by
p(x, y, z; r) = [ r1 ∇_y f1(x, y, z) ; r2 ∇_z f2(x, y, z) ].
(24)
After Rosen [5], we shall call p ( x, y,z;r ) the pseudo-gradient of σ ( x, y,z;r ) .
An important property of σ ( x, y,z;r ) is given by the following
Definition 1 [5]. The function σ(x, y, z; r) will be called uniformly diagonally strictly convex for (y, z) ∈ Y × Z and fixed r ≥ 0 if for every fixed x ∈ X and for any (y^0, z^0), (y^1, z^1) ∈ Y × Z we have

(y^1 − y^0, z^1 − z^0)^T [ p(x, y^1, z^1; r) − p(x, y^0, z^0; r) ] > 0.          (25)
Repeating and modifying arguments similar to those in [5], we will show later that a sufficient condition for σ(x, y, z; r) to be uniformly diagonally strictly convex is that the symmetric matrix [ P(x, y, z; r) + P^T(x, y, z; r) ] is (uniformly by x from X) positive definite for (y, z) ∈ Y × Z, where P(x, y, z; r) is the Jacobi matrix with respect to (y, z) of p(x, y, z; r).
268
V.V. Kalashnikov et al.
Now following [5] we consider a special kind of equilibrium point such that each of the nonnegative multipliers u ∈ R+m1 ,v ∈ R+m2 involved in the KKT conditions (19)–(20) is given by
u1 = u^0 / r1 and v1 = v^0 / r1,
u2 = u^0 / r2 and v2 = v^0 / r2,
(26)
for some r > 0 and u^0 ≥ 0, v^0 ≥ 0. Like Rosen [5], we call this a normalized generalized Nash equilibrium (NGNE) point. Now we establish, by slightly modifying the proofs of Theorems 3 and 4 in [5], the existence and uniqueness results for the NGNE points involved in the modified problem (MP1).

Theorem 4. Under assumptions A3 and A4, there exists a normalized generalized Nash equilibrium point to the lower level equilibrium problem (3) in (MP1) for every specified r > 0.

Proof. For a fixed value r > 0, let
ρ ( x, y,z; w,s; r ) = r1 f1 ( x,w,z ) + r2 f 2 ( x, y,s ) .
(27)
Consider the feasible set of equilibrium problem (3):
Θ ( x ) = {( y,z ) ∈ Y × Z such that G ( x, y, z ) ≤ 0 and g ( x, y,z ) ≤ 0}
(28)
and the point-to-set mapping Γ : Θ ( x ) → Θ ( x ) , given by
Γ(y, z) = { (w, s) ∈ Θ(x) : ρ(x, y, z; w, s; r) = min_{(q,t)∈Θ(x)} ρ(x, y, z; q, t; r) }.          (29)
It follows (by assumptions A3 and A4) from the continuity of ρ ( x, y, z;q,t;r )
and the convexity in (q,t) of ρ ( x, y, z;q,t;r ) for fixed (x,y,z) that Γ is an upper semi-continuous mapping that maps each point of the convex, compact set Θ ( x )
into a closed compact subset of Θ(x). Then by the Kakutani Fixed Point Theorem, there exists a point (y^0, z^0) ∈ Θ(x) such that (y^0, z^0) ∈ Γ(y^0, z^0), or

ρ(x, y^0, z^0; y^0, z^0; r) = min_{(w,s)∈Θ(x)} ρ(x, y^0, z^0; w, s; r).          (30)

The fixed point (y^0, z^0) ∈ Θ(x) is an equilibrium point satisfying (3). For suppose that it were not. Then, say for player 1, there would be a point y^1 such that (y^1, z^0) ∈ Θ(x) and f1(x, y^1, z^0) < f1(x, y^0, z^0). But then we would have ρ(x, y^0, z^0; y^1, z^0; r) < ρ(x, y^0, z^0; y^0, z^0; r), which contradicts (30).
Now by the necessity of the KKT conditions, (30) implies the existence of u^0 ∈ R_+^m1, v^0 ∈ R_+^m2 such that

(u^0)^T G(x, y, z) = 0, (v^0)^T g(x, y, z) = 0,          (31)

and

r1 ∇_y f1(x, y, z) + (u^0)^T ∇_y G(x, y, z) + (v^0)^T ∇_y g(x, y, z) = 0;
r2 ∇_z f2(x, y, z) + (u^0)^T ∇_z G(x, y, z) + (v^0)^T ∇_z g(x, y, z) = 0.          (32)

But these are just the conditions (19) and (21), with

u1 = u^0 / r1 and v1 = v^0 / r1,
u2 = u^0 / r2 and v2 = v^0 / r2,

which, together with (18), are sufficient to ensure that (y^0, z^0) ∈ Θ(x) satisfies (3); (y^0, z^0) is therefore a normalized generalized Nash equilibrium (NGNE) point for the specified value of r. ■
Theorem 5. Let assumptions A3 and A4 be valid, and σ ( x, y,z;r ) be (uniformly
by x from X) diagonally strictly convex for every r ∈ Q, where Q is a convex subset of the positive orthant of R^2. Then for each r ∈ Q there is a unique normalized generalized Nash equilibrium (NGNE) point.

Proof. Assume that for some r ∈ Q we have two distinct NGNE points (y^0, z^0) ≠ (y^1, z^1) ∈ Θ(x). Then for ℓ = 0, 1 we have

G(x, y^ℓ, z^ℓ) ≤ 0 and g(x, y^ℓ, z^ℓ) ≤ 0;          (33)

there exist u^ℓ ∈ R_+^m1, v^ℓ ∈ R_+^m2 such that

(u^ℓ)^T G(x, y^ℓ, z^ℓ) = 0, (v^ℓ)^T g(x, y^ℓ, z^ℓ) = 0,          (34)

and

r1 ∇_y f1(x, y^ℓ, z^ℓ) + (u^ℓ)^T ∇_y G(x, y^ℓ, z^ℓ) + (v^ℓ)^T ∇_y g(x, y^ℓ, z^ℓ) = 0;
r2 ∇_z f2(x, y^ℓ, z^ℓ) + (u^ℓ)^T ∇_z G(x, y^ℓ, z^ℓ) + (v^ℓ)^T ∇_z g(x, y^ℓ, z^ℓ) = 0.          (35)

We multiply the first row in (35) by (y^0 − y^1)^T for ℓ = 0 and by (y^1 − y^0)^T for ℓ = 1; in a similar manner, we multiply the second row in (35) by (z^0 − z^1)^T for ℓ = 0 and by (z^1 − z^0)^T for ℓ = 1; finally, we sum all these four terms. This gives β + γ = 0, where

β = (y^1 − y^0, z^1 − z^0)^T [ p(x, y^1, z^1; r) − p(x, y^0, z^0; r) ],          (36)

and

γ = (u^0)^T ∇_y G(x, y^0, z^0)(y^0 − y^1) + (u^1)^T ∇_y G(x, y^1, z^1)(y^1 − y^0) +
  + (v^0)^T ∇_y g(x, y^0, z^0)(y^0 − y^1) + (v^1)^T ∇_y g(x, y^1, z^1)(y^1 − y^0) +
  + (u^0)^T ∇_z G(x, y^0, z^0)(z^0 − z^1) + (u^1)^T ∇_z G(x, y^1, z^1)(z^1 − z^0) +
  + (v^0)^T ∇_z g(x, y^0, z^0)(z^0 − z^1) + (v^1)^T ∇_z g(x, y^1, z^1)(z^1 − z^0) ≥
  ≥ (u^0)^T [G(x, y^0, z^0) − G(x, y^1, z^1)] + (u^1)^T [G(x, y^1, z^1) − G(x, y^0, z^0)] +
  + (v^0)^T [g(x, y^0, z^0) − g(x, y^1, z^1)] + (v^1)^T [g(x, y^1, z^1) − g(x, y^0, z^0)] =
  = −(u^0)^T G(x, y^1, z^1) − (u^1)^T G(x, y^0, z^0) − (v^0)^T g(x, y^1, z^1) − (v^1)^T g(x, y^0, z^0) ≥ 0.          (37)

Then since σ(x, y, z; r) is (uniformly by x from X) diagonally strictly convex, we have β > 0 by (25), which contradicts β + γ = 0 and proves the theorem. ■
We conclude by giving (similarly to [5]) a sufficient condition on the functions fi, i = 1, 2, that ensures that σ(x, y, z; r) is (uniformly by x from X) diagonally strictly convex. The condition is given in terms of the (n2 + n3) × (n2 + n3) matrix P(x, y, z; r), which is the Jacobi matrix with respect to (y, z) of p(x, y, z; r) for fixed r > 0. That is, the j-th column of P(x, y, z; r) is ∂p(x, y, z; r)/∂y_j if 1 ≤ j ≤ n2, and ∂p(x, y, z; r)/∂z_{j−n2} if n2 + 1 ≤ j ≤ n2 + n3, where p(x, y, z; r) is defined by (24).

Theorem 6. A sufficient condition that σ(x, y, z; r) be (uniformly by x from X)
diagonally strictly convex for (y, z) ∈ Θ(x) and fixed r > 0 is that the symmetric matrix [ P(x, y, z; r) + P^T(x, y, z; r) ] be (uniformly by x from X) positive definite for (y, z) ∈ Θ(x).
Proof. Let (y^0, z^0) ≠ (y^1, z^1) ∈ Θ(x) be any two distinct points in Θ(x), and let (y(α), z(α)) = α(y^1, z^1) + (1 − α)(y^0, z^0), so that (y(α), z(α)) ∈ Θ(x) for 0 ≤ α ≤ 1. Now, since P(x, y, z; r) is the Jacobi matrix of p(x, y, z; r), we have

dp(x, y(α), z(α); r)/dα = P(x, y(α), z(α); r) d(y(α), z(α))/dα = P(x, y(α), z(α); r) (y^1 − y^0, z^1 − z^0),          (38)

or

p(x, y^1, z^1; r) − p(x, y^0, z^0; r) = ∫_0^1 P(x, y(α), z(α); r) (y^1 − y^0, z^1 − z^0) dα.          (39)

Multiplying both sides by (y^1 − y^0, z^1 − z^0)^T gives

(y^1 − y^0, z^1 − z^0)^T [ p(x, y^1, z^1; r) − p(x, y^0, z^0; r) ] =
  = ∫_0^1 (y^1 − y^0, z^1 − z^0)^T P(x, y(α), z(α); r) (y^1 − y^0, z^1 − z^0) dα =
  = (1/2) ∫_0^1 (y^1 − y^0, z^1 − z^0)^T [ P(x, y(α), z(α); r) + P^T(x, y(α), z(α); r) ] (y^1 − y^0, z^1 − z^0) dα > 0,

which shows that (25) is satisfied. ■
6 Conclusion

The paper (Part 1 and Part 2) deals with the problem of reducing the number of variables at the upper level of bilevel programming problems. Such problems are widely used to model various applications, in particular the natural gas cash-out problems described in [1] and [2]. To solve these problems with stochastic programming tools, it is important that part of the upper level variables be governed at the lower level, so as to reduce the number of (upper level) variables involved in generating the scenario trees. The paper presents certain preliminary results recently obtained in this direction. In particular, it has been demonstrated that the desired reduction is possible when the lower level optimal response is determined uniquely for each vector of upper level variables. In Part 2, the necessary basis for similar results is prepared for the general case of bilevel programs with linear constraints, in which uniqueness of the lower level optimal response is quite rare. However, if the optimal response is defined for a fixed set of Lagrange multipliers, then it is possible to demonstrate (following Rosen [5]) that the so-called normalized Nash equilibrium is unique. The latter gives hope of obtaining positive results for reducing
the dimension of the upper level problem without affecting the solution of the original bilevel programming problem.
Acknowledgements

The research activity of the first author was financially supported by the R&D Department (Cátedra de Investigación) CAT-174 of the Instituto Tecnológico y de Estudios Superiores de Monterrey (ITESM), Campus Monterrey, and by the SEP-CONACYT project CB-2008-01-106664, Mexico. The third author was supported by the SEP-CONACYT project CB-2009-01-127691.
References

1. Kalashnikov, V., Ríos-Mercado, R.: A natural gas cash-out problem: A bilevel programming framework and a penalty function method. Optimization and Engineering 7(4), 403–420 (2006)
2. Kalashnikov, V., Pérez-Valdés, G., Tomasgard, A., Kalashnykova, N.: Natural gas cash-out problem: Bilevel stochastic optimization approach. European Journal of Operational Research 206(1), 18–33 (2010)
3. Kinderlehrer, D., Stampacchia, G.: An Introduction to Variational Inequalities and Their Applications. Academic Press, New York (1980)
4. Mangasarian, O.: Uniqueness of solution in linear programming. Linear Algebra and Its Applications 25, 151–162 (1979)
5. Rosen, J.: Existence and uniqueness of equilibrium points for concave N-person games. Econometrica 33(3), 520–534 (1965)
6. Nishimura, R., Hayashi, S., Fukushima, M.: Robust Nash equilibria in N-person noncooperative games: Uniqueness and reformulation. Pacific Journal of Optimization 5(2), 237–259 (2005)
7. Saharidis, G., Ierapetritou, M.: Resolution method for mixed integer bilevel linear problems based on decomposition technique. Journal of Global Optimization 44(1), 29–51 (2009)
Representation of Loss Aversion and Impatience Concerning Time Utility in Supply Chains

Péter Földesi, János Botzheim, and Edit Süle
Abstract. The paper investigates the critical time factor in supply chains. The literature review gives background for understanding and handling the reasons and consequences of the growing importance of time, and the phenomenon of time inconsistency. By using utility functions to represent the value of various delivery times for the different participants in the supply chain, including the final customers, it is shown that the behaviour and willingness to pay of time-sensitive and non time-sensitive consumers differ for varying lead times. Longer lead times not only generate less utility, but impatience also influences the decision makers; that is, time elasticity is not constant but a function of time. For optimization, soft computing techniques (particle swarm optimization in this paper) can be efficiently applied.
1 Introduction

Time has limits: consumers have become time-sensitive and choose the contents of their basket of commodities according to available time as well. The time necessary to obtain a product/service (access time) contributes to product utility to an increasing extent, and ensuring it is the task of logistics. There are more reasons for

Péter Földesi
Department of Logistics and Forwarding, Széchenyi István University, 1 Egyetem tér, Győr, 9026, Hungary
e-mail:
[email protected]

János Botzheim
Department of Automation, Széchenyi István University, 1 Egyetem tér, Győr, 9026, Hungary
e-mail: [email protected]

Edit Süle
Department of Marketing and Management, Széchenyi István University, 1 Egyetem tér, Győr, 9026, Hungary
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 273–282.
© Springer-Verlag Berlin Heidelberg 2011. springerlink.com
the shortening of this access time; one of the most important is the change in customer expectations, which can be related to new trends emerging in the most diverse areas, with the time factor playing the main role [10]. Increasing rapidity is also encouraged by the sellers in their time-based competition against each other, because of the pressure to reduce costs and inventory and to increase efficiency and customer satisfaction [12, 19, 21]. Customer satisfaction is determined by human factors as well, based on the loss aversion and impatience features of human thinking: "future utility is less important" [5]. After identifying the character of time as a resource, it can be seen that new management technologies aimed at time compression and faster service are spreading. In addition, the actors are expected to meet the traditional requirements such as cost reduction, capacity utilization, increase in efficiency, quality improvement and customer satisfaction [4, 18]. The paper is concerned with the behaviour of time-sensitive and non time-sensitive customers, described by utility functions, and it tries to find optimal time-parameters for different time-demands by using logistical performance measures based on time.
2 Time Compression

The time-sensitive segment of the population continuously increases. This segment, whose activities depend on time, expects special services with high time-quality, speed and punctuality. Because of the increasing time-preference, intertemporal decisions are present-asymmetric. Much work is coupled with higher income [2, 16, 22], so time becomes scarcer than money, and time sensitivity comes to dominate price sensitivity. The change in the attitude towards time is not a novelty and cannot be attributed solely to the formation of the pace society [3, 16]. Logistics has to find delivery solutions adjusted to the consumption behaviour of products, which generates many kinds of logistical needs that can already be seen today. The importance of time differs between production and consumption points of view, and it also differs across customer segments and groups of products. Relevant literature deals with the consequences of time-based competition and the methods that can respond to this challenge. Several studies [13] have found that there is a close relationship between the entire lead time (defined as the period between a purchase order placement and its receipt by the client), the customers' demands, the willingness to pay and customer loyalty. Karmarkar [6] pointed out that delivery times are most probably inversely related to market shares or price premiums or both. Customers highly appreciate short and punctual delivery times; therefore they will not turn to competitors, and they may be willing to pay a price premium for shorter delivery times. We can find several methods and practices in operations management which produce visible results in manufacturing. These time-based performances include sales growth, return on investment, market share gain, and overall competitive position [13, 14, 17, 20].
The customer's need for fast service points in the same direction as the company's ambition to decrease lead time. By now it has turned out that time itself also behaves like a resource that has to be managed. Therefore, within the supply chain, not only are the interior solutions within the company aimed at time-saving, but the spatial and temporal expansion of remote processes arranged by different actors and with different time-consumption is also of high importance. Literature on time-based competition strategies is likewise aimed at the temporal integration of the different levels of the supply chain. Among these we can find the methods popular nowadays, such as just-in-time (JIT), agile production, lean production, Quick Response (QR), Efficient Customer Response (ECR), cross-docking, etc. [3, 13, 17, 19, 20, 21, 22]. Based on these we can distinguish the internal (measurable only by the company) and the external (perceived also by the customers) forms of time performances [19]. Customer responsiveness is the ability to reduce the time required to deliver products and to reorganize production processes quickly in response to requests. Improved customer responsiveness can be achieved through available inventory close to buyers, or through faster delivery with shorter lead time and a good connection to shipment logistics [11, 18].
2.1 Value Factors of Goods

The possession-, consumer-, place- and time-value of products are different but result from correlated processes. Consumer-value is created through production and is basically determined by the quality of the product, but it can also be influenced by the time and place of its access; these two latter values are value-categories created by logistics. Place- and time-value can be interpreted only in relation to consumer-value, because we can decide the optimal time and place of consumption only by obtaining consumer-value and only in accordance with it. Time-value becomes more important as it is determined by the lead time between the appearance and the satisfaction of demand [19]. It is maximal when the search-production-obtaining of the product has no time-requirements, that is to say, the demand can be fulfilled immediately at the moment of its appearance. Time sensitivity differs with each consumer and product. We can speak about time-sensitive consumer segments and also about products which are very sensitive to any waiting or delay. The willingness to wait is related to the importance of the product and to its substitutability: it is directly proportional to the former and inversely proportional to the latter. This determines the opportunity cost that waiting for a product imposes on the consumer. Waiting means opportunity cost, coming from wasted time and wasted possibilities. Time, as a resource, behaves as a capacity which we have to use efficiently. The consumer is always willing to wait as long as the advantage of the sacrificed possibilities is lower than the benefit coming from the product, or the cost of waiting does not exceed it (for example, unutilized capacities).
2.2 The Value of Time

Customers tend to make decisions based on acceptability, affordability and accessibility; in the literature this is known as the 3As framework for assessing potential benefits. Perceived benefits are determined by several elements connected with the product, the provider and the circumstances, set against the perceived sacrifices: factors like cost, risk/uncertainty and time. Time appears like a hidden cost. How we value time depends on several factors. First of all it depends on the customer type: we distinguish between the end user and the industrial customer. The final buyer is getting more and more time-sensitive, so in his case the choice based on time can be described by a utility function, which measures product usefulness depending on the length of time it takes to obtain the product. Fig. 1 shows a possible form of such a function. The derivative function can also give information about how the marginal utility of time behaves. If we compare it with the marginal cost function of the service, we can see whether it is worth making efforts to provide faster service in a certain segment.
Fig. 1 Utility of time for the final buyers
Fig. 2 Utility of time for the industrial buyers
For the buyers at higher levels of the supply chain, those who buy for further processing (producers) or for reselling (wholesalers, retailers), another kind of utility function can be drawn. This is shown in Fig. 2. The limited time-utility is due to larger time consciousness, because time costs money for companies. Like the aim to satisfy the consumer at a high level, the aim to operate efficiently also leads to optimizing on a time basis.
2.3 Elasticity of Time

There are consumers who are not sensitive to time, who do not want to or cannot afford rapidity. There are also products/services where urgency is not necessary; just the opposite, quality is brought by time (e.g., process-centred services). The behaviour of these consumers is shown in Fig. 3, where the price does not increase in parallel with faster service (the opposite direction on the lead time axis); the price
is constant, independent of time. The buyer does not pay more, even for a quicker service; his relation to time is totally inflexible. Fig. 4 shows the opposite case, where getting something at a certain time is worth everything: time-elasticity is infinite.
Fig. 3 Absolute time-insensitive consumer
Fig. 4 Infinite time-elasticity
Time-elasticity shows how much it is worth to a buyer to get 1% faster service. Fig. 5 shows the behaviour of a consumer who is not willing to reward the acceleration of delivery time to the same degree: when the lead time is cut from T1 to T2, he/she is only willing to pay the price P2 instead of the price P1, which means the relative decrease of T results only in a relative price increment (P2 − P1)/P1.
Fig. 5 Non-time sensitive consumers
Fig. 6 Time sensitive consumers
Time-elasticity appears in flexible behaviour, which means a 1% relative decrease in lead time can realize a relatively higher price increment. A consumer surplus can even arise if the reservation price (the maximum price the buyer is willing to pay for a certain time) is higher than the price fixed by the provider. Fig. 6 shows the behaviour of a time-sensitive consumer. Economics and marketing oriented research recognizes that longer lead times might have a negative impact on customer demand. The firm's objective is to maximize profit by optimal selection of price and delivery time.
2.4 Maximizing the Value-Cost Ratio Concerning the Time Factor

Concerning the customers' time sensitivity detailed in the previous sections, the following model can be set up. The customer satisfaction (S) is affected by two elements:
• S_U – the actual utility of obtaining the goods
• S_A – the accuracy of the service, i.e. the variance of the lead time
A simple representation of the satisfaction based on utility can be:

S_U(t) = u_0 − u_1 · t^{β_U}
(1)
where u_0 and u_1 are real constants, t is the lead time, and β_U > 1 represents the time sensitivity of customers. (The value of u_0 shows the satisfaction of obtaining the goods with zero lead time; negative values of S_U mean dissatisfaction.) The accuracy is considered an attractive service element in modern logistics, e.g. just-in-time systems. When the supply chain is being extended, that is, the lead time is growing, hitting the accuracy time-window gets harder (see also Fig. 2), thus we can write:

A(t) = a_0 − a_1 · t,          (2)

where A(t) is the measure of accuracy, a_0 and a_1 are real constants, and t is the time. The satisfaction measure is progressive:

S_A(t) = (a_0 − a_1 · t)^{β_A},
(3)
where β_A > 1 is the sensitivity.
The cost of the actual logistic service depends on the lead time required: the shorter the lead time, the more expensive the service. Since the cost reduction is not a linear function of lead time extension we can write:

C(t) = c_0 + c_1 / t,          (4)

where C(t) is the cost, t is the lead time and c_0 and c_1 are real constants. The target is to maximize the total satisfaction over the costs:

max [ S_U(t) + S_A(t) ] / C(t),  0 < t < ∞.          (5)
Increasing the lead time leads to an "objective" decline in satisfaction. On the other hand, since future utility is less important [5, 15] in a "subjective" sense, another time sensitivity should be applied in the model. Thus instead of considering β_U and β_A as constants, the exponents are interpreted as functions of time [1, 9]. In this paper we suggest:
β_U = β_{u0} + β_{u1} · t          (6)

and

β_A = β_{a0} − β_{a1} · t.          (7)

So, the function to be maximized is:

f = [ u_0 − u_1 · t^{β_{u0} + β_{u1}·t} + (a_0 − a_1 · t)^{β_{a0} − β_{a1}·t} ] / (c_0 + c_1/t).          (8)
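Formula (8) is straightforward to evaluate numerically. The sketch below is one possible reading of it in code, not an implementation from the paper; the default parameter values are the ones listed later in Table 1.

```python
def f(t, beta_u1, beta_a1,
      u0=10.0, u1=0.1, a0=10.0, a1=0.1,
      c0=10.0, c1=100.0, beta_u0=1.3, beta_a0=1.3):
    s_u = u0 - u1 * t ** (beta_u0 + beta_u1 * t)    # utility term with exponent (6)
    s_a = (a0 - a1 * t) ** (beta_a0 - beta_a1 * t)  # accuracy term with exponent (7)
    return (s_u + s_a) / (c0 + c1 / t)              # satisfaction-over-cost ratio, eq. (5)

# Sanity check against the results reported later in Table 3: at t = 10.825
# with beta_u1 = beta_a1 = 0.01, f comes out close to the listed value 1.076.
value = f(10.825, 0.01, 0.01)
```

Note that for the Table 1 parameters the base a_0 − a_1·t stays positive over the lead-time range considered, so the real-valued power is well defined.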
3 Particle Swarm Optimization

Particle swarm optimization (PSO) is a population based stochastic optimization technique inspired by the social behavior of bird flocking or fish schooling [7, 8]. In these methods a number of individuals try to find better and better places by exploring their environment, led by their own experiences and the experiences of the whole community. Each particle is associated with a position in the search space which represents a solution of the optimization problem. Each particle remembers the coordinates of the best solution it has achieved so far, and the best solution achieved so far by the whole swarm is also remembered. The particles move in directions based on their personal best solution and the global best solution of the swarm. The positions of the particles are randomly initialized. The next position of particle i is calculated from its previous position x_i(t) by adding a velocity to it:

x_i(t + 1) = x_i(t) + v_i(t + 1).          (9)
The velocity is calculated as: vi (t + 1) = wvi (t) + γ1r1 [pi (t) − xi(t)] + γ2 r2 [g(t) − xi(t)],
(10)
where w is the inertia weight, γ_1 and γ_2 are parameters, and r_1 and r_2 are random numbers between 0 and 1. The second term in equation (10) is the cognitive component, containing the best position remembered by particle i (p_i(t)); the third term is the social component, containing the best position remembered by the swarm (g(t)). The inertia weight represents the importance of the previous velocity, while the γ_1 and γ_2 parameters represent the importance of the cognitive and social components, respectively. The algorithm is stochastic because of r_1 and r_2. After updating the positions and velocities of the particles, the vectors describing the particles' best positions and the global best position have to be updated, too. If the predefined iteration number is reached, the algorithm stops. The number of particles is also a parameter of the algorithm.
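A minimal PSO sketch following equations (9) and (10) might look as follows. The one-dimensional search interval, the clipping of positions to that interval, and the toy test objective are illustrative assumptions for this sketch, not details given in the paper.

```python
import random

def pso(objective, lo, hi, n_particles=30, n_iter=200,
        w=1.0, gamma1=0.5, gamma2=0.5):
    """Maximize objective(t) over [lo, hi] with a basic particle swarm."""
    x = [random.uniform(lo, hi) for _ in range(n_particles)]  # positions
    v = [0.0] * n_particles                                   # velocities
    pbest = x[:]                                              # personal bests p_i
    pbest_f = [objective(p) for p in pbest]
    g = pbest[pbest_f.index(max(pbest_f))]                    # global best g
    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = random.random(), random.random()
            # velocity update, eq. (10)
            v[i] = (w * v[i] + gamma1 * r1 * (pbest[i] - x[i])
                    + gamma2 * r2 * (g - x[i]))
            # position update, eq. (9), kept inside the search interval
            x[i] = min(max(x[i] + v[i], lo), hi)
            fx = objective(x[i])
            if fx > pbest_f[i]:
                pbest[i], pbest_f[i] = x[i], fx
                if fx > objective(g):
                    g = x[i]
    return g

# Toy usage: the maximum of -(t - 2)^2 on [0, 10] is at t = 2
random.seed(1)
best = pso(lambda t: -(t - 2.0) ** 2, 0.0, 10.0)
```

Returning the global best position found (rather than a final particle position) is a common design choice, since with a constant inertia weight the swarm itself need not settle.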
4 Numerical Example

The parameters of function f in Equation (8) used in the simulations are presented in Table 1. Parameters β_{u1} and β_{a1} are varied in the simulations. Table 2 shows the PSO
parameters. The obtained results for time using different β_{u1} and β_{a1} parameters are presented in Table 3.

Table 1 Function parameters

Parameter   Value
u_0         10
u_1         0.1
a_0         10
a_1         0.1
c_0         10
c_1         100
β_{u0}      1.3
β_{a0}      1.3

Table 2 PSO parameters

Parameter             Value
Number of iterations  200
Number of particles   30
w                     1
γ_1                   0.5
γ_2                   0.5

Table 3 Results

β_{u1}  β_{a1}  f      t
0       0       1.368  15.856
0.01    0.01    1.076  10.825
0.02    0.02    0.913  8.728
0.03    0.03    0.804  7.509
0.04    0.04    0.724  6.688
0.05    0.05    0.662  6.093
0.06    0.06    0.611  5.635
0.1     0.1     0.478  4.516
0       0.1     0.517  6.578
0.1     0       0.962  6.597
1       1       0.124  1.3
0       1       0.125  1.3
1       0       0.521  2.506
2       2       0.067  0.65
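The first row of Table 3 (β_{u1} = β_{a1} = 0) can also be cross-checked without PSO by a brute-force grid search; the grid range and step below are arbitrary choices made for this illustration.

```python
# Objective (8) with the Table 1 values hard-coded and no impatience terms
obj = lambda t: (10.0 - 0.1 * t ** 1.3
                 + (10.0 - 0.1 * t) ** 1.3) / (10.0 + 100.0 / t)

grid = [i / 100.0 for i in range(1, 3001)]  # t in (0, 30], step 0.01
best_t = max(grid, key=obj)
# best_t lands near the reported optimum t = 15.856 with f about 1.368
```

Such a one-dimensional check is only practical because a single scenario is evaluated here; PSO remains the tool of choice when the search is repeated over many parameter settings.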
It is clearly shown that increasing impatience presses down the lead time (t); however, this effect is degressive, that is, ten times higher impatience (β_{u1} = 0.01 and β_{a1} = 0.01 versus β_{u1} = 0.1 and β_{a1} = 0.1) results in only around a halving of the lead time (t = 10.825 versus t = 4.516). By increasing β_{u1} and β_{a1} the solution shifts from the non time-sensitive case (see Fig. 5) to the time-sensitive situation (Fig. 6) in a subjective sense as well, and this represents the time compression, the "fear" of loss and the impatience of decision makers: they want shorter lead times and they are willing to pay more than can be derived from the objective value/cost ratio.
5 Conclusions

The effects of time sensitivity have objective and subjective features. The utility of possessing goods in time and the accuracy of delivery times can be described by univariate functions, and the cost of performing within a given lead time can be considered a hyperbolic function. When maximizing the utility-cost ratio, another, subjective element can be embedded in the model by extending the meaning of the power functions used, with time-dependent exponents. The overall effects of that kind of impatience can be detected by using simulations. Particle swarm optimization is an efficient method to explore the side effects of loss aversion, which turned out to be degressive in our investigation.
References

[1] Bleichrodt, H., Rhode, K.I.M., Wakker, P.P.: Non-hyperbolic time inconsistency. Games and Economic Behavior 66, 27–38 (2009)
[2] Bosshart, D.: Billig: wie die Lust am Discount Wirtschaft und Gesellschaft verändert. Redline Wirtschaft, Frankfurt/M (2004)
[3] Christopher, M.: Logistics and Supply Chain Management (Creating Value-Adding Networks). Prentice-Hall, Englewood Cliffs (2005)
[4] Földesi, P., Botzheim, J., Süle, E.: Fuzzy approach to utility of time factor. In: Proceedings of the 4th International Symposium on Computational Intelligence and Intelligent Informatics, ISCIII 2009, Egypt, pp. 23–29 (October 2009)
[5] Frederick, S., Loewenstein, G., O'Donoghue, T.: Time discounting and time preference: A critical review. Journal of Economic Literature 40, 351–401 (2002)
[6] Karmarkar, U.S.: Manufacturing lead times. Elsevier Science Publishers, Amsterdam (1993)
[7] Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia, pp. 1942–1948 (1995)
[8] Kennedy, J., Eberhart, R.C., Shi, Y.: Swarm Intelligence. Morgan Kaufmann, San Francisco (2001)
[9] Kahneman, D., Tversky, A.: Prospect theory: An analysis of decision under risk. Econometrica 47(2), 263–292 (1979)
[10] LeHew, M.L.A., Cushman, L.M.: Time sensitive consumers' preference for concept clustering: An investigation of mall tenant placement strategy. Journal of Shopping Center Research 5(1), 33–58 (1998)
[11] Lehmusvaara, A.: Transport time policy and service level as components in logistics strategy: A case study. International Journal of Production Economics 56-57, 379–387 (1998)
[12] McKenna, R.: Real Time (The benefit of short-term thinking). Harvard Business School Press, Boston (1997)
[13] Nahm, A.Y., Vonderembse, M.A., Koufteros, X.A.: The impact of time-based manufacturing and plant performance. Journal of Operations Management 21, 281–306 (2003)
[14] Pangburn, M.S., Stavrulaki, E.: Capacity and price setting for dispersed, time-sensitive customer segments. European Journal of Operational Research 184, 1100–1121 (2008)
[15] Prelec, D.: Decreasing impatience: A criterion for non-stationary time preference and "hyperbolic" discounting. Scandinavian Journal of Economics 106, 511–532 (2004)
[16] Rosa, H.: Social acceleration: Ethical and political consequences of a desynchronized high-speed society. Constellations 10(1), 3–33 (2003)
[17] Saibal, R., Jewkes, E.M.: Customer lead time management when both demand and price are lead time sensitive. European Journal of Operational Research 153, 769–781 (2004)
[18] Stalk, G.: Time-based competition and beyond: Competing on capabilities. Planning Review 20(5), 27–29 (1992)
[19] De Toni, A., Meneghetti, A.: Traditional and innovative paths towards time-based competition. International Journal of Production Economics 66, 255–268 (2000)
[20] Tu, Q., Vonderembse, M.A., Ragu-Nathan, T.S.: The impact of time-based manufacturing practices on mass customization and value to customer. Journal of Operations Management 19, 201–217 (2001)
[21] Vanteddu, G., Chinnam, R.B., Yang, K., Gushikin, O.: Supply chain focus dependent safety stock placement. International Journal of Flexible Manufacturing Systems 19(4), 463–485 (2007)
[22] Waters, C.D.J.: Global Logistics and Distribution Planning. Kogan Page Publishers, Corby (2003)
Robotics Application within Bioengineering: Neuroprosthesis Test Bench and Model Based Neural Control for a Robotic Leg Dorin Popescu, Dan Selişteanu, Marian S. Poboroniuc, and Danut C. Irimia
Abstract. This work deals with motion analysis of the human body, with robotic leg control, and with a neuroprosthesis test bench. The issues raised in motion analysis are of interest for controlling motion-specific parameters of the robotic leg movement. The resulting data are used for further processing in humanoid robotics and in assistive and recuperative technologies for people with disabilities. The results are implemented on a robotic leg developed in our laboratories, which has been used to build a neuroprosthesis control test bench. A model-based neural control strategy is implemented as well. The performances of the implemented control strategies for trajectory tracking are analysed by computer simulation.

Keywords: bioengineering, neural networks, robotics, neuroprosthesis control.
Dorin Popescu · Dan Selişteanu
Department of Automation & Mechatronics, University of Craiova, 107 Decebal Blvd., Craiova, Romania
e-mail: [email protected]

Marian S. Poboroniuc · Danut C. Irimia
Faculty of Electrical Engineering, “Gh. Asachi” Technical University of Iasi, 53 Mageron Blvd., Iasi, Romania
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 283–294.
springerlink.com © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

Engineering faces a new challenge in the development of biomedical engineering, the application of engineering principles and techniques to the medical field. The present work seeks to close the gap between engineering and medicine by creating a bridge between motion analysis and a neuroprosthesis test bench. For this reason, the work aims to offer a common tool to orthopedic doctors and to engineers interested in the medical field. It is estimated that the annual incidence of spinal cord injury (SCI) in the USA, not including those who die at the scene of the accident, is approximately 40 cases per million of population. The number of people living with a SCI in the USA in
2008 was estimated to be approximately 259,000 [1]. For the nations of the EU (803,850,858 population estimate in 2009 [2]) there is a lack of data, but some sources estimate that the annual incidence of SCI is approximately 14 cases per million of population [3]. Spinal cord injury results in damage to the upper motor neurons. If there is no damage to the lower motor neurons, the muscles themselves retain their ability to contract and to produce forces and movements. Functional electrical stimulation (FES) is a technology that uses small electrical pulses to artificially activate peripheral nerves, causing muscles to contract, in order to restore body functions. The devices that deliver electrical stimulation and aim to substitute for the control of body functions that have been impaired by neurological damage are termed ‘neuroprostheses’ [4]. Organizing and performing experimental tests with a neuroprosthesis, and dealing with the required ethics approvals, demand an increased effort both from those testing the neuroprosthesis and from the patient. Moreover, a disabled person is not permanently available to test a novel control strategy implemented within a neuroprosthesis. Therefore, simulations and equipment that emulate the effects of a neuroprosthesis aiming to rehabilitate disabled people are required [5]. Image processing can be carried out for various purposes (surface analysis, pattern recognition, size determination, etc.). The issues raised in motion analysis are of interest in industrial research, in ergonomics, for the analysis of movement techniques, and for obtaining motion-specific parameters of movements in various sports [6]–[9]. The resulting data (spatial coordinates, velocities and accelerations) can be used for further processing in design, robotics and animation programs.
Compared with most other measurement methods, image analysis has the advantage that it has no direct repercussions: the determination of quantitative dimensions by means of this measuring system has no influence on the behavior of the measured object. Rigid robot systems are a subject of research in both the robotics and the control fields. The reported research has led to a variety of control methods for rigid robot systems [10]. The present paper addresses robotic leg control. High speed and high precision trajectory tracking are frequent requirements in robot applications. Conventional controllers for robotic leg structures are based on independent control schemes in which each joint is controlled separately ([10], [11]) by a simple servo loop. This classical control scheme is inadequate for precise trajectory tracking. The performances imposed by industrial applications require consideration of the complete dynamics of the robotic leg. Furthermore, in real-time applications, ignoring parts of the robot dynamics, or errors in the parameters of the robotic leg, may render this classical control (such as a PD controller) ineffective. An alternative solution to PD control is the computed torque technique. This classical method is in fact a nonlinear technique that takes into account the dynamic coupling between the robot links. The main disadvantage of this structure is the assumption of an exactly known dynamic model. However, the basic idea of this
Robotics Application within Bioengineering
285
method remains important, and it is the basis of the neural and adaptive control structures [11]–[15]. When the dynamic model of the system is not known a priori or is not available, a control law is constructed based on an estimated model. This is the basic idea behind adaptive control strategies. Over the last few years, several authors ([14], [16]–[19]) have considered the use of neural networks within a control system for robotic arms. Our work began with image-based motion analysis (2D and 3D) of the human body and the interpretation of the obtained data (joint positions, velocities and accelerations), which are presented in Section II. Then, in Section III, the robotic legs designed for this application are described. In Section IV a model-based neural control structure is implemented. The artificial neural network is used to generate an auxiliary joint control torque to compensate for the uncertainties in the computed-torque-based primary robotic leg controller. The computer simulation is presented. Section V presents a neuroprosthesis control test bench which integrates a robotic leg mimicking the movements of a human body assumed to be under the action of a neuroprosthesis.
2 Motion Analysis The aim of this work is the design and implementation of a recuperative system (neuroprosthetic device) for people with locomotion disabilities [20]. The stages of the evaluation of people with locomotion disabilities are presented in Fig. 1.
Fig. 1 Evaluation of the people with locomotion disabilities.
Kinematic analysis of human body motion can be performed using the SIMI Motion software [21]. The first phase of motion analysis is the description of the movements, with an analytical breakdown of the motion system. The next phase is the recording of motion features and the analysis of the motion. The obtained data are processed, graphically presented and analysed (Fig. 2). The leg movements can be analysed either purely kinematically (position, speed and acceleration), without considering the forces that produce the motion, or dynamically, with the forces taken into account.
286
D. Popescu et al.
Fig. 2 Motion analysis.
After the image analysis of the human body motion, the obtained results are interpreted and used for modelling. We obtained the position, speed and acceleration evolutions, which can be used in the modelling and control of the robotic leg.
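The velocity and acceleration evolutions can be derived from the sampled joint positions by numerical differentiation. The sketch below is illustrative only: the sampling rate and the synthetic hip trajectory are assumptions, standing in for the coordinates exported by the motion-analysis software.

```python
import numpy as np

# Hypothetical example: derive velocity and acceleration from sampled
# joint positions via central differences (np.gradient).
fs = 100.0                                 # assumed sampling rate [Hz]
t = np.arange(0.0, 10.0, 1.0 / fs)
q = 0.4 * np.sin(0.4 * np.pi * t)          # synthetic joint position [rad]

qd = np.gradient(q, t)                     # velocity
qdd = np.gradient(qd, t)                   # acceleration

# away from the endpoints, the numerical derivatives match the
# analytic ones closely
qd_exact = 0.16 * np.pi * np.cos(0.4 * np.pi * t)
qdd_exact = -0.064 * np.pi ** 2 * np.sin(0.4 * np.pi * t)
```

In practice, raw coordinates from image analysis are noisy and are usually low-pass filtered before differentiation; this step is omitted here.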
3 Robotic Leg

Another stage of our work was to design and build two robotic legs. The legs are similar, but one was built in Craiova and the other in Iasi. Later, both will be connected to form the lower part of a humanoid robot. The aim is to use the results obtained from the analysis of the human body motion images to implement human movements on the robotic legs and to test control algorithms on them. The kinematic chain of the robotic leg comprises five revolute joints: two for the hip, one for the knee and two for the ankle (Fig. 3).
Fig. 3 The kinematic chain of the robotic leg.
Fig. 4 The robotic leg.
We implemented the actuating system of the robotic leg with five OMRON servomotors and harmonic gearboxes, and the control system with five OMRON servo drivers and a Trajexia control unit.
4 Neural Control

In this section, a model-based neural control structure for the robotic leg is implemented. Various neural control schemes have been studied, proposed and compared. The differences between these schemes lie in the role that the artificial neural network (ANN) plays in the control system and in the way it is trained to achieve the desired trajectory tracking performance. The most popular control scheme is one which uses the ANN to generate an auxiliary joint control torque to compensate for the uncertainties in the computed-torque-based primary robotic leg controller, which is designed based on a nominal robotic leg dynamic model. This is accomplished by implementing the neural controller in either a feedforward or a feedback configuration, with the ANN trained on-line. Based on the computed torque method, a training signal is derived for the neural controller. Comparison studies based on a planar robotic leg have been made for the neural controller implemented in both feedforward and feedback configurations. A feedback error based neural controller is proposed. In this approach, a feedback error function is minimized; the advantage over the Jacobian-based approach is that Jacobian estimation is not required.
Fig. 5 Neural control.
The dynamic equation of an n-link robotic leg is given by ([10]):

$$T = J(q)\ddot{q} + V(q,\dot{q})\dot{q} + G(q) + F(\dot{q}) \qquad (1)$$

where:
- $T$ is the (n×1) vector of joint torques;
- $J(q)$ is the (n×n) manipulator inertia matrix;
- $V(q,\dot{q})$ is an (n×n) matrix representing centrifugal and Coriolis effects;
- $G(q)$ is an (n×1) vector representing gravity;
- $F(\dot{q})$ is an (n×1) vector representing friction forces;
- $q,\dot{q},\ddot{q}$ are the (n×1) vectors of joint positions, velocities and accelerations.
The inputs to the neural controller (Fig. 5) are the required trajectories $q_d(t)$, $\dot{q}_d(t)$, $\ddot{q}_d(t)$. The compensating signals from the ANN, $\phi_p$, $\phi_v$, $\phi_a$, are added to the desired trajectories. The control law is:

$$T = \hat{J}\left(\ddot{q}_d + \phi_a + K_V(\dot{e} + \phi_v) + K_P(e + \phi_p)\right) + \hat{H} \qquad (2)$$

Combining (2) with the dynamic equation of the robotic leg yields:

$$u = \ddot{e} + K_V\dot{e} + K_P e = \hat{J}^{-1}\left(\tilde{J}\ddot{q} + \tilde{H}\right) - \Phi \qquad (3)$$

where $\Phi = \phi_a + K_V\phi_v + K_P\phi_p$. Ideally, at $u = 0$, the ideal value of $\Phi$ is:

$$\Phi = \hat{J}^{-1}\left(\tilde{J}\ddot{q} + \tilde{H}\right) \qquad (4)$$
The error function $u$ is minimized and the objective function is:

$$J = \frac{1}{2}u^T u \qquad (5)$$

The gradient of $J$ is:

$$\frac{\partial J}{\partial w} = \frac{\partial u^T}{\partial w}\,u = -\frac{\partial \Phi^T}{\partial w}\,u \qquad (6)$$

The backpropagation updating rule for the weights, with a momentum term, is:

$$\Delta w(t) = -\eta\frac{\partial J}{\partial w} + \alpha\Delta w(t-1) = \eta\frac{\partial \Phi^T}{\partial w}\,u + \alpha\Delta w(t-1) \qquad (7)$$

For simulation, a planar robotic leg with three revolute joints is used (only the hip, knee and ankle joints, all in the same plane, are considered). The control objective is to track the desired trajectory given by:

$$q_{1d} = 0.4\sin(0.4\pi t) \quad \text{for the hip}$$
$$q_{2d} = -0.5\sin(0.5\pi t) \quad \text{for the knee}$$
$$q_{3d} = 0.2\sin(0.2\pi t) \quad \text{for the ankle}$$
For the feedback error based neural controller with the backpropagation update rule (7), the tracking errors for $q_1$ and $q_2$ are presented in Fig. 6.

Fig. 6 Tracking errors for $q_1$ (solid line) and $q_2$ (dotted line)
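The control scheme of equations (1)–(7) can be illustrated in simulation. The sketch below is a minimal single-link version, not the three-joint leg of the paper: the plant parameters, the PD gains, the learning rates and the linear feature basis of the compensator are all assumptions for illustration, and $\hat{H} = 0$ is taken. A computed-torque controller with a deliberately wrong inertia estimate is augmented by a compensator $\Phi$ trained online with the update rule (7).

```python
import numpy as np

# Minimal single-link sketch of the feedback-error learning scheme
# (all parameters below are assumed, H_hat = 0).
dt, steps = 1e-3, 10_000
J_true, b, mgl = 1.0, 0.5, 2.0        # true plant: J*qdd + b*qd + mgl*sin(q) = tau
J_hat = 0.8                           # deliberately wrong inertia estimate
Kp, Kv = 100.0, 20.0                  # PD gains of the primary controller
eta, alpha = 2e-4, 0.9                # learning rate and momentum, rule (7)

w = np.zeros(3)                       # weights of the linear compensator Phi
dw_prev = np.zeros(3)
q = qd = 0.0
errs = []

for k in range(steps):
    t = k * dt
    q_d = 0.4 * np.sin(0.4 * np.pi * t)                  # hip trajectory
    qd_d = 0.16 * np.pi * np.cos(0.4 * np.pi * t)
    qdd_d = -0.064 * np.pi ** 2 * np.sin(0.4 * np.pi * t)

    e, ed = q_d - q, qd_d - qd
    f = np.array([np.sin(q), qd, 1.0])                   # assumed feature basis
    phi = w @ f                                          # combined compensation Phi

    tau = J_hat * (qdd_d + Kv * ed + Kp * e + phi)       # control law (2), H_hat = 0
    qdd = (tau - b * qd - mgl * np.sin(q)) / J_true      # true plant response

    u = (qdd_d - qdd) + Kv * ed + Kp * e                 # error function, eq. (3)
    dw = eta * u * f + alpha * dw_prev                   # update rule (7)
    w, dw_prev = w + dw, dw

    qd += qdd * dt                                       # Euler integration
    q += qd * dt
    errs.append(abs(e))
```

Note that on a physical leg $\ddot{e}$ is not directly measurable and would have to be estimated; the simulator provides it exactly here.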
5 Neuroprosthesis Control Test Bench

The designed robotic leg may be integrated into a test bench which aims to test neuroprosthesis control (Fig. 7).
In our Simulink model we have implemented a three-segment model with nine mono- and biarticular muscle groups, as described in [22]. These muscle groups are modelled in the sagittal plane, inducing moments about the ankle, knee, and hip joints. All muscle groups except the monoarticular hip flexors can be activated in a real experiment by a proper arrangement of surface electrodes. Each modelled muscle group has its own activation and contraction dynamics. The inputs of the model are the stimulator pulse width and frequency. Muscle activation, muscle contraction and body-segmental dynamics are the three main components of the implemented model. The forces computed for any of the nine muscle groups that are activated by an applied electrical stimulus are input to the body-segmental dynamics. The interaction (horizontal and vertical reaction forces) with a seat is modelled by means of a pair of nonlinear spring-dampers.
Fig. 7 Neuroprosthesis control test bench
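The seat interaction can be sketched as a one-sided nonlinear spring-damper: a reaction force acts only while the body penetrates the seat surface, and the seat can push but never pull. The functional form and all constants below are assumptions for illustration, not the values used in the model of [22].

```python
def seat_reaction(penetration, penetration_rate,
                  k=5.0e4, c=800.0, exponent=1.5):
    """One-sided nonlinear spring-damper (assumed form and constants).

    penetration      -- depth of body penetration into the seat [m] (>0 in contact)
    penetration_rate -- rate of penetration [m/s]
    Returns the reaction force [N]; zero when not in contact.
    """
    if penetration <= 0.0:            # no contact, no force
        return 0.0
    force = k * penetration ** exponent + c * penetration_rate
    return max(force, 0.0)            # the seat can only push, never pull
```

One such element per reaction direction (horizontal and vertical) reproduces the pair of spring-dampers described above.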
The upper body effort has to be taken into account in any model that aims to support the testing of FES-based controllers. Within the patient model developed in [23], the shoulder forces and moment representing the patient's voluntary arm support are calculated on the basis of a look-up table, as functions of the deviations of the horizontal and vertical shoulder joint position and of the trunk inclination from their desired values, and of their velocities. In fact, the shoulder forces and moment model is based on a reference trajectory of the shoulder position and trunk inclination during the sit-to-stand transfer, obtained during an experiment on a single paraplegic patient. In our case, the vertical shoulder forces are modelled as a function of the measured knee angles by means of a fuzzy controller.
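A minimal fuzzy mapping from measured knee angle to vertical shoulder force might look as follows. The membership functions, the rule base and the force values are all hypothetical, chosen only to show the mechanism (weighted-average defuzzification over triangular memberships), not the rules used in the test bench.

```python
def tri(x, a, b, c):
    """Triangular membership function with corners a < b < c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def shoulder_force(knee_angle_deg):
    """Fuzzy estimate of the vertical shoulder support force [N].

    Hypothetical rule base: the more flexed the knee (angle near 90 deg,
    i.e. close to sitting), the less arm support is assumed.
    """
    rules = [
        (tri(knee_angle_deg, -45.0, 0.0, 45.0), 200.0),   # leg straight -> high support
        (tri(knee_angle_deg, 0.0, 45.0, 90.0), 100.0),    # half flexed  -> medium
        (tri(knee_angle_deg, 45.0, 90.0, 135.0), 20.0),   # fully flexed -> low
    ]
    num = sum(mu * force for mu, force in rules)
    den = sum(mu for mu, _ in rules)
    return num / den if den > 0.0 else 0.0
```

Between the rule centres the output interpolates smoothly, which is the practical appeal of a fuzzy controller over a hard-threshold look-up.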
Fig. 8 Servo-potentiometers housed in neoprene knee cuffs
A number of experimental tests have been conducted in order to determine the knee angles and knee angular velocities which occur during controlled sitting-down or standing-up. These parameters have been monitored with servo-potentiometers housed in neoprene knee cuffs (Fig. 8). The electronic circuit that measures the analogue data of the knee sensors and presents it to the PC via the serial port is shown in Fig. 9. A PIC16F876A microcontroller controls the data acquisition and the transmission protocol. The instrumented knee cuff has been intensively tested to verify the reliability of the acquired data during standing-up and sitting-down. Several trials of knee angles and knee angular velocities recorded during sitting-down are presented in Figs. 10, 11 and 12.
Fig. 9 The electronic circuit which collects knee sensorial data
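On the PC side, a raw potentiometer reading must be converted into a knee angle. The sketch below assumes a 10-bit ADC (the PIC16F876A family has 10-bit converters) and a linear calibration over a hypothetical 0–120 degree knee range; the real cuff requires its own calibration constants.

```python
def adc_to_knee_angle(raw, adc_max=1023, angle_min=0.0, angle_max=120.0):
    """Map a raw ADC count from the knee-cuff potentiometer to degrees.

    Assumes a 10-bit ADC (0..1023) and a linear potentiometer spanning
    an assumed 0-120 degree knee range (hypothetical calibration).
    """
    if not 0 <= raw <= adc_max:
        raise ValueError("ADC reading out of range")
    return angle_min + (angle_max - angle_min) * raw / adc_max
```

Knee angular velocity then follows by differencing successive angle samples, as in the numerical-differentiation sketch of Section 2.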
Fig. 10 Knee angle versus time
Fig. 11 Knee angular velocity versus time
Fig. 12 Knee angular velocity versus knee angle
The recorded data are to be compared with those produced by the Matlab & Simulink model of the human body and provided to the robotic leg, which mimics the movements of a human body assumed to be under the action of a neuroprosthesis.
6 Conclusion and Future Work

As a non-obtrusive technique, image processing is an ideal method for collecting movement parameters. Mathematical techniques enable the calculation of the data using spatial coordinates from at least two planes. Only a few conditions must be met to achieve good results. After the human motion analysis and interpretation, the human body model was implemented in Matlab & Simulink and tested on the test bench. Then a robotic leg was designed and built, and several control algorithms were implemented and tested in order to reproduce human movements on the robotic leg. Classical and neural strategies have been applied to the control of the robotic leg. In the future, these control algorithms will be implemented in hardware within a recuperative system (neuroprosthetic device) for people with locomotion disabilities. Any new control strategy that aims to support standing in paraplegia has to be embedded within a neuroprosthesis and to be intensively tested in order to assess its effectiveness. This is not an easy task: ethics committee approval is required every time, and a disabled patient is not available to perform trials at any moment; a neuroprosthesis test bench is therefore a useful alternative. Modelling the human body within Matlab & Simulink allows us to include effects such as the nonlinear, coupled and time-varying muscle response in the presence of the electrical stimulus, muscle fatigue, spasticity, etc. Intensively testing a neuroprosthesis control on a test bench reduces the number of experimental trials to be performed on disabled people. As future work, we plan to implement a training platform which provides a repository of training material with real clinical case studies, using digital imaging and accompanying notes, and an interactive multimedia database system containing full reports on patients receiving recuperative treatment.

Acknowledgments. This work was supported by CNCSIS–UEFISCSU, Romania, project number PNII–IDEI 548/2008.
References

1. Spinal Cord Injury statistics (updated June 2009). Available from the Foundation for Spinal Cord Injury Prevention, Care and Cure, http://www.fscip.org/facts.htm (accessed March 3, 2011)
2. Internet world stats, http://www.internetworldstats.com/europa2.htm (accessed March 3, 2011)
3. Paraplegic and Quadriplegic Forum for Complete and Incomplete Quadriplegics and Paraplegics and wheelchair users Paralyzed with a Spinal Cord Injury, http://www.apparelyzed.com/forums/index.php?s=2f63834381835165c0aca8b76a0acc74&showtopic=2489 (accessed March 3, 2011)
4. Poboroniuc, M., Wood, D.E., Riener, R., Donaldson, N.N.: A New Controller for FES-Assisted Sitting Down in Paraplegia. Advances in Electrical and Computer Engineering 10(4), 9–16 (2010)
5. Irimia, D.C., Poboroniuc, M.S., Stefan, C.M.: Voice Controlled Neuroprosthesis System. In: Proceedings of the 12th Mediterranean Conf. on Medical and Biological Engineering and Computing MEDICON 2010, Chalkidiki, Greece, May 27-30. IFMBE Proceedings, vol. 29, p. 426 (2010)
6. Moeslund, T., Granum, E.: A Survey of Computer Vision-Based Human Motion Capture. Computer Vision and Image Understanding 81, 231–268 (2001)
7. Sezan, M.I., Lagendijk, R.L.: Motion Analysis and Image Sequence Processing. Springer, Heidelberg (1993)
8. Aggarwal, J.K., Cai, Q.: Human motion analysis: a review. In: Proc. of Nonrigid and Articulated Motion Workshop, pp. 90–102 (1997)
9. Jun, L., Hogrefe, D., Jianrong, T.: Video image-based intelligent architecture for human motion capture. Graphics, Vision and Image Processing Journal 5, 11–16 (2005)
10. Ivanescu, M.: Industrial robots, pp. 149–186. Universitaria, Craiova (1994)
11. Ortega, R., Spong, M.W.: Adaptive motion control of rigid robots: a tutorial. Automatica 25, 877–888 (1999)
12. Dumbrava, S., Olah, I.: Robustness analysis of computed torque based robot controllers. In: 5th Symposium on Automatic Control and Computer Science, Iasi, pp. 228–233 (1997)
13. Gupta, M.M., Rao, D.H.: Neuro-Control Systems. IEEE Computer Society Press, Los Alamitos (1994)
14. Ozaki, T., Suzuki, T., Furuhashi, T.: Trajectory control of robotic manipulators using neural networks. IEEE Transactions on Industrial Electronics 38, 641–657 (1991)
15. Liu, Y.G., Li, Y.M.: Dynamics and Model-based Control for a Mobile Modular Manipulator. Robotica 23, 795–797 (2005)
16. Miyamoto, H., Kawato, M., Setoyama, T.: Feedback error learning neural networks for trajectory control of a robotic manipulator. Neural Networks 1, 251–265 (1998)
17. Pham, D.T., Oh, S.J.: Adaptive control of a robot using neural networks. Robotica, 553–561 (2004)
18. Popescu, D.: Neural control of manipulators using a supervisory algorithm. In: A&Q 1998 Int. Conf. on Automation and Quality Control, Cluj-Napoca, pp. A576–A581 (1998)
19. Zalzala, A., Morris, A.: Neural networks for robotic control, pp. 26–63. Prentice-Hall, Englewood Cliffs (1996)
20. Poboroniuc, M., Popescu, C.D., Ignat, B.: Functional Electrical Stimulation. Neuroprostheses control. Politehnium, Iasi (2005)
21. SIMI Motion Manual
22. Poboroniuc, M., Stefan, C., Petrescu, M., Livint, G.: FES-based control of standing in paraplegia by means of an improved knee extension controller. In: 4th International Conference on Electrical and Power Engineering EPE 2006, Bulletin of the Polytechnic Institute of Iasi, tom LII (LIV), Fasc. 5A, pp. 517–522 (2006)
23. Riener, R., Fuhr, T.: Patient-driven control of FES-supported standing up: a simulation study. IEEE Trans. Rehabil. Eng. 6, 113–124 (1998)
The Improvement Strategy of Online Shopping Service Based on SIA-NRM Approach Chia-Li Lin
Abstract. Nowadays, online shopping through shopping platforms is becoming more popular. The new shopping styles are more diverse and provide choices that are more convenient for customers. Despite the disputes that result from misunderstandings caused by differences between the real products and their virtual presentation (i.e., the information shown on the website), the numbers of users and of shopping platforms have increased continually, and many enterprises selling through physical channels have started to sell on shopping platforms such as online shopping. Thus, shopping through multiple platforms will become a trend in the future. Understanding customers' attitudes towards multiple-platform shopping helps shopping platform providers not only improve their service quality but also enlarge their market size. This study reveals the major service influencers of the online shopping platform by using the DEMATEL (Decision-making trial and evaluation laboratory) technique. The results show that design & service features (DS), maintenance & transaction security (MS), and searching & recommendation service (SR) mainly influence online shopping platforms. In addition, the study uses satisfaction-importance analysis (SIA) to measure the importance of, and the service gaps between, platforms, and suggests that online shopping service providers can improve their strategies using the NRM (network relation map). Using this research method, service providers can improve existing functions and plan further utilities for the next generation of shopping platforms.

Keywords: Service performance, Improvement strategy, Online shopping, Decision-making trial and evaluation laboratory (DEMATEL), SIA-NRM.
Chia-Li Lin
Department of Resort and Leisure Management, Taiwan Hospitality & Tourism College, No. 268, Chung-Hsing St., Feng-Shan Village, Shou-Feng Township, Hualien County, 974, Taiwan, ROC
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 295–306.
springerlink.com © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

E-commerce has changed consumer behavior in recent years. Consumers who were accustomed to shopping in department stores, shopping malls, and retail
stores several years ago now purchase goods and services through online shopping platforms. The trend of e-commerce and Internet services forces retail service providers to change their operation model and sales channels to adopt online shopping services. These retail service providers have begun to operate online shopping malls and to provide diversified online shopping services. Online shopping has therefore changed the shopping habits of customers: (1) Customers can now purchase products and services on an online service platform. They can obtain product information through the electronic catalog and complete the order process using a transaction system. They can then choose a payment method (credit card, cash on delivery, or payment on pick-up) and receive the product via a home delivery service or at a fixed pick-up point. (2) Customer searching costs can be reduced by a wide margin because of the product and price information available on online shopping platforms. In order to increase customer identification with the platform, some service operators limit their product offerings and refund price differences. (3) Customer familiarity gradually influences loyalty. If customers are already familiar with a certain shopping platform, their experience with this platform becomes part of their shopping experience. Any transfer to another shopping platform may be inconvenient for customers, except when the original experience is negative, in which case customers move to another service platform that offers better service quality. The remainder of this paper is organized as follows. Section 2 presents the customer needs and the online shopping service system. We establish the development strategy of the online shopping platform service in Section 3. Section 4 presents the SIA-NRM analysis of the online shopping service system.
Section 5 presents the conclusion, which uses the empirical results to propose the improvement strategy for online shopping platform service providers.
2 The Service Evaluation System of the Online Shopping Platform

Service system evaluation is becoming more and more important in the current online shopping environment. To measure the service quality of e-commerce and Internet services, two multi-item scale evaluation models, E-S-QUAL and E-RecS-QUAL, were used to measure the service quality of online shopping websites (Parasuraman et al., 2005). The E-S-QUAL system evaluates four aspects of website service quality for routine service encounters (efficiency, fulfillment, system availability, and privacy) with twenty-two multi-item scales, and E-RecS-QUAL evaluates three aspects of website service quality for non-routine (recovery) encounters (responsiveness, compensation, and contact) using eleven multi-item scales (Parasuraman et al., 2005). Some researchers suggest that existing e-service quality scales focus on the goal-oriented service aspect and ignore hedonic quality aspects; therefore, some aspects are not included in evaluation models of e-service quality. Hence, an e-service quality evaluation model that integrates utilitarian and hedonic aspects was proposed, with the e-TransQual evaluation model covering all stages of the electronic service delivery process. The e-TransQual model determines five discriminant quality aspects (functionality/design, enjoyment, process, reliability, and responsiveness)
based on exploratory and confirmatory factor analysis. A previous study found that enjoyment plays a key role in relationship duration and repurchase intention and drives customer lifetime value (Bauer et al., 2006). A study of the market segmentation of online services considered that online shopping databases providing information on purchasing activity and demographic characteristics also help service operators understand customers' consumption attributes (Internet usage and service satisfaction). This information also helps service operators establish good customer relations and refine service strategies to match customers' needs. That study proposed a soft clustering method based on a latent mixed-class membership clustering approach and used customers' purchasing data across categories to classify online customers. The soft clustering method can provide better results than hard clustering and better within-segment clustering quality than the finite mixture model (Wu and Chou, 2010). In order to understand the factors that influence customers' use of online shopping services, some studies evaluated three aspects (perceived characteristics of the web as a sales channel, online consumer characteristics, and website and product characteristics). The aspect of perceived characteristics of the web as a sales channel includes five criteria (perceived risk of online shopping, relative advantage of online shopping, online shopping experience, service quality, and trust), while the aspect of online consumer characteristics also includes five criteria (consumer shopping orientations, consumer demographics, consumer computer/Internet experience, consumer innovativeness, and social psychological variables). The aspect of website and product characteristics has two criteria (risk reduction measures and product characteristics).
3 Building the Service Improvement Model Based on SIA-NRM for the Online Shopping Service

The analytical process for expanding a firm's marketing imagination capabilities starts by collecting, using the Delphi method, the marketing imagination capabilities the firm needs to develop, as well as the goals to be achieved after enhancing those capabilities. Since the goals derived by the Delphi method may impact each other, the structure of the MCDM problem is derived using DEMATEL. The weights of every goal are based on the structure derived using the ANP. Finally, the expansion process for the firm's marketing imagination capabilities is based on a multiple objective programming approach built on the concept of the minimum spanning tree, introducing the marketing imagination capabilities/competences derived by Delphi and the weights corresponding to each objective derived by ANP in the former stages. This section introduces the service improvement model based on SIA-NRM for online shopping. First, we define the critical decision problem of the online shopping service; in the second stage, we identify the aspects/criteria that influence the quality of the online shopping service through a literature review and expert interviews. In the third stage, using SIA analysis, this study indicates that the aspects/criteria that are still associated with low satisfaction and high importance
are also linked to low service quality. The study then determines the relational structure of the online shopping service system and identifies the dominant aspects/criteria of the service system based on NRM analysis in the fourth stage. Finally, this study integrates the results of the SIA analysis and the NRM analysis to establish the improvement strategy path and determine an effective service improvement strategy for the online shopping service system. The analytic process includes five stages: (1) clearly define the critical decision problems of the service system; (2) establish the aspects/criteria of the service system; (3) measure the state of the aspects/criteria based on SIA analysis; (4) measure the relational structure using the network relation map (NRM); and (5) integrate the results of the SIA analysis and the NRM analysis to determine the service improvement strategy of the service system. The analytic process uses three analytic techniques (SIA analysis, NRM analysis and SIA-NRM analysis) and five analytic stages, as shown in Fig. 1.
Fig. 1 The analysis process of SIA-NRM
Fig. 2 The satisfaction and importance analysis (SIA) map
Table 1 The satisfaction and importance analysis (SIA)

Aspects                                   | MS    | SS     | MI    | SI     | (SS, SI)
Searching & recommendation service (SR)   | 6.839 | -0.687 | 7.839 | -1.177 | ▼ (-,-)
Maintenance & transaction security (MS)   | 7.222 |  1.349 | 8.861 |  1.492 | ○ (+,+)
Design & service functions (DS)           | 7.106 |  0.734 | 8.435 |  0.380 | ○ (+,+)
Transaction cost & payment method (CP)    | 6.896 | -0.384 | 8.146 | -0.375 | ▼ (-,-)
Reputation & customer relationship (RR)   | 6.778 | -1.011 | 8.167 | -0.321 | ▼ (-,-)
Average                                   | 6.968 |  0.000 | 8.290 |  0.000 |
Standard deviation                        | 0.188 |  1.000 | 0.383 |  1.000 |
Maximum                                   | 7.222 |  1.349 | 8.861 |  1.492 |
Minimum                                   | 6.778 | -1.011 | 7.839 | -1.177 |

Note 1: ○ (+,+) marks criteria with a high degree of satisfaction and a high degree of importance; ● (+,-) marks criteria with a high degree of satisfaction but a low degree of importance; ▼ (-,-) marks criteria with a low degree of satisfaction and a low degree of importance; X (-,+) marks criteria with a low degree of satisfaction but a high degree of importance.
Note 2: MS, SS, MI and SI stand for the satisfaction value, standardized satisfaction value, importance value, and standardized importance value, respectively.
4 The Service Improvement Model Based on SIA-NRM for Online Shopping Service

4.1 The SIA Analysis (Satisfaction and Importance Analysis)

The degree of importance and satisfaction of each criterion is measured, and the surveyed data are normalized onto equal measuring scales. According to the results of the surveyed data, we divided the criteria into four categories. The first category has a high degree of satisfaction and a high degree of importance and is marked with the symbol ○(+,+). The second category has a high degree of satisfaction but a low degree of importance and is marked with the symbol ●(+,-). The third category has a low degree of satisfaction and a low degree of importance and is marked with the symbol ▼(-,-). The fourth category has a low degree of satisfaction but a high degree of importance and is marked with the symbol X(-,+). In this study, the SIA (Satisfaction and Importance Analysis) proceeds as follows. The first step is to improve those aspects (i.e., SR, CP, and RR) falling into the third category [▼(-,-)]. Fourth-category criteria [X(-,+)] are key factors that affect the overall satisfaction degree of the service platform for online shopping; for the third-category criteria [▼(-,-)], a rising degree of importance would affect the overall satisfaction degree of the service platform for online shopping in the short run, as shown in Fig. 2 and Table 1.
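The standardization and quadrant classification described above can be sketched as follows (a minimal illustration using the mean scores from Table 1; the sample standard deviation reproduces the SS/SI columns):

```python
# SIA sketch: standardize the mean satisfaction (MS) and importance (MI)
# scores of Table 1 (sample standard deviation) and classify each aspect
# into one of the four satisfaction/importance quadrants.
scores = {  # aspect: (mean satisfaction MS, mean importance MI)
    "SR": (6.839, 7.839), "MS": (7.222, 8.861), "DS": (7.106, 8.435),
    "CP": (6.896, 8.146), "RR": (6.778, 8.167),
}

def standardize(values):
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

names = list(scores)
ss = standardize([scores[k][0] for k in names])  # standardized satisfaction SS
si = standardize([scores[k][1] for k in names])  # standardized importance SI

quadrant = {(True, True): "○ (+,+)", (True, False): "● (+,-)",
            (False, False): "▼ (-,-)", (False, True): "X (-,+)"}
for name, s, i in zip(names, ss, si):
    print(f"{name}: SS={s:+.3f}, SI={i:+.3f}  {quadrant[(s >= 0, i >= 0)]}")
```

Running this reproduces the SS/SI columns of Table 1 (e.g. SS = -0.687 for SR) and places SR, CP, and RR in the ▼(-,-) quadrant.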
4.2 The NRM Analysis Based on the DEMATEL Technique

DEMATEL is used to construct the network relation map (NRM) of the shopping platform. When users make decisions about using shopping platforms, there are many criteria they may consider, and the most common problem is that those criteria have impacts on one another. Therefore, before making improvements on the criteria, it is necessary to identify the basic criteria
C.-L. Lin
and then make effective improvements to enhance overall satisfaction. When a decision-maker needs to improve many criteria, the best way is to determine the criteria that most impact the others and improve those first. DEMATEL has been widely adopted for such complicated problems. In its early stages it was used for the user interfaces of monitoring systems (Hori and Shimizu, 1999) and for failure sorting in system failure analysis (Seyed-Hosseini et al., 2006). In recent years, DEMATEL has drawn much attention in the decision and management domains; recent studies have applied DEMATEL techniques to complex problems such as developing global managers' competencies (Wu and Lee, 2007) and evaluating vehicle telematics systems (Lin et al., 2010).

(1) Calculation of the original average matrix
Respondents were asked to indicate the perceived influence of each aspect on the other aspects on a scale ranging from 0 to 4, with "0" indicating no influence and "4" indicating extremely strong influence between aspects/criteria. Scores of "1", "2", and "3" indicate "low influence", "medium influence", and "high influence", respectively. As Table 2 shows, the influence of searching & recommendation service (SR) on design & service functions (DS) is 2.889, which indicates a medium influence; the influence of design & service functions (DS) on maintenance & transaction security (MS) is 3.083, which indicates a high influence.

Table 2 Original average matrix (A)

Aspects                                   SR      MS      DS      CP      RR      Total
Searching & recommendation service (SR)   0.000   2.667   2.889   2.750   2.722   11.028
Maintenance & transaction security (MS)   2.444   0.000   2.583   2.889   2.806   10.722
Design & service functions (DS)           2.889   3.083   0.000   2.556   2.833   11.361
Transaction cost & payment method (CP)    2.556   2.361   2.556   0.000   2.222    9.694
Reputation & customer relationship (RR)   2.722   2.194   2.556   2.278   0.000    9.750
Total                                    10.611  10.306  10.583  10.472  10.583       -
(2) Calculation of the direct influence matrix
From Table 2, we processed the original average matrix A using Equations (1) and (2) and obtained the direct influence matrix D. As shown in Table 3, the diagonal entries of D are all 0 and every row sum is at most 1 (the DS row sums to exactly 1). We then obtained Table 4 by adding up the rows and columns. In Table 4, the sum of row and column for design & service functions (DS) is 1.932, making it the most important influence aspect; the sum of row and column for transaction cost & payment method (CP) is 1.775, making it the least important influence aspect.

D = sA,  s > 0   (1)

where

s = min[ 1 / max_{1≤i≤n} Σ_{j=1..n} a_ij ,  1 / max_{1≤j≤n} Σ_{i=1..n} a_ij ],  i, j = 1, 2, ..., n

and lim_{m→∞} D^m = [0]_{n×n}, where D = [x_ij]_{n×n}, with 0 < Σ_{j=1..n} x_ij ≤ 1 or 0 < Σ_{i=1..n} x_ij ≤ 1, and at least one row sum Σ_{j=1..n} x_ij or column sum Σ_{i=1..n} x_ij equal to one, but not all. Therefore, we can guarantee lim_{m→∞} D^m = [0]_{n×n}.   (2)
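The normalization in Equations (1)-(2) can be sketched directly on the Table 2 data (a minimal illustration; the matrix is entered row by row, with rows denoting exerted influence):

```python
# DEMATEL normalization sketch: D = s*A with
# s = min(1/max row sum, 1/max column sum), Equations (1)-(2).
A = [  # Table 2; row i = influence of aspect i on SR, MS, DS, CP, RR
    [0.000, 2.667, 2.889, 2.750, 2.722],   # SR
    [2.444, 0.000, 2.583, 2.889, 2.806],   # MS
    [2.889, 3.083, 0.000, 2.556, 2.833],   # DS
    [2.556, 2.361, 2.556, 0.000, 2.222],   # CP
    [2.722, 2.194, 2.556, 2.278, 0.000],   # RR
]
n = len(A)
max_row = max(sum(row) for row in A)                              # 11.361 (DS)
max_col = max(sum(A[i][j] for i in range(n)) for j in range(n))   # 10.611 (SR)
s = min(1.0 / max_row, 1.0 / max_col)
D = [[s * a for a in row] for row in A]
print(round(D[2][1], 3))  # DS -> MS direct influence, cf. Table 3; prints 0.271
```

Since the maximum row sum (11.361) exceeds the maximum column sum (10.611), s = 1/11.361 here, and the DS row of D sums to exactly 1.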
Table 3 The direct influence matrix D

Aspects                                   SR      MS      DS      CP      RR      Total
Searching & recommendation service (SR)   0.000   0.235   0.254   0.242   0.240   0.971
Maintenance & transaction security (MS)   0.215   0.000   0.227   0.254   0.247   0.944
Design & service functions (DS)           0.254   0.271   0.000   0.225   0.249   1.000
Transaction cost & payment method (CP)    0.225   0.208   0.225   0.000   0.196   0.853
Reputation & customer relationship (RR)   0.240   0.193   0.225   0.200   0.000   0.858
Total                                     0.934   0.907   0.932   0.922   0.932       -
Table 4 The degree of direct influence

Aspects                                   Sum of row  Sum of column  Sum of row and column  Importance of influence
Searching & recommendation service (SR)   0.971       0.934          1.905                  2
Maintenance & transaction security (MS)   0.944       0.907          1.851                  3
Design & service functions (DS)           1.000       0.932          1.932                  1
Transaction cost & payment method (CP)    0.853       0.922          1.775                  5
Reputation & customer relationship (RR)   0.858       0.932          1.790                  4
(3) Calculation of the indirect influence matrix
The indirect influence matrix can be derived from Equation (3), as shown in Table 5.

ID = Σ_{i=2..∞} D^i = D²(I − D)^{−1}   (3)
Table 5 The indirect influence matrix

Aspects                                   SR      MS      DS      CP      RR      Total
Searching & recommendation service (SR)   2.433   2.331   2.375   2.356   2.381   11.876
Maintenance & transaction security (MS)   2.331   2.311   2.320   2.288   2.314   11.564
Design & service functions (DS)           2.440   2.375   2.484   2.420   2.437   12.156
Transaction cost & payment method (CP)    2.153   2.107   2.147   2.168   2.158   10.732
Reputation & customer relationship (RR)   2.157   2.120   2.156   2.143   2.199   10.774
Total                                    11.513  11.244  11.481  11.375  11.489       -
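Equation (3) can be evaluated with a few matrix operations (a sketch; the closed form is also checked against the truncated power series, which converges because lim D^m = 0):

```python
import numpy as np

# Sketch of Equation (3): the indirect influence matrix ID = D^2 (I - D)^(-1)
# accumulates all influence paths of length two or more.
A = np.array([[0.000, 2.667, 2.889, 2.750, 2.722],
              [2.444, 0.000, 2.583, 2.889, 2.806],
              [2.889, 3.083, 0.000, 2.556, 2.833],
              [2.556, 2.361, 2.556, 0.000, 2.222],
              [2.722, 2.194, 2.556, 2.278, 0.000]])  # Table 2
s = min(1 / A.sum(axis=1).max(), 1 / A.sum(axis=0).max())
D = s * A
ID = D @ D @ np.linalg.inv(np.eye(len(A)) - D)

# Agreement with the truncated power series sum_{i>=2} D^i:
series = sum(np.linalg.matrix_power(D, i) for i in range(2, 300))
print(np.round(ID[0, 0], 3))  # SR -> SR indirect influence, cf. Table 5
```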
(4) Calculation of the full influence matrix
The full influence matrix T can be derived from Equation (4) or (5). Table 6 shows the full influence matrix T, whose elements are defined in Equation (6). The sum vector of row values is {d_i} and the sum vector of column values is {r_j}, as given in Equations (7) and (8). The sum of row value plus column value, {d_i + r_i}, indicates the full influence of the matrix T: the higher {d_i + r_i} is, the stronger the correlation of the aspect or criterion. The sum of row value minus column value, {d_i − r_i}, indicates the net influence relationship: if d_i − r_i > 0, the degree of influence exerted on others is stronger than the influence received from others. As shown in Table 7, design & service functions (DS) has the highest degree of full influence (d_3 + r_3 = 25.567) as well as the highest degree of net influence (d_3 − r_3 = 0.743). The order of the other net influences is as follows: searching & recommendation service (SR) (d_1 − r_1 = 0.400), maintenance & transaction security (MS) (d_2 − r_2 = 0.357), transaction cost & payment method (CP) (d_4 − r_4 = -0.711), and finally reputation & customer relationship (RR) (d_5 − r_5 = -0.789).

T = D + ID = Σ_{i=1..∞} D^i   (4)

T = Σ_{i=1..∞} D^i = D(I − D)^{−1}   (5)

T = [t_ij],  i, j ∈ {1, 2, ..., n}   (6)

d = [d_i]_{n×1} = [Σ_{j=1..n} t_ij]_{n×1} = (d_1, ..., d_i, ..., d_n)   (7)

r = [r_j]′_{1×n} = [Σ_{i=1..n} t_ij]′_{1×n} = (r_1, ..., r_j, ..., r_n)   (8)

Table 6 The full influence matrix (T)

Aspects                                   SR      MS      DS      CP      RR      Total
Searching & recommendation service (SR)   2.433   2.566   2.629   2.598   2.621   12.847
Maintenance & transaction security (MS)   2.546   2.311   2.547   2.542   2.561   12.507
Design & service functions (DS)           2.694   2.646   2.484   2.645   2.686   13.155
Transaction cost & payment method (CP)    2.378   2.315   2.372   2.168   2.354   11.586
Reputation & customer relationship (RR)   2.397   2.313   2.381   2.343   2.199   11.632
Total                                    12.447  12.151  12.412  12.296  12.421       -

Table 7 The degree of full influence

Aspects                                   {d}     {r}     {d + r}  {d − r}
Searching & recommendation service (SR)   12.847  12.447  25.294    0.400
Maintenance & transaction security (MS)   12.507  12.151  24.658    0.357
Design & service functions (DS)           13.155  12.412  25.567    0.743
Transaction cost & payment method (CP)    11.586  12.296  23.882   -0.711
Reputation & customer relationship (RR)   11.632  12.421  24.053   -0.789
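The prominence and net-influence indices of Equations (4)-(8) follow from the same matrix data (a sketch mirroring the computation behind Tables 6 and 7):

```python
import numpy as np

# Sketch of Equations (4)-(8): full influence T = D (I - D)^(-1) = D + ID,
# row sums d, column sums r, prominence d + r, and net influence d - r.
A = np.array([[0.000, 2.667, 2.889, 2.750, 2.722],
              [2.444, 0.000, 2.583, 2.889, 2.806],
              [2.889, 3.083, 0.000, 2.556, 2.833],
              [2.556, 2.361, 2.556, 0.000, 2.222],
              [2.722, 2.194, 2.556, 2.278, 0.000]])  # Table 2
D = A * min(1 / A.sum(axis=1).max(), 1 / A.sum(axis=0).max())
T = D @ np.linalg.inv(np.eye(len(A)) - D)
d, r = T.sum(axis=1), T.sum(axis=0)
for name, di, ri in zip(["SR", "MS", "DS", "CP", "RR"], d, r):
    print(f"{name}: d+r = {di + ri:6.3f}, d-r = {di - ri:+.3f}")
```

DS comes out with both the largest prominence (about 25.57) and the largest net influence (about +0.74), matching Table 7; note that the net influences always sum to zero.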
(5) The analysis of the NRM (network relation map)
Experts were invited to discuss the relationships and influence levels of criteria under the same aspects/criteria and to score the relationships and influence among criteria based on the DEMATEL technique. Aspects/criteria were divided into different types so the experts could answer the questionnaire in the areas/fields with which they were familiar. The net full influence matrix C_net is determined by Equation (9):

C_net = [t_ij − t_ji],  i, j ∈ {1, 2, ..., n}   (9)

The diagonal entries of this matrix are all 0; in other words, the matrix consists of a strictly upper triangular part and a strictly lower triangular part. Moreover, the values of the two strict triangles are the same while their signs are opposite. This property means we only have to report one of the strict triangles. Table 6 shows the full influence matrix, and Equation (9) produces the net full influence matrix shown in Table 8. Using the values of (d + r) and (d − r) in Table 7 as the X value and Y value, respectively, the network relation map (NRM) can be drawn as in Fig. 3. Fig. 3 shows that the DS (design & service functions) aspect is the major net-influencing dimension, while the RR (reputation & customer relationship) aspect is the dimension most influenced by others. DS is also the aspect with the highest full influence, while CP (transaction cost & payment method) is the one with the smallest full influence.
4.3 The Analysis of the SIA-NRM Approach

The analysis process of SIA-NRM includes two stages: the first stage involves the satisfaction and importance analysis (SIA), and the second stage involves the analysis of the network relation map (NRM). The SIA analysis determines the satisfaction and importance degree of the aspects/criteria of online shopping service platforms; it supports decision making by identifying the criteria that should be improved, namely those whose standardized satisfaction degree is less than the average satisfaction degree. The three improvement strategies are listed in Table 9. Improvement strategy A (which requires no further improvement) can be applied to the aspects MS (maintenance & transaction security) and DS (design & service functions) (SS > 0). Improvement strategy B (which requires direct improvements) should be applied to SR (searching & recommendation service). Improvement strategy C (which requires indirect improvements) can be applied to the aspects CP (transaction cost & payment method) and RR (reputation & customer relationship). The SIA-NRM approach can thus determine the criteria that should be improved based on the SIA analysis, and the improvement path using the network relation map (NRM). As shown in Fig. 4, the aspects SR, CP, and RR should be improved. DS is the aspect with the major net influence. Therefore, we
can improve the SR aspect through the DS aspect, and improve the CP aspect through the DS, MS, and SR aspects. The RR (reputation & customer relationship) aspect is the dimension most influenced by others; therefore, the RR aspect can be improved when the other four aspects (DS, MS, SR, and CP) are improved, as shown in Table 9 and Fig. 4.
Fig. 3 The NRM of online shopping service (d + r / d − r)

Table 8 The net influence matrix of online shopping service

Aspect                                    SR      MS      DS      CP      RR
Searching & recommendation service (SR)   -
Maintenance & transaction security (MS)   -0.020  -
Design & service functions (DS)            0.065   0.099  -
Transaction cost & payment method (CP)    -0.221  -0.228  -0.273  -
Reputation & customer relationship (RR)   -0.225  -0.248  -0.305  -0.011  -
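As a sketch of Equation (9), the net influence matrix is simply T − Tᵀ, which is skew-symmetric, so only the strict lower triangle of Table 8 needs reporting:

```python
import numpy as np

# Sketch of Equation (9): the net influence matrix C_net = [t_ij - t_ji]
# equals T - T^T and is skew-symmetric, so one strict triangle suffices.
A = np.array([[0.000, 2.667, 2.889, 2.750, 2.722],
              [2.444, 0.000, 2.583, 2.889, 2.806],
              [2.889, 3.083, 0.000, 2.556, 2.833],
              [2.556, 2.361, 2.556, 0.000, 2.222],
              [2.722, 2.194, 2.556, 2.278, 0.000]])  # Table 2
D = A * min(1 / A.sum(axis=1).max(), 1 / A.sum(axis=0).max())
T = D @ np.linalg.inv(np.eye(len(A)) - D)
C_net = T - T.T
print(np.round(C_net[1, 0], 3))  # net influence of MS on SR, cf. Table 8
```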
Table 9 The improvement strategy of aspects for online shopping service

                                          SIA                        NRM
Aspects                                   SS      SI      (SS, SI)   d+r     d-r     (D/ID)    Strategy
Searching & recommendation service (SR)   -0.687  -1.177  ▼ (-,-)    25.294   0.400  D (+,+)   B
Maintenance & transaction security (MS)    1.349   1.492  ○ (+,+)    24.658   0.357  D (+,+)   A
Design & service functions (DS)            0.734   0.380  ○ (+,+)    25.567   0.743  D (+,+)   A
Transaction cost & payment method (CP)    -0.384  -0.375  ▼ (-,-)    23.882  -0.711  ID (+,-)  C
Reputation & customer relationship (RR)   -1.011  -0.321  ▼ (-,-)    24.053  -0.789  ID (+,-)  C

Notes: The improvement strategies include three types: improvement strategy A (which requires no further improvement), improvement strategy B (which requires direct improvements), and improvement strategy C (which requires indirect improvements). D and ID denote net-influencing (d − r > 0) and net-influenced (d − r < 0) aspects, respectively.
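The strategy assignment combining the two stages can be sketched as a simple rule over Tables 1 and 7 (an illustration of the logic; the rule itself is the one described in Section 4.3):

```python
# Sketch of the SIA-NRM rule behind Table 9: aspects with SS >= 0 need no
# further improvement (strategy A); dissatisfying aspects are improved
# directly when they are net influencers (d - r >= 0, strategy B) and
# indirectly through their influencing aspects otherwise (strategy C).
data = {  # aspect: (standardized satisfaction SS, net influence d - r)
    "SR": (-0.687, 0.400), "MS": (1.349, 0.357), "DS": (0.734, 0.743),
    "CP": (-0.384, -0.711), "RR": (-1.011, -0.789),
}

def strategy(ss, net):
    if ss >= 0:
        return "A"           # already satisfying: no further improvement
    return "B" if net >= 0 else "C"

strategies = {k: strategy(ss, net) for k, (ss, net) in data.items()}
print(strategies)  # SR -> B, MS/DS -> A, CP/RR -> C, as in Table 9
```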
Fig. 4 The SIA-NRM analysis for online shopping service
5 Conclusions

Considering that the volume and frequency of internet transactions increase continually, a complete certification process becomes increasingly important for online shoppers. A complete certification process can reduce the incidence of accounts being hacked by unknown parties and assures users that their personal information and online shopping data will not be jeopardized. Additionally, some online shopping service providers cannot satisfy customers' consulting and appeal service needs. Some customers who purchase products online find that the quality and utility of the products do not meet their expectations. They usually hope that the online shopping service provider will listen to their complaints and help them exchange or return products without additional fees. However, customers cannot always exchange a product or receive a reimbursement, and when they can, the reimbursement process is often quite inconvenient. Customers then complain, and their satisfaction with and loyalty to the provider decrease continually. Consequently, customers adjust their decisions and move to an online shopping provider with better service. Therefore, online shopping service providers need to consider carefully whether their services satisfy customers' needs and reduce the number of complaints.
References

1. Bauer, H.H., Falk, T., Hammerschmidt, M.: eTransQual: A transaction process-based approach for capturing service quality in online shopping. Journal of Business Research 59(7), 866–875 (2006)
2. Hori, S., Shimizu, Y.: Designing methods of human interface for supervisory control systems. Control Engineering Practice 7(11), 1413–1419 (1999)
3. Lin, C.L., Hsieh, M.S., Tzeng, G.H.: Evaluating vehicle telematics system by using a novel MCDM techniques with dependence and feedback. Expert Systems with Applications 37(10), 6723–6736 (2010)
4. Parasuraman, A., Zeithaml, V.A., Malhotra, A.: E-S-QUAL: A multiple-item scale for assessing electronic service quality. Journal of Service Research 7(3), 213–233 (2005)
5. Seyed-Hosseini, S.M., Safaei, N., Asgharpour, M.J.: Reprioritization of failures in a system failure mode and effects analysis by decision making trial and evaluation laboratory technique. Reliability Engineering & System Safety 91(8), 872–881 (2006)
6. Wu, R.S., Chou, P.H.: Customer segmentation of multiple category data in e-commerce using a soft-clustering approach. Electronic Commerce Research and Applications (2010) (in press, corrected proof)
7. Wu, W.W., Lee, Y.T.: Developing global managers' competencies using the fuzzy DEMATEL method. Expert Systems with Applications 32(2), 499–507 (2007)
The Optimization Decisions of the Decentralized Supply Chain under the Additive Demand

Peng Ma and Haiyan Wang*

Abstract. The paper considers a two-stage decentralized supply chain consisting of a supplier and a retailer. Several optimization decisions are studied when the market demand is linear and additive. These include who should decide the production quantity, the supplier or the retailer, and what the production-pricing decisions should be. The retailer's share of channel cost is defined as the ratio of the retailer's unit cost to the supply chain's total unit cost. The relationship between the supply chain's total profit and the retailer's share of channel cost is established. The production-pricing decisions are obtained for the cases in which the supplier and the retailer, respectively, decide the production quantity. The results show that the total profit of the decentralized supply chain is always less than that of the centralized supply chain, independent of the retailer's share of channel cost. If the retailer's share of channel cost is between 0 and 1/2, the supplier should decide the production quantity; if it is between 1/2 and 1, the retailer should decide the production quantity.

Keywords: decentralized supply chain, additive demand, production-pricing decision, optimal channel profit, consignment contract.
1 Introduction

There are two kinds of optimization decisions in the decentralized supply chain. One is who should decide the production quantity, the supplier or the retailer. The other is how to make the production-pricing decisions that maximize each member's individual profit.
Peng Ma · Haiyan Wang
Institute of Systems Engineering, School of Economics and Management, Southeast University, Nanjing, Jiangsu 210096, P.R. China
e-mail: [email protected], [email protected]
* Corresponding author.

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 307–317. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
There is a modification of Vendor Managed Inventory (VMI) in which the supplier makes inventory decisions and owns the goods until they are sold, called Consignment Vendor Managed Inventory (CVMI) (Lee and Chu 2005; Ru and Wang 2010). CVMI is used by many retailers, such as Wal-Mart, Ahold USA, Target, and Meijer Stores. There is also a modification of Retailer Managed Inventory (RMI) in which the retailer decides how much inventory to hold in a period, labeled Consignment Retailer Managed Inventory (CRMI) (Ru and Wang 2010). Ru and Wang (2010) consider a two-stage supply chain with a supplier and a retailer when market demand for the product has a multiplicative functional form. They find that the supplier and retailer earn equal profits under both the CVMI and CRMI programs. They also find that it is beneficial for both the supplier and the retailer if the supplier makes the inventory decision in the channel. More generally, production-pricing decisions in supply chains have been extensively studied in the literature. Petruzzi and Dada (1999) provide a review and extend such problems to the single-period newsvendor setting. Several researchers consider this problem in a multi-period setting (Federgruen and Heching 1999; Chen and Simchi-Levi 2004a, 2004b). In the extended newsvendor setting of decentralized decision-making, some researchers consider a setting where a supplier wholesales a product to a retailer who makes pricing-procurement decisions (Emmons and Gilbert 1998; Granot and Yin 2005; Song et al. 2006). These papers explore how the supplier can improve channel performance by using an inventory-return policy for overstocked items. Granot and Yin (2007) study the price-dependent newsvendor model in which a manufacturer sells a product to an independent retailer facing uncertain demand and the retail price is endogenously determined by the retailer.
Granot and Yin (2008) analyze the effect of price and order postponement in a decentralized newsvendor model with multiplicative and price-dependent demand. Huang and Huang (2010) study a price coordination problem in a three-stage supply chain composed of a supplier, a manufacturer, and a retailer. Other recent papers related to the joint production-pricing decisions of decentralized supply chains are as follows. Bernstein and Federgruen (2005) investigate the equilibrium behavior of decentralized supply chains with competing retailers under demand uncertainty. Wang et al. (2004) consider a supply chain structure in which a retailer offers a consignment-sales contract with revenue sharing to a supplier, who then makes production-pricing decisions. Wang (2006) considers n suppliers, each producing a different product and selling it through a common retailer to the market. Ray et al. (2005) study a serial two-stage supply chain selling a procure-to-stock product in a price-sensitive market. Zhao and Atkins (2008) extend the theory of n competitive newsvendors to the case where competition occurs simultaneously in price and inventory. The purpose of this paper is to investigate the optimization decisions of the decentralized supply chain with additive demand, and in particular to shed light on the question of who should decide the production quantity in the supply chain. A game-theoretic model is built to capture the interactions between the supplier and
retailer when different members manage the supply chain inventory. Our key contribution is to investigate the effect of the retailer's share of channel cost, defined as the ratio of the retailer's unit cost to the supply chain's total unit cost, on the supply chain's total profit. We also obtain some results that differ from those of Ru and Wang (2010). The paper proceeds as follows. Section 2 details the model assumptions. Section 3 derives the centralized decisions. Section 4 considers the decentralized supply chain in which the supplier decides the production quantity. Section 5 considers the decentralized supply chain in which the retailer decides the production quantity. Section 6 gives the results and their managerial meaning. Section 7 concludes the paper.
2 Model Assumptions

We assume that the supplier's unit production cost is c_s, and the retailer's unit holding and selling cost is c_r. We define c = c_s + c_r as the total unit cost for the channel, and α = c_r/c as the retailer's share of channel cost. Suppose the market demand D(p) is linear and additive, i.e.,

D(p) = a − bp + ε   (1)

where p is the retail price and ε is a random variable supported on [A, B] with B > A ≥ 0. Let F(·) and f(·) be its cumulative distribution function and probability density function, respectively. Further, we assume that a − bc + A > 0, F(A) = 0, and F(B) = 1. The supplier produces Q units of the product and delivers them to the retailer, and the retailer sells them to the market at the retail price p. A contractual arrangement specifies who makes which decisions. Two different contractual arrangements are considered, namely production quantity decided by the supplier (PQDS) and production quantity decided by the retailer (PQDR). Under PQDS, the decisions are made as follows: first, the supplier chooses the consignment price w and the production quantity Q and delivers the units to the retailer; second, the retailer decides the retail price p. Under PQDR, by contrast, the decisions are made as follows: first, the supplier specifies the consignment price w; second, the retailer decides the production quantity Q for the supplier to deliver and the retail price p.
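A numerical sketch of these assumptions, with hypothetical parameter values and a hypothetical uniform noise term ε ~ U[A, B]: with stocking factor z = Q − (a − bp) (introduced formally in Section 3), expected sales are E[min(D, Q)] = (a − bp) + z − L(z), where L(z) = E[(z − ε)⁺] plays the role of Λ(z) in Equation (4) below and has a closed form for the uniform distribution.

```python
import random

# Additive-demand sketch: D = a - b*p + eps, Q = (a - b*p) + z, so
# min(D, Q) = (a - b*p) + min(eps, z) and
# E[min(D, Q)] = (a - b*p) + z - L(z) with L(z) = E[(z - eps)^+].
a, b, A, B = 100.0, 2.0, 0.0, 40.0      # hypothetical parameters
p, z = 20.0, 25.0
L = (z - A) ** 2 / (2 * (B - A))        # closed form of L(z) for uniform eps
expected_sales = (a - b * p) + z - L

random.seed(0)
N = 200_000
sim = sum(min(a - b * p + random.uniform(A, B), (a - b * p) + z)
          for _ in range(N)) / N        # Monte Carlo check of the identity
print(round(expected_sales, 3), round(sim, 3))
```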
3 Centralized Decision

We first characterize the optimal solution of the centralized supply chain, in which the retail price p and the production quantity Q are simultaneously chosen by a single decision-maker. The expected channel profit can be written as

Π(p, Q) = pE[min(D, Q)] − cQ = pE[min(a − bp + ε, Q)] − cQ   (2)
Following Petruzzi and Dada (1999), we regard z = Q − (a − bp) as a stocking factor. The problem of choosing the retail price p and the production quantity Q is then equivalent to choosing the retail price p and the stocking factor z. Substituting Q = z + (a − bp) into (2), we can rewrite the profit function as

Π(p, z) = (p − c)(a − bp) + p[z − Λ(z)] − cz   (3)

where

Λ(z) = ∫_A^z (z − x) f(x) dx   (4)

Theorem 1. For any given stocking factor z ∈ [A, B], the unique optimal retail price p(z) is given by

p(z) = [a + bc + z − Λ(z)] / (2b)   (5)

and, if the probability distribution F(·) satisfies the property of increasing failure rate (IFR), the optimal stocking factor z* that maximizes Π(p(z), z) is uniquely determined by

F(z) = 1 − c / p(z)   (6)

Proof. First, for any given z ∈ [A, B], we have

∂Π(p, z)/∂p = a − 2bp + bc + z − Λ(z),  ∂²Π(p, z)/∂p² = −2b < 0

Since ∂²Π(p, z)/∂p² < 0, ∂Π(p, z)/∂p = 0 implies that p(z) = [a + bc + z − Λ(z)]/(2b), which is (5), and p(z) is the unique maximizer of Π(p, z). Next we characterize z*, which maximizes Π(p(z), z). By the chain rule, and using Λ′(z) = F(z) together with ∂Π/∂p = 0 at p = p(z), we have

dΠ(p(z), z)/dz = (∂Π/∂p)(dp(z)/dz) + ∂Π/∂z = p(z)[1 − F(z)] − c ≡ g(z)

Then g(z) = 0 implies that F(z) = 1 − c/p(z), which is (6). z* always exists in the support interval [A, B] of F(·), because g(z) is continuous, g(A) = p(A) − c > 0, and g(B) = −c < 0. To verify the uniqueness of z*, we have

g′(z) = {1 − F(z) − h(z)[a + bc + z − Λ(z)]} [1 − F(z)] / (2b)

where h(z) = f(z)/[1 − F(z)] is defined as the failure rate of the demand distribution. Now, if h′(z) ≥ 0, then g″(z) < 0 at any point where g′(z) = 0, implying that g(z) itself is a unimodal function. In conjunction with g(A) > 0 and g(B) < 0, this guarantees the uniqueness of z* and A < z* < B. This completes the proof of Theorem 1. □

Increasing failure rate, i.e., h(z) = f(z)/[1 − F(z)] being increasing in z, as required by Theorem 1, is a relatively weak condition satisfied by most commonly used probability distributions, such as the Normal, Uniform, and exponential distributions. Substituting (5) and (6) into (3), we derive the optimal channel profit as

Π* = [a − bc + z* − Λ(z*)]² / (4b) − cΛ(z*)   (7)
4 Decentralized Channel Decisions under PQDS

Under PQDS, the supplier chooses the consignment price w and the stocking factor z at the first stage, and the retailer then chooses the retail price p at the second stage. Using a backward-induction procedure similar to that of Ru and Wang (2010), the retailer's expected profit can be written as

Π_r(p | w, z) = (p − w)E[min(D, Q)] − cαQ = (p − w)E[min(a − bp + ε, Q)] − cαQ
             = (p − w − cα)(a − bp) + (p − w)[z − Λ(z)] − cαz   (8)

Theorem 2. For any given stocking factor z and consignment price w > 0, the unique optimal retail price p(w, z) is given by

p(w, z) = [a + b(w + cα) + z − Λ(z)] / (2b)   (9)
Proof. We take the partial derivatives of (8) with respect to p:

∂Π_r(p | w, z)/∂p = a − 2bp + b(w + cα) + z − Λ(z),  ∂²Π_r(p | w, z)/∂p² = −2b < 0

Since ∂²Π_r(p | w, z)/∂p² < 0, ∂Π_r(p | w, z)/∂p = 0 implies that p(w, z) = [a + b(w + cα) + z − Λ(z)]/(2b), which is (9), so p(w, z) is the unique maximizer of Π_r(p | w, z). This completes the proof of Theorem 2. □
□
The supplier’s profit function is given by Π
w, Q
,
wE min D, Q
After substituting D we have
a
bp
ε, Q
z
w, z
c
cα a
bp
wz
Π
,
w w
c
cα a bw Λ z wz
bcα
c 1 a
bp Λ z
z Λ z c 1 α z
α Q
(10)
and (9) into (10), c 1
α z (11)
Theorem 3. For any given stocking factor , the supplier’s unique optimal consignment price z is given by z
(12)
and, if the demand distribution satisfies increasing failure rate (IFR), the optimal stocking factor that maximizes , is uniquely determined by ,
312
P. Ma and H. Wang
(13) □
Proof. It is similar to the proof of Theorem 1.
Substituting (12) into (9), the retail price chosen by the retailer in equilibrium is given by

p1* = [3a + bc + 3(z1* − Λ(z1*))] / (4b)   (14)

Substituting (12) and (14) into (8) and (11), respectively, we obtain the optimal expected profits of the retailer and the supplier as

Π_r^PQDS* = [a − bc + z1* − Λ(z1*)]² / (16b) − cαΛ(z1*)   (15)

Π_s^PQDS* = [a − bc + z1* − Λ(z1*)]² / (8b) − c(1 − α)Λ(z1*)   (16)

where the optimal stocking factor z1* is determined by (13). The total channel profit of the supply chain is therefore

Π^PQDS(α) = Π_r^PQDS* + Π_s^PQDS* = 3[a − bc + z1* − Λ(z1*)]² / (16b) − cΛ(z1*)   (17)

In conclusion, for the decentralized channel under PQDS, the supplier chooses the consignment price and stocking factor according to (12) and (13), respectively, and the retailer then chooses the retail price as given by (14).
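The PQDS equilibrium (12)-(14) can be computed numerically; a sketch under the same hypothetical uniform-demand setting as before, with a hypothetical cost share α = 0.4, solving (13) by bisection:

```python
# PQDS equilibrium sketch: supplier picks w(z) via (12) and z1 via (13);
# the retailer then prices according to (14). Parameters are hypothetical.
a, b, c, A, B, alpha = 100.0, 2.0, 10.0, 0.0, 40.0, 0.4
F = lambda z: (z - A) / (B - A)
Lam = lambda z: (z - A) ** 2 / (2 * (B - A))
K = lambda z: z - Lam(z)

w_of = lambda z: (a + b * c * (1 - 2 * alpha) + K(z)) / (2 * b)   # Eq. (12)
# Eq. (13) rearranged: (1 - F(z))(w(z) + c(1 - alpha)) - 2c(1 - alpha) = 0
foc = lambda z: (1 - F(z)) * (w_of(z) + c * (1 - alpha)) - 2 * c * (1 - alpha)

lo, hi = A, B                     # foc(A) > 0 > foc(B)
for _ in range(80):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if foc(mid) > 0 else (lo, mid)
z1 = (lo + hi) / 2
w1 = w_of(z1)
p1 = (3 * a + b * c + 3 * K(z1)) / (4 * b)                        # Eq. (14)
print(round(z1, 3), round(w1, 3), round(p1, 3))
```

At the solution, the retailer's pricing condition (9) holds exactly, which serves as a consistency check on (12) and (14).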
5 Decentralized Channel Decisions under PQDR

Under PQDR, the supplier chooses the consignment price w at the first stage, and the retailer then chooses the stocking factor z and the retail price p to maximize his own expected profit, which is calculated as

Π_r(p, z | w) = (p − w)E[min(D, Q)] − cαQ = (p − w)E[min(a − bp + ε, Q)] − cαQ
             = (p − w − cα)(a − bp) + (p − w)[z − Λ(z)] − cαz   (18)

Theorem 4. For any given stocking factor z, the unique optimal retail price p(z | w) is given by

p(z | w) = [a + b(w + cα) + z − Λ(z)] / (2b)   (19)

and, if the probability distribution F(·) satisfies the property of increasing failure rate (IFR), the optimal stocking factor that maximizes Π_r(p(z | w), z | w) is uniquely determined by

F(z) = 1 − cα / [p(z | w) − w]   (20)

Proof. It is similar to the proof of Theorem 1. □

The supplier's profit function is given by

Π_s(w) = wE[min(D, Q)] − c(1 − α)Q   (21)
Substituting D = a − bp + ε, Q = z + (a − bp), and (19), (20) into (21), we have

Π_s(w) = [w − c(1 − α)][a − b(w + cα) − z + Λ(z)]/2 + w[z − Λ(z)] − c(1 − α)z   (22)

Theorem 5. The supplier's unique optimal consignment price w* is given by

w* = [a + bc(1 − 2α) + z − Λ(z)] / (2b)   (23)

Proof. Because the stocking factor z chosen by the retailer at the second stage does not depend on the consignment price w set by the supplier at the first stage, the derivative of Π_s(w) with respect to w can be simplified as

dΠ_s(w)/dw = [a − 2bw + bc(1 − 2α) + z − Λ(z)] / 2

Since d²Π_s(w)/dw² = −b < 0, dΠ_s(w)/dw = 0 implies that w* = [a + bc(1 − 2α) + z − Λ(z)]/(2b), which is (23). This completes the proof of Theorem 5. □
z
(24)
F z
(25)
Substituting (23) and (24) into (18), we can derive the retailer’s optimal expected profit as Π
z
,
Λ z
cαz
(26)
where the optimal stocking factor z is determined by (25). Substituting (23) and (24) into (22), we get the supplier’s optimal expected profit as Π
z
,
Λ z
c 1
α z
(27)
Therefore the total channel profit is Π
α
Π
,
Π
,
z
Λ z
cz
(28)
Note that although the expression of the optimal retail price here in (24) is the same as that of (14) under PQDS, but the size relationship between the equilibrium stocking factor z under PQDR and the equilibrium stocking factor z under PQDS is dependent on parameter α.
In conclusion, for the decentralized channel under PQDR, the supplier chooses the consignment price according to (23), and the retailer then chooses the retail price and stocking factor as given by (24) and (25), respectively.
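The PQDR equilibrium can be sketched in the same way as PQDS (hypothetical uniform-demand parameters; (25) is solved by bisection, and the retailer's stocking condition (20) is then verified at the solution):

```python
# PQDR equilibrium sketch: solve the retailer's stocking condition (25);
# the consignment price and retail price follow (23) and (24).
a, b, c, A, B, alpha = 100.0, 2.0, 10.0, 0.0, 40.0, 0.4
F = lambda z: (z - A) / (B - A)
Lam = lambda z: (z - A) ** 2 / (2 * (B - A))
K = lambda z: z - Lam(z)

# Eq. (25) rearranged: (1 - F(z))(a - bc + 4bc*alpha + K(z)) - 4bc*alpha = 0
foc = lambda z: ((1 - F(z)) * (a - b * c + 4 * b * c * alpha + K(z))
                 - 4 * b * c * alpha)

lo, hi = A, B                     # foc(A) = a - bc > 0 > foc(B)
for _ in range(80):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if foc(mid) > 0 else (lo, mid)
z2 = (lo + hi) / 2
w2 = (a + b * c * (1 - 2 * alpha) + K(z2)) / (2 * b)              # Eq. (23)
p2 = (3 * a + b * c + 3 * K(z2)) / (4 * b)                        # Eq. (24)
print(round(z2, 3), round(p2, 3))
```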
6
Results and Managerial Meaning
The expected profit of the channel depends on the choice for the retail price p and stocking factor z. In this section, we focus on the difference of total profit between the centralized channel and the decentralized channel under PQDS and PQDR, respectively. We also derive the difference of total profit between the decentralized channel under PQDS and PQDR. Proposition 1. Under PQDS, if 0
is increasing with
;
if
;
, and 1.
if
Proof . From (6) and (13), we can deduce that F z
F z It is obvious that F z ing with α . If α ; z 0 Proposition 1.
is increasing with α, so z
F z , then F z
α
z
F z
0, z
z is increasα
z if
1 . This completes the proof of □
if
is increasing with
Proposition 2. Under PQDS,
α
z , so z
, and
.
Proof. From (5) and (14), we derive that p
α
p
Combining (6) and (13), we can deduce that F z is increasing with α, then p so p
α
α
p
Proposition 3. Under PQDR,
p is increasing with α. When α
,z
. This completes the proof of Proposition 2. is decreasing with
if 0 ; if ; Proof. It is similar to the proof of Proposition 1. Proposition 4. Under PQDR,
is increasing with α, thus z □
, and 1.
if
is decreasing with
z ,
□ , and .
Proof. It is similar to the proof of Proposition 2.
□
The Optimization Decisions of the Decentralized Supply Chain
315
Propositions 1 and 3 indicate that: when the supplier's unit production cost is more than the retailer's unit holding and selling cost, the stocking factor under PQDS is less than that of the centralized supply chain, while the stocking factor of the centralized supply chain is less than that of the decentralized supply chain under PQDR, and vice versa. When the supplier's unit production cost is equal to the retailer's unit holding and selling cost, the stocking factors of the decentralized supply chain under PQDS and PQDR are equal to that of the centralized supply chain.

Propositions 2 and 4 indicate that: when the supplier's unit production cost is equal to the retailer's unit holding and selling cost, the retail price of the decentralized supply chain under PQDS is equal to that under PQDR. Therefore, the production quantity Q of the decentralized supply chain under PQDS is equal to that under PQDR.

Proposition 5. Comparing the centralized and decentralized supply chains, we have the following results: (i) Under PQDS, the total profit of the decentralized supply chain is increasing with α below a threshold and decreasing with α above it, and for all 0 ≤ α ≤ 1 the difference between the centralized and the decentralized total profit is nonnegative. (ii) Under PQDR, the analogous results hold.

Proof. (i) Define H(z) and G(z) in terms of Λ(z) and cz so that the decentralized total profit can be written as Π = α H(z) + G(z), whence ∂Π/∂α = H(z). Similar to the proof of Theorem 1, there is a unique z̄ with A ≤ z̄ ≤ B such that H(z) ≥ 0 in [A, z̄] and H(z) ≤ 0 in [z̄, B]. Since F(z) is increasing with α, Π is increasing with α when the stocking factor lies in the first interval and decreasing with α when it lies in the second. From the proof of Theorem 1, we know that G(z) attains its maximum at the centralized stocking factor, so, for all 0 ≤ α ≤ 1, the centralized total profit is no less than the decentralized total profit. (ii) It is similar to the proof of (i). □
P. Ma and H. Wang
Proposition 6. Comparing the decentralized supply chain under PQDS and PQDR, we have the following results: (i) the total profit under PQDS is no less than that under PQDR for one range of the retailer's share of channel cost; (ii) the total profit under PQDR is no less than that under PQDS for the complementary range.

Proof. Let H(z) be defined in terms of Λ(z) and cz as in the proof of Proposition 5, so that H(z) ≥ 0 in [A, z̄] and H(z) ≤ 0 in [z̄, B]. (i) Comparing F(z) under the two strategies orders the corresponding stocking factors, from which we can get H(z) under PQDS no less than H(z) under PQDR, namely Π under PQDS no less than Π under PQDR. (ii) Comparing F(z) in the opposite case, we have the reverse ordering of H(z), namely Π under PQDR no less than Π under PQDS. This completes the proof of Proposition 6. □

Propositions 5 and 6 indicate that: the total profit of the decentralized supply chain is always less than that of the centralized supply chain, independent of the retailer's share of channel cost. The total profit of the decentralized supply chain depends not only on the retailer's share of channel cost but also on the decision strategy. When the supplier decides the production quantity, the total profit of the supply chain reaches its maximum at an interior value of α, and likewise when the retailer decides the production quantity. The total profit of the supply chain when the supplier decides the production quantity is more than that when the retailer decides the production quantity for one range of the retailer's share of channel cost, and the reverse holds for the complementary range.
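Although the contract parameters of PQDS and PQDR are fixed earlier in the paper, the centralized benchmark that the propositions compare against can be illustrated numerically. The sketch below maximizes the expected profit of a price-setting newsvendor with additive demand, in the spirit of the Petruzzi–Dada framework cited in the references, using the leftover function Λ(z); all parameter values (linear riskless demand, uniform noise, unit channel cost) are made-up assumptions for illustration, not the paper's data.

```python
# Centralized price-setting newsvendor with additive demand D = y(p) + eps.
# All numbers are illustrative assumptions, not taken from the paper.

a, b, c = 100.0, 2.0, 10.0          # demand intercept/slope, unit channel cost
A, B = -20.0, 20.0                  # support of the additive noise eps

def Lambda(z):
    """Expected leftover E[(z - eps)^+] for eps ~ Uniform[A, B]."""
    if z <= A:
        return 0.0
    if z >= B:
        return z - (A + B) / 2.0    # mean of eps is (A + B) / 2
    return (z - A) ** 2 / (2.0 * (B - A))

def profit(p, z):
    """Expected channel profit: revenue on expected sales minus cost of Q."""
    y = a - b * p                   # riskless demand
    # Q = y + z; expected sales = y + z - Lambda(z)
    return p * (y + z - Lambda(z)) - c * (y + z)

# brute-force search over a (p, z) grid for the centralized optimum
best = max(((profit(p / 10.0, z / 10.0), p / 10.0, z / 10.0)
            for p in range(int(10 * c), 500)
            for z in range(int(10 * A), int(10 * B) + 1)),
           key=lambda t: t[0])
# best = (max expected profit, optimal retail price, optimal stocking factor)
```

With these assumed parameters the optimizer lands near the riskless price (a/b + c)/2 = 30 with a moderately positive stocking factor, which is the benchmark against which the decentralized profits under PQDS and PQDR are measured.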
7 Conclusions
In this paper, two optimization decisions of the decentralized supply chain under additive demand are considered. The first optimization decision is the production-pricing decision. We derive the production-pricing decisions when the supplier and the retailer decide the production quantity, respectively. The other optimization decision is whether the supplier or the retailer should decide the production quantity. We consider the effect of the retailer's share of channel cost on the two decisions in which the supplier or the retailer decides the production quantity, and derive the relationship between the total profit of the supply chain and the retailer's share of channel cost. By comparing with the results of the centralized supply chain, we find that whether the supplier or the retailer should decide the production quantity depends on the retailer's share of channel cost. Firstly, the total profit of the decentralized supply chain is always less than that of the centralized supply chain, independent of the retailer's share of channel cost. Secondly, the supplier should decide the production quantity if the retailer's share of channel cost lies in one range of values. The retailer
should decide the production quantity when the retailer's share of channel cost lies in the complementary range. The results have significant implications for supply chain management. For future research, one of our interests is to extend our model to a multi-period setting. We also intend to consider how to control inventory in the supply chain when the retailer is subject to both supply uncertainty and random demand.
References

1. Bernstein, F., Federgruen, A.: Decentralized supply chains with competing retailers under demand uncertainty. Manag. Sci. 51, 18–29 (2005)
2. Chen, X., Simchi-Levi, D.: Coordinating inventory control and pricing strategies with random demand and fixed ordering cost: The finite horizon case. Oper. Res. 52, 887–896 (2004a)
3. Chen, X., Simchi-Levi, D.: Coordinating inventory control and pricing strategies with random demand and fixed ordering cost: The infinite horizon case. Math. Oper. Res. 29, 698–723 (2004b)
4. Emmons, H., Gilbert, S.: The role of returns policies in pricing and inventory decisions for catalogue goods. Manag. Sci. 44, 276–283 (1998)
5. Federgruen, A., Heching, A.: Combined pricing and inventory control under uncertainty. Oper. Res. 47, 454–475 (1999)
6. Granot, D., Yin, S.: On the effectiveness of returns policies in the price-dependent newsvendor model. Nav. Res. Logist. 52, 765–779 (2005)
7. Granot, D., Yin, S.: On sequential commitment in the price-dependent newsvendor model. Eur. J. Oper. Res. 177, 939–968 (2007)
8. Granot, D., Yin, S.: Price and order postponement in a decentralized newsvendor model with multiplicative and price-dependent demand. Oper. Res. 56, 121–139 (2008)
9. Huang, Y., Huang, G.: Price coordination in a three-level supply chain with different channel structures using game-theoretic approach. Int. Soc. Manag. Sci. 5, 83–94 (2010)
10. Lee, C., Chu, W.: Who should control inventory in a supply chain? Eur. J. Oper. Res. 164, 158–172 (2005)
11. Petruzzi, N., Dada, M.: Pricing and the newsvendor problem: A review with extensions. Oper. Res. 47, 184–194 (1999)
12. Ray, S., Li, S., Song, Y.: Tailored supply chain decision-making under price-sensitive stochastic demand and delivery uncertainty. Manag. Sci. 51, 1873–1891 (2005)
13. Ru, J., Wang, Y.: Consignment contracting: Who should control inventory in the supply chain? Eur. J. Oper. Res. 201, 760–769 (2010)
14. Song, Y., Ray, S., Li, S.: Structural properties of buy-back contracts for price-setting newsvendors. Manuf. Serv. Oper. Manag. 10, 1–18 (2006)
15. Wang, Y., Jiang, L., Shen, Z.: Channel performance under consignment contract with revenue sharing. Manag. Sci. 50, 34–47 (2004)
16. Wang, Y.: Joint pricing-production decisions in supply chains of complementary products with uncertain demand. Oper. Res. 54, 1110–1127 (2006)
17. Zhao, X., Atkins, D.: Newsvendors under simultaneous price and inventory competition. Manuf. Serv. Oper. Manag. 10, 539–546 (2008)
The Relationship between Dominant AHP/CCM and ANP

Eizo Kinoshita and Shin Sugiura
Abstract. This paper demonstrates that the equation, Dominant AHP (analytic hierarchy process) + ANP (analytic network process) = Dominant AHP, is valid when the weights of criteria relative to the evaluation of the dominant alternative are chosen as the basis for the evaluation of Dominant AHP. It also substantiates that the equation, CCM (concurrent convergence method) + ANP = CCM, is valid by applying the same approach.
1 Introduction

The paper consists of six chapters. Chapter 2 explains AHP/ANP, proposed by Saaty. Chapters 3 and 4 explain Dominant AHP and CCM, proposed by Kinoshita and Nakanishi [4], [5]. Chapter 5 shows mathematically that Dominant AHP coincides with ANP when the weights of criteria relative to the evaluation of the dominant alternative are chosen as the basis for the evaluation of Dominant AHP, and proves the equation, Dominant AHP+ANP=Dominant AHP, to be valid. The same chapter also shows the equation, CCM+ANP=CCM, to be valid by applying the same approach. Chapter 6 concludes the paper.
2 AHP/ANP [1] [2]

AHP, proposed by Saaty, can be summarized as follows. For a set of alternatives A1, …, An, several evaluation criteria C1, …, Cm have been defined, and the validities ν1i, …, νni of all alternatives are evaluated under each criterion Ci. On the other hand, the significances e1, …, em of C1, …, Cm are determined based on the ultimate goal G, giving the aggregate score Ej of alternative Aj:

Ej = νj1 e1 + νj2 e2 + … + νjm em    (1)
Eizo Kinoshita · Shin Sugiura
Meijo University, Kani, Japan
e-mail: [email protected], [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 319–328. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
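A minimal sketch of the weighted sum (1), with made-up validities and significances (none of these numbers come from the paper):

```python
# Hypothetical illustration of Eq. (1): the AHP aggregate score of each
# alternative is a weighted sum of its scores under every criterion.

def ahp_score(v, e):
    """Aggregate score E_j = sum_i v[j][i] * e[i] for each alternative j."""
    return [sum(vji * ei for vji, ei in zip(vj, e)) for vj in v]

v = [[0.5, 0.2],   # validities of A1 under criteria C1, C2 (made up)
     [0.3, 0.7],   # A2
     [0.2, 0.1]]   # A3
e = [0.6, 0.4]     # significances of C1, C2 w.r.t. the goal G (made up)

print([round(s, 2) for s in ahp_score(v, e)])   # [0.38, 0.46, 0.16]
```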
In addition, applying the ANP method proposed by Saaty (though it is also applicable to a wider range of problems than the typical AHP stated above) to the above AHP changes the perspective on the significances of criteria to the ones viewed from each alternative, rather than viewing them from the perspective of the ultimate goal as is the case with AHP. That is to say, ANP has a structure of mutual evaluation, in which an evaluation criterion Ci determines the validity νji of alternative Aj while Aj determines the significance eij of Ci at the same time, and thus contains a feedback structure. For this reason, the solution requires solving a kind of equation involving a supermatrix as a coefficient matrix, rather than the simple multiplication of weights and addition as in the case of AHP. Using the solution of ANP, if the graph structure is such that any node in the graph can be reached from any other node by tracing arrows, the aggregate score (the validity if the nodes represent alternatives, or the significance if the nodes represent evaluation criteria) x^T = [x1, …, xn] of the graph nodes can be obtained as a solution to the following equation.

Sx = x    (2)

(Each column vector of S^∞ converges to a single vector, and Saaty proposed to use it as the aggregate score. This is the same as the solution x of equation (2).) Since equation (2) can also be viewed as the homogeneous equation (S − I)x = 0, it is solvable by ordinary Gaussian elimination. In addition, since S is a stochastic matrix with maximum eigenvalue 1, the solution x of equation (2) is an eigenvector of S, so the power method is also applicable. It follows from the famous Perron–Frobenius theorem that x is uniquely determined up to scalar multiples and that all its elements are positive. Here, the supermatrix for this ANP is defined as (3), where W denotes the criteria weights and M the evaluation matrix.

S = | 0  W |
    | M  0 |    (3)
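As a sketch of how equation (2) can be solved via the homogeneous form (S − I)x = 0, the Python fragment below appends the normalization e^T x = 1 and solves the resulting consistent system by least squares. The 4 × 4 supermatrix here is a made-up toy example, not data from the text.

```python
# Solving (S - I)x = 0 of Eq. (2) with the scale pinned by sum(x) = 1.
# S is a toy column-stochastic supermatrix of the block form [[0, W], [M, 0]].
import numpy as np

S = np.array([[0.0, 0.0, 0.7, 0.4],
              [0.0, 0.0, 0.3, 0.6],
              [0.5, 0.2, 0.0, 0.0],
              [0.5, 0.8, 0.0, 0.0]])

n = S.shape[0]
A = np.vstack([S - np.eye(n), np.ones(n)])     # (S - I)x = 0 and sum(x) = 1
rhs = np.concatenate([np.zeros(n), [1.0]])
x, *_ = np.linalg.lstsq(A, rhs, rcond=None)    # consistent system: exact x

assert np.allclose(S @ x, x)                   # x indeed solves Sx = x
```

Because the stacked system is consistent and of full column rank, least squares returns the eigenvector exactly; by Perron–Frobenius its entries come out positive.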
3 Dominant AHP [3][4]

In this chapter, the authors describe Dominant AHP, proposed by Kinoshita and Nakanishi. Firstly, the authors explain its model by demonstrating numerical examples, and then describe its mathematical structure. Table 1 shows the results obtained when there are two evaluation criteria and three alternatives. Each figure denotes an evaluated score of the respective alternative.
The Relationship between Dominant AHP/CCM and ANP
321
Table 1 Evaluation of alternatives
A1 A2 A3 Sum
C1 84 48 75 207
C2 24 65 21 110
Sum 108 113 96
The Dominant AHP method is an approach where the aggregate score is obtained by selecting a particular alternative (this is called the dominant alternative) out of several alternatives, and making it a criterion. The evaluated score of the dominant alternative is normalized to one, and the weights of criteria relative to the evaluation of the dominant alternative are chosen as the basis for the evaluation. The evaluated scores as well as the weights of criteria when selecting alternative A1 as the dominant alternative are shown in Table 2. Table 2 Evaluated scores of Dominant AHP (dominant alternative A1)
       C1             C2
A1     84/84=1        24/24=1
A2     48/84=0.571    65/24=2.708
A3     75/84=0.893    21/24=0.875

The weights of criteria of Dominant AHP (dominant alternative A1)

                      C1                  C2
Weights of Criteria   84/(84+24)=0.778    24/(84+24)=0.222
Based on the results shown in Table 2, the evaluated scores of alternatives are expressed by Formula (4), and the weights of criteria by Formula (5).

M = | 1      1     |
    | 0.571  2.708 |
    | 0.893  0.875 |    (4)

W = ( 0.778, 0.222 )^T    (5)
The aggregate score is obtained by Formula (6).

E1 = M W = ( 1, 1.046, 0.889 )^T    (6)

By normalizing the results obtained by Formula (6) so that the total shall be one, Formula (7), which denotes the final evaluated score, is acquired.

E = ( 0.341, 0.356, 0.303 )^T  for (A1, A2, A3)    (7)
Next, the authors describe the case when A2 is picked up as a dominant alternative. The evaluated scores and the weights of criteria in this case are shown in Table-3. Table 3 Evaluated scores of Dominant AHP (dominant alternative A2)
       C1             C2
A1     84/48=1.750    24/65=0.369
A2     48/48=1        65/65=1
A3     75/48=1.563    21/65=0.323

The weights of criteria of Dominant AHP (dominant alternative A2)

                      C1                  C2
Weights of Criteria   48/(48+65)=0.425    65/(48+65)=0.575
Based on the results shown in Table 3, Formula (8), showing the evaluated scores of alternatives, and Formula (9), denoting the weights of criteria, are acquired.

M = | 1.750  0.369 |
    | 1      1     |
    | 1.563  0.323 |    (8)

W = ( 0.425, 0.575 )^T    (9)

As a result, the aggregate score is obtained by Formula (10).

E2 = M W = ( 0.956, 1, 0.850 )^T    (10)
When normalizing the result obtained by Formula (10) so that the total shall be one, it matches the result obtained by Formula (7). The same result can be achieved when A3 is chosen as the dominant alternative. Next, the authors describe the mathematical structure of the aggregate score of Dominant AHP. Suppose that the evaluated score of alternative i is denoted as aij under a criterion j in the conventional AHP. The aggregate score pi of AHP is acquired by multiplying aij by cj, which signifies the weight of a criterion, and is expressed by Formula (11).
pi = Σj cj aij    (11)
Dominant AHP is a method where the aggregate score is acquired by picking up a particular alternative (this is called the dominant alternative) out of several alternatives, and making it a criterion. When the evaluated score of alternative i is denoted by aij under a criterion j and alternative l is dominant, the normalized evaluated score is expressed by ãij = aij / alj. The weights of criteria, relative to the evaluation of the dominant alternative and normalized so that their total is one, are chosen as the basis for the evaluation. As a result, the weight of criterion j is expressed as c̃j = alj / Σk alk. Thus the aggregate score of alternative i is shown as follows.

p̃i = Σj c̃j ãij = Σj (alj / Σk alk)(aij / alj) = (1 / Σk alk) Σj aij    (12)

The result matches that of AHP when the weights of criteria relative to the evaluation of the dominant alternative are chosen as the basis for the evaluation of Dominant AHP.
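The invariance expressed by Formula (12) — the normalized aggregate score does not depend on which alternative is chosen as dominant — can be checked numerically. The sketch below applies the Dominant AHP recipe to the Table 1 data; the function name is ours.

```python
# Dominant AHP on the Table 1 scores, following Eq. (12): whichever
# alternative l is chosen as dominant, the normalized result is the same.

scores = {"C1": [84, 48, 75], "C2": [24, 65, 21]}  # columns of Table 1

def dominant_ahp(scores, l):
    crits = list(scores)
    # weights of criteria relative to the dominant alternative l
    row_l = [scores[c][l] for c in crits]
    w = [x / sum(row_l) for x in row_l]
    # evaluated scores normalized so that alternative l scores 1
    n_alt = len(next(iter(scores.values())))
    p = [sum(w[j] * scores[c][i] / scores[c][l]
             for j, c in enumerate(crits))
         for i in range(n_alt)]
    total = sum(p)
    return [x / total for x in p]

print([round(x, 3) for x in dominant_ahp(scores, 0)])  # [0.341, 0.356, 0.303]
print([round(x, 3) for x in dominant_ahp(scores, 1)])  # [0.341, 0.356, 0.303]
```

Both calls reproduce Formula (7): by (12), the normalized score of alternative i is just its row sum in Table 1 divided by the grand total.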
4 CCM [5][6]

In this chapter, the authors describe the concurrent convergence method (CCM) proposed by Kinoshita and Nakanishi [4], [5]. In Dominant AHP, no matter which alternative is selected as the dominant alternative, the aggregate score obtained is the same. However, the aggregate scores can differ if there are several weights of criteria, such as when a decision maker declares a different weight of criteria for each dominant alternative he or she selects, or when the evaluated score of an alternative is not tangible but intangible. As a result, the CCM [5], [6] method becomes vital in order to adjust the weights of criteria. The CCM method functions as follows. Suppose that there are two criteria and three alternatives. In this case, b1 denotes the weight vector of evaluation criteria
from alternative 1, and b2 denotes the weight vector of evaluation criteria from alternative 2. When the evaluated score A of an alternative is given, the input data is shown as Figure 1.
Fig. 1 Input data of CCM: the weight vectors b1, b2, b3 of the evaluation criteria (Criteria I and II) as seen from alternatives 1, 2, and 3, together with the evaluated scores

A = | a1I   a1II |
    | a2I   a2II |
    | a3I   a3II |
As a result, the estimation rules for the weight vector of evaluation criteria of dominant alternative 2 from dominant alternative 1 are as follows:

Estimation rule for the evaluated scores:  A A1^-1 → A A2^-1
Estimation rule for the weight vector of evaluation criteria:  b1 → A2 A1^-1 b1

Similarly, the estimation rules relative to dominant alternative 1 from dominant alternative 2 are as follows:

Estimation rule for the evaluated scores:  A A2^-1 → A A1^-1
Estimation rule for the weight vector of evaluation criteria:  b2 → A1 A2^-1 b2
Here, the authors describe a case where there are evaluation "gaps" between the weight vector of evaluation criteria b1 and the estimated weight vector A1 A2^-1 b2, and likewise between the weight vector of evaluation criteria b2 and the estimated weight vector A2 A1^-1 b1. When there is no "gap," such a state is called "interchangeability of dominant alternatives" [3]. In reality, however, interchangeability is seldom maintained and we often end up with minor evaluation gaps. Kinoshita and Nakanishi proposed CCM in order to adjust those evaluation gaps. First of all, the adjusted value for the weight vector of evaluation criteria b1 relative to dominant alternative 1 is the average of the original data b11, the estimated score b12 relative to dominant alternative 2, and the estimated value b13 relative to dominant alternative 3, as shown below.
b̄1 = (1/3)(b11 + b12 + b13)
   = (1/3)[ A1A1^-1 b1 / (e^T A1A1^-1 b1) + A1A2^-1 b2 / (e^T A1A2^-1 b2) + A1A3^-1 b3 / (e^T A1A3^-1 b3) ]    (13)
By the same token, the adjusted scores for the weight vector for criteria b2 and b relative to dominant alternative 2 and 3, respectively, are shown as Formula (14) and Formula (15). 3
b2 =
{
1 21 b + b 22 + b 23 3
}
−1 −1 −1 A2 A3 b 3 ⎫⎪ A2 A2 b 2 1 ⎧⎪ A2 A1 b1 = ⎨ T + + ⎬ 3 ⎪⎩ e A2 A1 −1b1 e T A2 A2 −1b 2 e T A2 A3 −1b 3 ⎪⎭
b3 =
{
1 31 b + b 32 + b 33 3
(14)
}
−1 −1 −1 A3 A2 b 2 A3 A3 b 3 ⎫⎪ 1 ⎧⎪ A3 A1 b1 = ⎨ T + + ⎬ 3 ⎪⎩ e A3 A1 −1b1 e T A3 A2 −1b 2 e T A.3 A3 −1b 3 ⎪⎭
(15)
In CCM, the same process will be repeated until the "gap" between a new weight vector of evaluation criteria bi and the old weight vector of evaluation criteria bi (i=1,2,3) is eliminated. As to how this process contributes to the convergence of weight vector of evaluation criteria bi, detailed explanation is given in the reference [6]. The aggregate score based on the convergence value of CCM is consistent with Dominant AHP calculation.
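The adjustment (13)–(15) can be sketched in a few lines of Python. Following the CCM literature, Ai is taken here to be the diagonal matrix of alternative i's evaluated scores (an assumption on notation); the evaluated scores are those of Table 1, while the initial, mutually inconsistent weight vectors are made up for illustration.

```python
# Numerical sketch of the CCM adjustment (13)-(15) for two criteria and
# three alternatives.  A_i = diag(row i of A); b1..b3 are made-up data.
import numpy as np

A = np.array([[84.0, 24.0],
              [48.0, 65.0],
              [75.0, 21.0]])
D = [np.diag(row) for row in A]            # A_1, A_2, A_3
b = [np.array([0.70, 0.30]),               # inconsistent initial weights
     np.array([0.50, 0.50]),
     np.array([0.85, 0.15])]

def ccm_step(b):
    """One round of (13)-(15): average the normalized estimates of each b^i."""
    new_b = []
    for Di in D:
        est = [Di @ np.linalg.inv(Dj) @ bj for Dj, bj in zip(D, b)]
        new_b.append(sum(v / v.sum() for v in est) / len(est))
    return new_b

for _ in range(200):                       # repeat until the gaps vanish
    b = ccm_step(b)

# at convergence each b^i equals its own re-estimate (interchangeability)
```

After the iteration settles, applying `ccm_step` once more changes nothing, which is exactly the "no gap" state described above.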
5 The Relationship between Dominant AHP/CCM and ANP

(1) Dominant AHP and ANP

Analytic Network Process (ANP) is an improved form of AHP. It is a model which can deal with different weights of criteria at the same time. In this chapter, the authors explain the mechanism of ANP by demonstrating a numerical example, and then describe its mathematical structure. The authors then prove that the equation, Dominant AHP+ANP=Dominant AHP, is valid when the weights of criteria relative to the evaluation of the dominant alternative are chosen as the basis for the evaluation of Dominant AHP. ANP is a method where the aggregate score is obtained by creating Formula (3), which is called a supermatrix, and by acquiring its major eigenvector (the eigenvector which corresponds to the eigenvalue 1).
However, in ANP, the supermatrix composed of W and M as in Formula (3) needs to be a stochastic matrix. As a result, M must denote the evaluated scores of the alternatives normalized so that each column sums to one, and W should denote the weights of criteria relative to Dominant AHP. Formula (16) is the supermatrix for the numerical example shown in Chapter 3.

    | 0      0      0.778  0.425  0.781 |
    | 0      0      0.222  0.575  0.219 |
S = | 0.406  0.218  0      0      0     |    (16)
    | 0.232  0.591  0      0      0     |
    | 0.362  0.191  0      0      0     |

In Formula (16), when applying (p; q) as a major eigenvector of supermatrix S and calculating S(p; q) = (p; q), Formulas (17) and (18), which together denote the major eigenvector, are acquired. p signifies the column vector concerning the criteria, while q is the column vector concerning the evaluated scores.

p = ( 0.653, 0.347 )^T    (17)

q = ( 0.341, 0.356, 0.303 )^T    (18)
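As a numerical cross-check of (16)–(18), the sketch below assembles the supermatrix from the raw Table 1 scores and extracts its principal eigenvector. The damped update x ← (x + Sx)/2 is our own device: the two-block structure of S makes plain power iteration oscillate with period two, and the damping removes that without changing the fixed point.

```python
# Principal eigenvector of the supermatrix (16), built from Table 1,
# reproduces the Dominant AHP result (7).
a = [[84.0, 24.0], [48.0, 65.0], [75.0, 21.0]]        # Table 1
col = [sum(row[j] for row in a) for j in range(2)]    # criterion totals
tot = [sum(row) for row in a]                         # alternative totals
M = [[a[i][j] / col[j] for j in range(2)] for i in range(3)]
W = [[a[j][i] / tot[j] for j in range(3)] for i in range(2)]
S = [[0.0] * 2 + W[i] for i in range(2)] + \
    [M[i] + [0.0] * 3 for i in range(3)]              # S = [[0, W], [M, 0]]

x = [0.2] * 5
for _ in range(500):
    y = [0.5 * (x[i] + sum(S[i][j] * x[j] for j in range(5)))
         for i in range(5)]                           # damped power step
    s = sum(y)
    x = [v / s for v in y]

p = [v / sum(x[:2]) for v in x[:2]]   # criteria weights, cf. (17)
q = [v / sum(x[2:]) for v in x[2:]]   # alternative scores, cf. (18)
print([round(v, 3) for v in p])       # [0.653, 0.347]
print([round(v, 3) for v in q])       # [0.341, 0.356, 0.303]
```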
The evaluated score vector q of Formula (18) is consistent with the result obtained by Formula (7). In ANP, the aggregate score is obtained by creating supermatrix S and acquiring its major eigenvector (the eigenvector which corresponds to the eigenvalue one). Submatrix M is a list of the evaluated scores aij, and W is a transposed form of M. Since each column is normalized so that its total shall be one, the elements of M and W are expressed as

Mij = aij / Σk akj ,    Wij = aji / Σk ajk ,

respectively. S(q; p) = (q; p) is obtained when applying (q; p) as a major eigenvector of supermatrix S. However, when focusing only on p, which corresponds to the aggregate score, MWp = p is acquired. Here, it is obvious that p denotes a major eigenvector of the matrix MW.
When applying pi = Σj aij and calculating row i of MWp, the result is shown as follows.

(MWp)i = Σj (MW)ij pj
       = Σj [ Σl (ail / Σk akl)(ajl / Σk ajk) ] Σm ajm
       = Σj Σl ( ail ajl / Σk akl )
       = Σl ail = pi    (19)
Formula (19) shows that pi = Σj aij denotes a major eigenvector of MW; in other words, it is what the authors attempted to acquire, the aggregate score. It is consistent with Formula (12), which signifies the aggregate score of Dominant AHP in Chapter 3. Similarly, it is also possible to prove that qj = Σi aij is valid for each of the elements of q. As a result, the following relation between Dominant AHP and ANP is found valid.

Dominant AHP+ANP=Dominant AHP

(2) CCM and ANP

In this section, the authors describe that when making the weights of criteria and the evaluated scores of alternatives the values of ANP, by utilizing the weights of criteria which converged through CCM, the acquired result is consistent with that of CCM. It is not difficult to substantiate this by utilizing the results obtained in the previous section. It is already known that the calculation which converged through CCM is consistent with the Dominant AHP calculation [5][6]. When applying the weights of criteria which converged through CCM to a supermatrix of ANP, the acquired result turns out to be that of CCM. This is proved because the converged value of CCM equals that of Dominant AHP, which signifies that it is structurally the same as Dominant AHP+ANP=Dominant AHP, as shown in the previous section. Thus, the following relationship is applicable also to CCM.

CCM+ANP=CCM
6 Conclusion

In this paper, it was shown that when the weights of criteria relative to the evaluation of the dominant alternative are chosen as the basis for the evaluation of Dominant AHP, the equation Dominant AHP+ANP=Dominant AHP is valid. By applying the same approach, CCM+ANP=CCM was also proved valid. However, the authors would like to emphasize that the results would be different unless the weights of criteria relative to the evaluation of the dominant alternative are made the basis for the evaluation of AHP/ANP as proposed by Saaty, and of Dominant AHP/CCM as proposed by Kinoshita and Nakanishi. The authors believe that the availability of various decision making support models, such as AHP/ANP and Dominant AHP/CCM, is advantageous in solving a wide range of problems.
References

1. Saaty, T.L.: The Analytic Hierarchy Process. McGraw-Hill, New York (1980)
2. Saaty, T.L.: The Analytic Network Process. Expert Choice (1996)
3. Kinoshita, E., Nakanishi, M.: A Proposal of a New Viewpoint in Analytic Hierarchy Process. Journal of Infrastructure Planning and Management IV-36(569), 1–8 (1997) (in Japanese)
4. Kinoshita, E., Nakanishi, M.: Proposal of New AHP Model in Light of Dominant Relationship among Alternatives. Journal of the Operations Research Society of Japan 42(2), 180–197 (1999)
5. Kinoshita, E., Nakanishi, M.: A Proposal of CCM as a Processing Technique for Additional Data in the Dominant AHP. Journal of Infrastructure Planning and Management IV-42(611), 13–19 (1999) (in Japanese)
6. Kinoshita, E., Sekitani, K., Shi, J.: Mathematical Properties of Dominant AHP and Concurrent Convergence Method. Journal of the Operations Research Society of Japan 45(2), 198–213 (2002)
The Role of Kansei/Affective Engineering and Its Expected in Aging Society

Hisao Shiizuka and Ayako Hashizume
Abstract. Kansei engineering originally evolved as a way to "introduce human 'Kansei' into manufacturing." The recent trend indicates that the development of Kansei engineering is expanding beyond manufacture and design and is widening into relevant fields, creating one of the most conspicuous features of Kansei engineering. This trend can also be felt in presentations made at the recent annual conferences of the Japan Society of Kansei Engineering. It is needless to say, therefore, that some kind of interdisciplinary development is necessary to find a mechanism for creating Kansei values, which is as important as the Kansei values themselves. This paper consists of three parts. The first part of this paper describes the general history of Kansei and the basic stance and concept of Kansei research. The second part discusses the significance and roles of creating Kansei values in Kansei engineering and also provides basic ideas that will serve as future guidelines, debating its positioning from many aspects. Finally, the paper emphasizes the necessity for Kansei communication in the ageing society. It is important for Kansei communication to give birth to empathy.

Keywords: Kansei/Affective engineering, Kansei value creation, Kansei communication, senior people, nonverbal communication.
1 Introduction

The twentieth century was what we call a machine-centered century. The twenty-first century is a human-centered century in every respect, with scientific technologies that are friendly to humans and to the natural and social environments of humans being valued. Therefore, research, development, and deployment of

Hisao Shiizuka
Department of Information Design, Kogakuin University, 1-24-2 Nishishinjuku, Shinjuku-ku, Tokyo 163-8677, Japan
e-mail: [email protected]

Ayako Hashizume
Graduate School of Comprehensive Human Science, University of Tsukuba, 1-1-1 Tennodai, Tsukuba-shi, Ibaraki 305-8577, Japan
e-mail: [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 329–339. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
advanced scientific technologies may no longer be determined solely by technological communities, and issues whose solutions have been carried over to the twenty-first century may not be solved by the technologies of only one specific field [1]. For example, tie-ups and cooperation with human and social scientists are greatly needed since the issues can no longer be solved by the scientific efforts of research and development alone. It is no exaggeration to say that such issues today affect all aspects of social technologies. Therefore, these issues can no longer be solved by conventional solutions. Matured technologies that have brought happiness to humans are among interdisciplinary scientific areas, and the mounting issues to be solved in the twenty-first century cannot be fully understood by conventional technological frameworks; the nature of scientific technology issues that we need to understand is obviously changing. Research focused on Kansei is expected to be the solution to these issues with the greatest potential. Meanwhile, it is only recently that interest in Kansei values has rapidly heightened as the Ministry of Economy, Trade and Industry launched the “Kansei Value Creation Initiative [2].” However, it is true that we have many issues to solve including the development of new viewpoints in defining the universal characteristics of Kansei values, detailed guidelines for future research, etc. Kansei engineering originally evolved as a way to “introduce human ‘Kansei’ into manufacturing.” The recent trend indicates that the development of Kansei engineering is expanding beyond manufacture and design and is widening into relevant fields, creating one of the most conspicuous features of Kansei engineering. It is needless to say, therefore, some kind of interdisciplinary development is necessary to find a mechanism for creating Kansei values, which is as important as the Kansei values themselves. This paper consists of three parts. 
The first part of this paper describes the general history of Kansei and the basic stance and concept of Kansei research. Second part discusses the significance and roles of creating Kansei values in Kansei engineering and also provides basic ideas that will serve as future guidelines, with its positioning in mind debating from many aspects. Finally, the paper emphasizes the necessity for Kansei communication in the ageing society. It is important for the Kansei communication to give birth to the empathy.
2 History and Definition of Kansei

The term Kansei that we use today originates from the aesthesis (an ancient Greek word meaning sensitivity/senses) used by Aristotle and is thought to have a similar meaning to ethos. The German philosopher Alexander Gottlieb Baumgarten (1714-1762) specified the study of sensible cognition as "aesthetics" for the first time in the history of philosophy, and this influenced Immanuel Kant. Baumgarten defined the term "sensible cognition" using the Latin word Aesthetica, as in Aesthetica est scientia cognitionis sensitivae [Aesthetics is the study of sensible cognition]. He defined "beauty" as a "perfection of sensible cognition with a coordinated expression" and defined "aesthetics" as "the study of natural beauty and artistic beauty." Lucien Paul Victor Febvre (1878-1956) understood Kansei as the French word sensibilité, which can be traced back to the early fourteenth century.
He also maintained that Kansei meant human sensitivity to ethical impressions such as “truth” and “goodness” in the seventeenth century, and in the eighteenth century, it referred to emotions such as “sympathy” “sadness,” etc. On the other hand, in Japan, aesthetica was translated as bigaku [aesthetics]. Given this international trend, in Japan, also, Kansei research was invigorated and attempts were made to understand Kansei from various perspectives [1]. A major argument was that it could be interpreted in many ways based on the meaning of its Chinese characters, such as sensitivity, sense, sensibility, feeling, aesthetics, emotion, intuition, etc. Another argument was that the word, from a philosophical standpoint, was coined through translation of the German word Sinnlichkeit in the Meiji period. It consists of two Chinese characters – Kan [feel] and Sei [character], which represent yin and yang respectively, and if combined, constitute the universe. The two different natures of yin and yang influence and interact with each other to exert power [3]. Another interpretation was that “Sinn” in the word Sinnlichkeit includes such meanings as senses, sensuality, feeling, awareness, spirit, perception, self-awareness, sensitivity, intuition, giftedness, talent, interest, character, taste, prudence, discretion, idea, intelligence, reason, judgment, mind, will, true intention, soul, etc. These suggest that the word Kansei has not only innate aspects such as giftedness and talent but also postnatal aspects such as interest, prudence, etc. It is supposed to be polished while exerting its power through interaction with others in various environments. Table 1 shows the result of a survey of Kansei researchers’ views on the word Kansei [4]. These findings lead us to believe that Kansei is multifunctional. Table 1 Interpretations of Kansei by researchers
1. Kansei has postnatal aspects that can be acquired through expertise, experience, and learning such as cognitive expressions, etc. as well as an innate nature. Many design field researchers have this view.
2. Kansei is a representation of external stimuli, is subjective, and is represented by actions that can hardly be explained logically. Researchers in information science often have this view.
3. Kansei is to judge the changes made by integration and interaction of intuitive creativity and intellectual activities. Researchers in linguistics, design, and information science often have this view.
4. Kansei is a function of the mind to reproduce and create information based on images generated. Researchers in Kansei information processing have this view.
5. Kansei is the ability to quickly react and assess major features of values such as beauty, pleasure, etc. Researchers in art, general molding, and robot engineering have this view.
332
H. Shiizuka and A. Hashizume
3 Kansei System Framework

Figure 1 shows a two-dimensional plane mapped with the elements required for Kansei dialogue. The vertical axis spans natural Kansei, the Kansei of real people, and artificial Kansei, which realizes Kansei through artificial means. The left-hand side of the horizontal axis represents “measurement (perception),” which corresponds to perceiving and understanding other people’s feelings, ideas, etc. The right-hand side of the horizontal axis represents “expression (creation, representation, and action),” which corresponds to expressing one’s own feelings, ideas, etc.

Fig. 1 Kansei system framework (quadrant diagram: artificial Kansei above, natural Kansei below; measurement (perception) on the left, expression (representation, creation, action) on the right; the quadrants contain Kansei epistemology with smart agents, Kansei expression theory with design, cognitive science with the cognitive system, and modeling with soft computing and multivariate analysis)
The idea behind this is the two underlying elements of “receiving” and “transmitting” ideas, as understanding Kansei requires “a series of information processing mechanisms: receiving internal and external information intuitively, instantly, and unconsciously using a kind of antenna called Kansei; selecting and determining only the necessary information; and, in some cases, transmitting information in a way recognizable by the five senses.” The space formed as described above and required for Kansei dialogue is called the Kansei system framework [1]. The framework shows that past research on natural Kansei focused almost entirely on understanding natural Kansei and on methods of accurately depicting this understanding in formulae. These studies analyzed data from specific points in time, and work on both the third and fourth quadrants was conducted in this context. Research on Kansei information processing mainly corresponds to the third and fourth quadrants. Artificial Kansei, meanwhile, corresponds to the first and second quadrants. The major point of artificial Kansei is to build a system that flexibly supports each individual and each circumstance; to put it differently, the true value of artificial Kansei is to provide versatile services to versatile people and to support individual differences.
The Role of Kansei/Affective Engineering and Its Expected in Aging Society
333
Understanding Kansei as a system is of great significance for future research on Kansei. First, in natural Kansei, as mentioned earlier, cognitive science and modeling correspond to the third and fourth quadrants, respectively. In artificial Kansei, on the upper half, Kansei representation theory corresponds to the first quadrant while Kansei epistemology corresponds to the second quadrant. Kansei representation theory mainly deals with design, which involves broad subjects. In “universal design” and “information design,” which are gaining increasing attention nowadays, new methodologies are expected to be developed through increasing awareness of Kansei. Second, Kansei epistemology, corresponding to the second quadrant, is a field that researchers have studied very little so far. While much of the research currently underway to bring the Kansei of robots closer to that of humans is focused on hardware (hardware robots), software robots will be essential for the future development of Kansei epistemology. A major example is non-hardware robots (software robots) that can surf the internet (cyberspace) freely to collect necessary information. Such robots are expected to utilize every aspect of Kansei and to be applied more broadly than hardware robots.
4 History of Kansei Values

There is no single answer to the question of “why create Kansei values now?” [2]. Japan has a long history of magnificent traditional arts and crafts created through masterly skill. It will be necessary to look at Kansei value creation not only from a national (Japanese) viewpoint but also from a global perspective, taking the history of Kansei into consideration. Information technology is currently at a major turning point: the “value model” has been changing. The past model, in which information technologies brought about value, was based on the “automation of work and the resulting energy-saving and time-saving.” That value model originates from the “Principles of Scientific Management” released by Taylor in 1911. Seemingly complex work, if broken down into elements, becomes a combination of simple processes, many of which can be performed more efficiently, at lower cost, and in less time by computers and machines. In this respect, the twentieth century was a century of machine-centered development aimed at improving efficiency (productivity) by using computers, etc. The twenty-first century, meanwhile, is said to be a human-centered century in every respect. The quest has already begun, in ways friendly to humans, to streamline business operations beyond simple labor-saving. This is represented by such keywords as “inspiration,” “humans,” and “innovation” and requires new types of information technology to support them. Changes have already begun. The most remarkable of these changes is the third one, the “change from labor-saving to perception” [4]. This changes the relationship between “information technology and humans.” In the twentieth century, the relationship between theory-based IT (information technology) and perception-based humans was characterized by a “conflict” in which perception was replaced by theory.
In the twenty-first century, meanwhile, it is expected to be characterized by the “convergence of IT
theory and human perception.” Specifically, how to incorporate mental models into IT models will be an essential issue for this convergence. It is worth noting that Drucker, in his book Management Challenges for the 21st Century [5], said that “the greatest deed in the twentieth century was the fiftyfold improvement in the productivity of physical labor in the manufacturing industry” and that “expected in the twenty-first century will be improvement in the productivity of intellectual labor.” From these historical perspectives on productivity improvement, we can understand that we currently need to find ways to incorporate “perception” and “human minds” into products. Finding such ways could lead to improving the Kansei values that people feel.
5 Resonance of Kansei

Looking at Kansei value creation from a system-methodology point of view, one finds that a Kansei converter is involved. The Kansei converter enables the resonance of Kansei to generate (or create) Kansei values. This phenomenon can be explained by analogy with resonance in the physical world. The physical phenomenon of “resonance,” especially in electric circuits, enables currents containing the resonance frequency components to flow smoothly: a circuit that includes inductors and capacitors becomes purely resistive, with the imaginary part of its impedance going to zero through the interaction of these reactive elements. When manufacturers (senders) have stories with a strong message, as shown in Figure 2, the level of “excitement” and/or “sympathy” increases among users (receivers), causing a “resonance of Kansei,” which in turn creates values. The “resonance of Kansei” can be interpreted as a phenomenon that occurs when the distance between a sender and a receiver is shortest (in a pure state where all impurities are removed). The distance between the two becomes shortest when, as shown in Figure 2, the story is strong and excitement or sympathy is maximized. How should we quantitatively assess (measure) the “level of a story” and the level of “excitement or sympathy”? This will be the most important issue in the theoretical part of Kansei value creation.
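As an aside on the resonance analogy, a minimal sketch (the component values are illustrative assumptions, not taken from the chapter) shows that a series LC circuit's reactance vanishes at the resonant angular frequency ω0 = 1/√(LC), which is the electrical counterpart of the "pure state" described above:

```python
import math

def resonant_frequency(inductance, capacitance):
    """Angular frequency at which a series LC circuit becomes purely resistive."""
    return 1.0 / math.sqrt(inductance * capacitance)

def reactance(inductance, capacitance, omega):
    """Imaginary part of the series impedance: omega*L - 1/(omega*C)."""
    return omega * inductance - 1.0 / (omega * capacitance)

# Illustrative values: 1 mH and 1 uF
L_h, C_f = 1e-3, 1e-6
w0 = resonant_frequency(L_h, C_f)
print(reactance(L_h, C_f, w0))  # ~0: at resonance the circuit is purely resistive
```

At any other frequency the reactance is nonzero and the current is impeded, just as a weak story fails to produce resonance between sender and receiver.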
Fig. 2 Resonance of Kansei (manufacturers (senders) having “stories” with a strong message → resonance of Kansei → users (receivers); vertical axis: level of “excitement” or “sympathy”)
6 Creativity and Communication

Kansei values are created only when some communication between manufacturers (senders) and users (receivers) takes place. The communication may take the form of explicit knowledge, where specific Kansei information is exchanged, or the form of tacit knowledge. The messages exchanged are treated as information and go through “analytic” processing and “holistic” processing for interpreting, deciding on, and handling that information. Analytic processing generally takes place in the left brain while intuitive processing takes place in the right brain. Holistic processing takes place through interaction of the right and left brains; it is based on the idea that the two hemispheres work interactively rather than independently. An idea generated in the right brain is verified by the left brain. Theoretical ideas not backed by creativity or intuitive insight sooner or later flounder. It is essential for the right and left brains to work together to solve increasingly complex issues. It may be no exaggeration to say that the most creative products of human culture, whether laws, ethics, music, arts, or scientific technology, are made possible through collaboration of the right and left brains. For this reason, human cultures are the work of the corpus callosum. Figure 3 summarizes information processing by the right and left brains [6]. We can understand moral, metaphorical, or emotional words or the punch lines of jokes only when the right and left brains work together (Professor Howard Gardner, Harvard University). Therefore, the mechanism by which the stimulus of Kansei information creates values is made possible only through collaboration of the right and left brains.

Fig. 3 Collaboration of the right and left brains (ways of information processing): the left brain is analytic, processing data one by one, consciously, with emblematic, partial, and quantitative functions and systematic processing; the right brain is intuitive, processing data simultaneously, unconsciously, with signal and whole functions and random processing; their combination processes data both one by one and simultaneously, combines emblematic and signal functions, and performs systemized random processing, with ideas from the right brain passed to the left for analysis and modification
Kansei communication is the foundation of Kansei value creation; creativity is generated through encounters, and encounters are the source of creativity (Rollo May, an American psychologist). Communication and creativity are therefore inseparable, and the power of creativity can be fully exerted when knowledge and experience, wisdom and insight regarding the two are freely shared and effectively combined. Kansei values are created through a relationship of “joint creation” between manufacturers and users. “Joint creation” literally means “creation jointly with customers.” To put it differently, “joint creation” is a new word based on the concept of forming new agreements or methodologies, through competitive and cooperative relationships among experts and expertise in different fields of study, to solve issues that cannot be solved by experts in any single field. Thus, “joint creation” is based on collaboration, and Kansei encounters and communication with people in different fields of study are therefore essential for it.
7 Aging Society and Kansei Communication

The United Nations defines an ageing society as one with a population ageing rate of 7% or higher, where the population ageing rate is the percentage of the overall population aged 65 or older. Japan became an ageing society in 1970. Its population ageing rate continued to increase thereafter, and in 2007 it became a super-aged society (a population ageing rate of 21% or higher). Many European countries and the United States are also experiencing increases in their population ageing rates, although the pace of ageing is slower than in Japan. East Asian countries such as China, Korea, and Singapore have not reached super-aged status, but the change in their population ageing rates is similar to Japan’s or, in some cases, even faster. An ageing population is one of the biggest issues facing the world today, Japan in particular. Also, because of the trend toward nuclear families and the increase in one-person households, the number of households consisting only of elderly people and the number of elderly people living alone are rising in Japan. A lack of communication is known to raise the incidence of dementia, an important problem in an ageing society. We therefore discuss the importance of the Kansei aspects of communication for the elderly. Kansei communication is “communication that is accompanied by a positive emotional response.” Two important factors required for Kansei communication are discussed in some detail below. Empathy is a must in order for a person to feel good when communicating, and empathy is possible only if information is conveyed well from speaker to listener. Such effective communication must contain not only the verbal aspects of the story being communicated in a narrative sense, but also nonverbal aspects that are difficult to verbalize, including knowledge, wisdom, and behavior.
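The UN population-ageing-rate thresholds defined at the start of this section can be expressed as a small sketch (the function name and sample figures are illustrative; only the 7% and 21% thresholds come from the text):

```python
def ageing_status(pop_65_plus, total_pop):
    """Classify a society by its population ageing rate (share of people aged 65+)."""
    rate = 100.0 * pop_65_plus / total_pop
    if rate >= 21.0:
        return rate, "super-aged society"
    if rate >= 7.0:
        return rate, "ageing society"
    return rate, "below the ageing-society threshold"

# Illustrative figures only: 27 of 126 (in millions) aged 65 or older
rate, status = ageing_status(27.0, 126.0)
print(round(rate, 1), status)  # a rate above 21% classifies as super-aged
```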
Mehrabian’s rule also states that nonverbal aspects of communication (visual information such as facial expressions and body language, as well as auditory information such as tone and pitch of voice) determine what is communicated more than the meanings of the words spoken do [7]. Note,
however, that this is true only when there are inconsistencies between the words and, for example, the facial expressions and tone of voice. That said, human verbal communication is generally supported by nonverbal information even without such inconsistencies. This is because speakers are unable to convey everything that they want to verbally, and listeners unconsciously interpolate, filling the gap with the nonverbal information given by the speaker. In clear communication with good nonverbal conveyance, good understanding and strong empathy are generated. Common understanding through nonverbal communication is closely connected to Kansei communication. Humans do not want to communicate with everyone. A comfortable distance, or Kansei distance, exists between people within societal relationships. The cultural anthropologist Edward T. Hall divided proxemics into intimate distance, personal distance, social distance, and public distance. He further refined each into relatively close (close phase) and relatively far (far phase), and explained the significance of each distance. These distances convey how close people feel toward each other and what desires they have in terms of communication. Kansei communication exists within intimate distance and personal distance, distances that are limited to face-to-face communication. The significance of physical distance is becoming somewhat obsolete as communication through electronic equipment increases with the advance and spread of information technologies. People often prefer face-to-face or telephone communication with truly close acquaintances, but non-face-to-face communication such as e-mail is often used to accommodate the communicators’ circumstances. Emotional closeness, in addition to physical distance, is another factor that shapes Kansei distance. It is safe to assume that the nonverbal aspects of communication between such communication partners are shared.
This closeness and the existence of nonverbal aspects of communication help to form a comfortable distance (Kansei distance) in e-mail communication, for example.
8 The Elderly and the Need for Kansei Communication

Ageing brings various changes to bodily functions, including sensory and motor functions. These changes, along with the changes in social status and economic circumstances that come with getting older, bring various changes to emotional and psychological functions as well. These changes affect elderly people’s communication and their relationships with society. The onset of the symptoms of ageing is deeply related to an individual’s age, but there is no rule as to when these symptoms may occur; the timing largely depends on a person’s physical characteristics and life history. Similarly, the ageing process and its rate of progress for each function differ greatly between individuals. When considering elderly people, we need to understand what types of change occur as a person ages, and we must understand that ageing shows itself in various dimensions and does not always fit a general stereotype.
In this section, we observe the changes that come with ageing as we consider the social relationships of elderly people and Kansei communication. The psychological state of elderly people is affected by a multitude of factors, including societal factors such as social status and economic circumstances, psychological factors such as subjective well-being, and physical factors such as deterioration of visual and auditory sensory functions. These life changes create a very stressful environment for elderly people. Therefore, we believe that elderly people have a potentially strong need for Kansei communication. Enriched communication has been reported to have preventative effects on diseases such as dementia and depression. Kansei communication holds its own promise of an equal or better preventative effect. In this regard, it is very necessary to invigorate communication at Kansei distances by utilizing the nonverbal aspects of communication.
9 Concluding Remarks

It is only recently that interest in Kansei values has rapidly increased, as the Ministry of Economy, Trade and Industry launched the “Kansei Value Creation Initiative” [2]. However, many issues remain to be solved, including the development of new viewpoints for extracting the universal characteristics of Kansei values and the development of detailed guidelines for future research. It is only natural that scientific discussion of Kansei should lead to the “quantification” of Kansei, and further discussion of the quantification issue is required. The main agenda of such discussion will be how to quantify Kansei in a general way and extract the essence of Kansei embodied in masterly skills. Further research is required on this issue [8]. Appropriate convergence of technology and mentality (Kansei), and recovery of the proper balance, are expected to be achieved by shifting discussion of Kansei engineering from individual discussions to integrated ones. Kansei engineering is a cross-sectional (comprehensive) science and should integrate specific technologies from psychology, cognitive science, neuropsychology, sociology, business management, education, psychophysiology, value-centered design, ethics engineering, and computer science. To that end, “technological ability + artistic ability + collaborative spirit” needs to converge appropriately. Kansei value creation, among all aspects of Kansei engineering, requires collaboration with the relevant areas of Kansei and needs to progress while looking at the whole picture. Kansei/affective engineering is expected to help address the problems of an ageing society, especially through Kansei communication. If the use of Kansei communication raises elderly people’s motivation to use new communications media, the generational gap in new media use can be somewhat closed.
Kansei communication will also strengthen bonding-type social capital, leading to enhanced psychological health and societal welfare. Through this, diseases characteristic of elderly people can be prevented and the quality of life of the elderly population can be improved.
References
1. Shiizuka, H.: Kansei system framework and outlook for Kansei engineering. Kansei Engineering 6(4), 3–16 (2006)
2. Ministry of Economy, Trade and Industry, Japan, Press Release, http://www.meti.go.jp/press/20070522001/20070522001.html
3. Kuwako, T.: Kansei Philosophy. NHK Books (2001)
4. Harada, A.: Definition of Kansei. Research papers by the Tsukuba University project on the Kansei evaluation structure model and model building, pp. 41–47 (1998)
5. Ueda, A. (trans.): What Governs Tomorrow. Diamond (1993)
6. Howell, W.S., Kume, A.: Kansei Communication. Taishukan Shoten (1992)
7. Hashizume, A., Shiizuka, H.: Ageing society and Kansei communication. In: Phillips-Wren, G., et al. (eds.) Advances in Intelligent Decision Technologies. SIST, vol. 4, pp. 607–615. Springer, Heidelberg (2010)
8. Shiizuka, H.: Positioning Kansei value creation in Kansei engineering. Kansei Engineering 7(3), 430–434 (2008)
Part II Decision Making in Finance and Management
A Comprehensive Macroeconomic Model for Global Investment Ming-Yuan Hsieh, You-Shyang Chen, Chien-Jung Lai, and Ya-Ling Wu
Abstract. With the globalization of macroeconomics, national economies have developed comprehensive and inseparable interrelationships. Therefore, in order to fully grasp the pulse and tendencies of international finance, global investors need to adopt vigorous tactics to face global competition. The financial investment environment changes with each passing day, investors are more and more discerning, and market demands can fluctuate unpredictably. This research focuses on the comparison of ten industrial regions: two developed industrial regions (the USA and Japan) and eight high-growth industrial regions (the Four Asian Tigers and BRIC). The measurement objectives consist of twenty-four indicators for these ten industrial regions: fourteen macroeconomic indicators and ten statistical indexes from four academic institutes, the IMD World Competitiveness Yearbook (WCY), the World Economic Forum (WEF), Business Environment Risk Intelligence (BERI), and the Economist Intelligence Unit (EIU), which are the focal dimensions of the integrated investigation in these contexts. Significantly, this research also provides quantitative and empirical analysis of the prominent features of, and essential conditions for, portfolio theory and a macroeconomic model, and evaluates the relative strengths and weaknesses of twelve stock markets in the ten industrial regions through scenario and empirical analysis using the percentage fluctuations of the stock price index and stock market capitalization of the twelve stock markets.

Keywords: Macroeconomic Model, Global Investment.

Ming-Yuan Hsieh, Department of International Business, National Taichung University of Education
You-Shyang Chen, Department of Information Management, Hwa Hsia Institute of Technology
Chien-Jung Lai, Department of Distribution Management, National Chin-Yi University of Technology
Ya-Ling Wu, Department of Applied English, National Chin-Yi University of Technology
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 343–353. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
344
M.-Y. Hsieh et al.
1 Introduction

Many investors have confronted greater challenges due to the rapid, capricious development of the world economy and the financial investment environment. The financial investment environment changes with each passing day, investors are more and more discerning, and market demands can fluctuate unpredictably. While facing the constant changes of the global financial markets, it is important to know how to break through the current situation, maintain an advantage, and continuously make a profit. Many investors are under competitive pressure to adapt positively, to form a competitive investment strategy, and to have a sound project management strategy. Traditional business investment is not enough to deal with the new and varied economic challenges. This study focuses on the topic of this research: reducing systematic risk through portfolio theory and a macroeconomic model. Investors and researchers have come to recognize that risk has always been the key factor governing realized returns. In researching risk, however, a series of questions arises. For example, how do we measure conceptual risks? How do we concretize abstract risks? How do we quantify national risks? How do we connect risks with expected returns? How do we link risks with realized returns? How do we define, measure, and quantify the relationship among expected returns, realized returns, and risks? How can the impact of risks be decreased once conceptual risks are materialized? [1] initially claimed that risk can be dispersed through diversified investment activities. Further, [1] advocated that risk equals the variance (σ²) of the returns of an invested portfolio, where the portfolio return is the sum of each invested objective’s return rate (Ri) multiplied by its selection percentage (weight, Wi), that is, Rp = Σi Wi Ri, with portfolio variance σ² = Σi Σj Wi Wj Cov(Ri, Rj). In Markowitz’s paper, Portfolio Selection, risk was first quantified and connected with expected and realized returns. The first research linking increased investment return rates with decreased investment risk was Markowitz’s portfolio theory. After the portfolio theory was published in the Journal of Finance, another economist, [2], began to research Markowitz’s portfolio theory and created a financial model, the Capital Asset Pricing Model (“CAPM”), that cleanly linked the expected return rate of a stock, the risk-free return rate, the market portfolio return rate, and the risk priority number (beta coefficient, “β”). Systematic risk often results in a decline of total portfolio value, as most portfolio investments decline in value together. [2][3] claimed that investors can utilize portfolio investment strategies (separate investment objectives) to decrease or eliminate unsystematic risk; for example, investors can combine high-profit/high-risk and low-profit/low-risk objectives as a hedge to create an efficient portfolio. By contrast, systematic risk results from the invested markets themselves, including the economic growth rate, production rate, foreign exchange rate, inflation, government economic policies, and other macroeconomic environment factors. A typology of portfolio theory and macroeconomic model was presented in order to make the
meaning of portfolio theory and the macroeconomic model more exact in this research, thus avoiding the confusion and misunderstanding rampant in popular and journalistic discussions of the macroeconomic environment among the ten industrial regions, especially the rapidly developing BRIC (Brazil, Russia, India and China), as measured by four international economy-academic institutions: the IMD World Competitiveness Yearbook (WCY), World Economic Forum (WEF), Business Environment Risk Intelligence (BERI) and Economist Intelligence Unit (EIU).
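The quantities discussed above can be sketched in a short example (variable names and sample numbers are illustrative, not taken from the paper): portfolio return as a weighted sum, portfolio variance from the covariance matrix, and the CAPM relation E[Ri] = Rf + β(E[Rm] − Rf):

```python
def portfolio_return(weights, returns):
    """R_p = sum_i W_i * R_i: weighted sum of the objectives' return rates."""
    return sum(w * r for w, r in zip(weights, returns))

def portfolio_variance(weights, cov):
    """sigma_p^2 = sum_i sum_j W_i * W_j * Cov(R_i, R_j)."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j]
               for i in range(n) for j in range(n))

def capm_expected_return(risk_free, beta, market_return):
    """CAPM: E[R_i] = R_f + beta * (E[R_m] - R_f)."""
    return risk_free + beta * (market_return - risk_free)

# Illustrative two-asset example
w = [0.6, 0.4]
r = [0.10, 0.05]
cov = [[0.04, 0.01],
       [0.01, 0.02]]
print(portfolio_return(w, r))     # weighted portfolio return (~0.08)
print(portfolio_variance(w, cov)) # portfolio variance (~0.0224)
print(capm_expected_return(0.02, 1.2, 0.08))
```

Note how the off-diagonal covariance terms in `portfolio_variance` capture the diversification effect that Markowitz described: low or negative covariances reduce total portfolio risk.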
2 Methodologies

2.1 Research Specification of Research Sample and Data Collection

Bayes was the first to originate a theory of measuring probabilities based on historical event data in order to analyze the expected rates of probabilities (“Bayes’ theorem”). Bayes’ theorem has been utilized for more than two hundred and fifty years, and the Bayesian interpretation of probability remains popular today. Under Bayes’ theorem, the measurement and re-evaluation of probability can be approached in several ways. The preliminary application is based on betting: the degree of belief in a proposition is expressed in the odds the assessor is willing to bet on a trial of its truth succeeding. In the 1950s and 1960s, Bayes’ theorem became the preferred and general approach for assessing probability. [4] was an English psychologist. Spearman was a forerunner in exploring factor analysis for analyzing the rank correlation coefficients of research factors and also produced creative work on statistical models of human intelligence, including his theory that disparate cognitive test scores reflect a single general factor, for which he coined the term g factor (covering, for example, personality, willpower, nervous condition, blood circulation, and other related mental energy factors), alongside specific factors (s factors), each related only to a particular ability. With his strong statistical background, Spearman estimated the intelligence of twenty-four children in a village school. After a series of studies, he discovered that the correlations between factors were positive and hierarchical, which resulted in his two-factor theory of intelligence.
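Bayes' theorem updates a prior probability with observed evidence. A minimal sketch (the numbers are illustrative assumptions, not taken from the paper):

```python
def bayes_posterior(prior, likelihood, evidence):
    """P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence

# Illustrative: hypothesis H has prior 0.01; a signal E occurs with
# probability 0.9 given H and 0.1 given not-H.
prior = 0.01
p_evidence = 0.9 * prior + 0.1 * (1 - prior)  # law of total probability
posterior = bayes_posterior(prior, 0.9, p_evidence)
print(posterior)  # ~0.083: the evidence raises the probability roughly eightfold
```

This is the "revaluation of probability" the paragraph refers to: the degree of belief is revised each time new event data arrive.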
2.2 Literature Review on Portfolio Theory

Harry Markowitz showed that investors have always pursued the efficient portfolio (minimum risk for a maximum return) under limited invested capital, by varying the percentages allocated to various investment objectives. [2] extended the portfolio concept of [1] to develop the “Sharpe ratio.” The fundamental idea of the Sharpe ratio is the relationship between investment return premium and investment risk: the expected return rate minus the risk-free return rate, divided by the standard deviation of returns. Going one step further, [2] adopted the portfolio theory of [1] to develop the single-index model (“SIM”). SIM is an asset pricing model commonly used for calculating the
invested portfolio and measuring the risk and return of a stock. In the CAPM, portfolio risk is expressed by higher variance (i.e., less predictability). From another standpoint, the beta of the portfolio is the defining factor in rewarding the systematic exposure taken by an investor. In terms of risk, he further argued that the investment risk of a portfolio consists of systematic risk (non-diversifiable risk) and unsystematic risk (diversifiable or idiosyncratic risk). Systematic risk results from risk common to the entire invested market, since each market has its own volatility. Unsystematic risk is related to individual investment objectives and can be diversified down to smaller levels by including a greater number of objectives in the portfolio. [7] first brought macroeconomic factors (market factors) into portfolio theory in order to find out the impact of systematic risks on the expected return rates of investment objectives.
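The Sharpe ratio described above can be sketched directly (the sample return series and risk-free rate are illustrative assumptions):

```python
import statistics

def sharpe_ratio(returns, risk_free_rate):
    """(mean return - risk-free rate) / sample standard deviation of returns."""
    mean_excess = statistics.mean(returns) - risk_free_rate
    return mean_excess / statistics.stdev(returns)

# Illustrative yearly returns and a 2% risk-free rate
print(sharpe_ratio([0.08, 0.12, 0.05, 0.15], 0.02))
```

A higher ratio indicates more excess return per unit of total risk, which is why it is commonly used to compare portfolios with different volatilities.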
2.3 Literature Review on Macroeconomic Models

Recently, a large number of economic and financial scholars and researchers have devoted themselves to portfolio research based on portfolio theory. In the final analysis, the fundamental concept of portfolio theory stems from [5]. [5] was one of the most important and influential economists in European economic and financial research from the 1920s to the 1980s. In articles and books released in the 1930s, he explored the concept of elasticity of substitution, which resulted in a complete restatement of the marginal productivity theory. As for the beginnings of macroeconomic modeling, [8] and [9] briefly classified two main research models: the macroeconomic model (“MEM”), which is complicated to calculate and analyze, and computable general equilibrium (“CGE”), which can be simply inferred and calculated. Further, [10] sorted MEM into five categories: the Keynes–Klein (“KK”) model, the Phillips–Bergstrom (“PB”) model, the Walras–Johansen (“WJ”) model, the Walras–Leontief (“WL”) model, and the Muth–Sargent (“MS”) model. Nowadays, the Keynes–Klein model is used in a large number of academic economic studies [11]. On the other hand, CGE also uses joint equations to explain the correlations between economic factors [12], but, through a Social Accounting Matrix (“SAM”), CGE accentuates three analytical factors (labor, manufactured products, and the financial market) in the macroeconomic model [13]. [13] presented the academic paper “A Dynamic Macroeconomic Model for Short-Run Stabilization in India,” researching India’s complicated economy. The effects of a reform policy package similar to those implemented for India’s trade and inflation in 1991 could be assessed through this macroeconomic model.
The most distinctive standpoint of that research is that it incorporates the non-stationarity of the data into the macroeconomic model, rather than relying on estimation procedures that assume stationarity. [14] delivered the topic of economic freedom in the Four Asia Tigers. [14] focused on an analysis of the main controversy in the study: the role of the state in their rapid growth. Further, Paldam (2001) briefly produced the economic freedom index in order to quantify the abstract concept of economic freedom, and
A Comprehensive Macroeconomic Model for Global Investment
347
he successfully expressed the level of economic freedom through analyzing the economic freedom index in the Four Asia Tigers using survey data and additional sources. In conclusion, three of the Four Asia Tigers (Singapore, Hong Kong and Taiwan) have a level of regulation similar to that of the Western European countries, since their economic freedom index results are close to the level of conditional laissez-faire. [15] investigated the East Asian economies from 1986 to 2001, combining academic literature reviews and empirical observations through the method of bibliometric research. In their study, they organized around "4,200 scholarly articles written about the East Asian economies that were indexed by the Journal of Economic Literature from 1986 to 2001 and included on the CD-ROM EconLit." [15]. Further, they centralized these papers regarding the East Asian economies to identify the leading research authors, remarkable journals and empirically observed papers in detail. After this discussion, they compared the Four Asia Tigers with other emerging market economies (Czech Republic, Hungary, Mexico, and Poland) and a developed market economy (Italy) in an attempt to uncover a correspondence between the growth in articles and the growth of the economies. [16] clarified the competitiveness between two financial cities, Hong Kong and Singapore. [16] utilized the rational foundation of the categories of finance centers to expand his research concept and empirical investigation. Since the government of Hong Kong recognized the world finance trend arising from the "Asian Financial Crisis" and "Economic Area Competition", it adopted economic concepts and financial policies designed to position Hong Kong as an attractive financial market for investment. This research explored in detail the competition, comparison and evaluation of the two financial cities as attractive financial markets for investment.
Besides, the contribution of that research is that it successfully built an evaluation model based on a macroeconomic profile and a financial centre profile of these two cities in qualitative and quantitative terms.
3 Learning of Imbalanced Data In terms of examining the complexity and uncertainty challenges surrounding portfolio theory and the macroeconomic model, five years of data were analyzed along with multi-method and multi-measure field research in order to achieve a retrospective cross-sectional analysis of industrial regions. The sample consisted of ten industrial regions: two developed industrial regions (USA and Japan) and eight high-growth industrial regions (the Four Asia Tigers and BRIC). This chapter not only characterizes the overall research design, empirical contexts, research sample and data collection procedures but is also designed to compare the ten industrial regions.
3.1 Research Specification of Research Sample and Data Collection In terms of the representativeness and correctness of the efficient macroeconomic model derived through factor analysis, the research sample must cover all relevant macroeconomic factors as comprehensively as possible. Further, the sample
348
M.-Y. Hsieh et al.
in this research contains numerous and complicated macroeconomic factors that are collected from two authoritative and professional channels. One is the official government statistics departments, and the other is four economic statistics sources for macroeconomic factor data: the IMD World Competitiveness Yearbook (WCY) National Competitive Index (NCI), the World Economic Forum (WEF) Global Competitiveness Index (GCI), the Business Environment Risk Intelligence (BERI) Business Environment, and the Economist Intelligence Unit (EIU) Business Environment. The content of the research sample consists of a vertical range and a horizontal scope. Specifically, in terms of the validity and reliability of the collected data, this study focused on three important measurement aspects: (1) content validity, which was judged subjectively; (2) construct validity, which was examined by factor analysis; and (3) reliability, which concluded that the seven measures of quality management have a high degree of criterion-related validity when taken together. Given the sensitive nature of the research data, time was devoted to citing the relevant macroeconomic factors from academic institutions. A database of all macroeconomic factors was created using public and primary economic reports including press releases, newspaper articles, journal articles and analytical reports. These sources provided a macroeconomic-level understanding of the motivations and objectives, basic challenges, target characteristics, financial market contexts and a general sense regarding portfolio theory and the macroeconomic model. The economic indicator data include the annual economic indicator data (leading and lagging indicators). The vertical range stretches over five years from 2004 to 2008, and the horizontal scope consists of twelve stock markets of ten industrial regions including the USA, Japan, the Four Asia Tigers and BRIC.
With regard to the analysis method, the indicator that gains the highest score is given full marks, and the other indicators are scored in proportion to their relative values. The measured research data include 24 macroeconomic indicators.
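The relative scoring rule just described (full marks to the top scorer, proportional scores for the rest) can be sketched as follows; the region names and raw values are hypothetical.

```python
# Relative scoring: the highest raw value receives full marks;
# the others are scored in proportion to it. Values are hypothetical.

def relative_scores(raw, full_marks=100.0):
    top = max(raw.values())
    return {k: full_marks * v / top for k, v in raw.items()}

scores = relative_scores({"region_a": 80.0, "region_b": 100.0, "region_c": 50.0})
print(scores)  # {'region_a': 80.0, 'region_b': 100.0, 'region_c': 50.0}
```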
3.2 Research Design The fundamental research design in this study is based on combining portfolio theory and the macroeconomic model in order to create an efficient and effective macroeconomic model to measure the beta priority number (beta coefficient) in the CAPM. Further, in order to produce the macroeconomic model, this research follows the above procedure of the research theory development framework to build the research design framework in Figure 1. This research design framework not only focuses on the application of portfolio theory, the macroeconomic model, and the assessment method but also concentrates on the macroeconomic environment of the industrial regions, which is measured by major statistical macroeconomic factors including GDP, economic growth rate, imports, exports, investment environment, and financial trade. These are obtained from official government statistics departments, such as Taiwan's Bureau of Foreign Trade ("MOEA") and the USA's Federal Reserve Board of Governors, and academic economic institutions such as the IMD, the World Economic Forum (WEF), Business Environment Risk Intelligence (BERI), and the Economist Intelligence Unit (EIU).
Fig. 1 The Research Design Framework. (Flowchart: identifying the research topic → collecting related literature (1. outstanding papers and journals regarding research methodology; 2. fundamental concepts of portfolio theory) → measuring invested systematic risk indexes, utilizing factor analysis to produce the effective macroeconomic model for the analyzed industrial regions → comparison of invested systematic risk among the ten industrial regions (USA, Japan, Four Asia Tigers and BRIC) → bringing the annual growth rate of the ICI into the CAPM model in order to calculate the beta priority numbers of the twelve stock markets of the ten industrial regions → conclusion and recommendations. Methods developed and applied: factor analysis, macroeconomic model, portfolio theory (CAPM), scenario analysis and empirical analysis.)
4 Classifier for Large Imbalanced Data According to inductive reasoning from the industrial regions, technical compatibility is identified to provide a macroeconomic profile and a financial profile of the ten industrial regions in quantitative terms for comparison and evaluation purposes. The analysis suggests that each of the economic and financial competitiveness indicators can have positive or negative implications for the development of portfolio theory and the macroeconomic model; the indicators are constructed based on insights from the research on the twelve stock markets of the ten industrial regions. This chapter deals with a quantitative and empirical analysis of the prominent features and essential conditions for portfolio theory and the macroeconomic model, and evaluates the relative strengths and weaknesses of the twelve stock markets of the ten industrial regions by examining three hypotheses. Through factor analysis, the "component scores" among the six selected main factors (Academic Economic Institute Score Factor, Economic Production Factor, Economic Trade Factor, Economic Exchange Rate Factor, Economic Interest Rate Factor and Economic Consumer Price Factor) and twenty analytical variables are presented in the following formulation. According to the statistics, the analytical measurement model
of national competition is presented in the following formulation (1) and formulation (2) (Varimax method). Assumption: all collected data are correct and the formula inaccuracy e is given and constant.

Invested Systematic Risk Index (Competition of invested financial markets) (df)
= Academic Economic Institute Score Factor + Economic Production Factor + Economic Trade Factor + Economic Exchange Rate Factor + Economic Interest Rate Factor + Economic Consumer Price Factor + e (formula inaccuracy)
= (WEFCBC, WEFGE, BERIFOR, BERIOR, GDPPC, EIUER, WEFGR, BERIPR, WEFIF, IMDCEP and BERIER) + (IPY, NRFEG and IPG) + (GDPPP, IP and EP) + (ER) + (IR) + (CPI) + e (formula inaccuracy)
= 0.957*WEFCBC + 0.951*WEFGE + 0.913*BERIFOR + 0.898*BERIOR + 0.891*GDPPC + 0.882*EIUER + 0.881*WEFGR + 0.881*BERIPR + 0.868*WEFIF + 0.833*IMDCEP + 0.778*BERIER + 0.857*IPY + 0.792*NRFEG + 0.729*IPG + 0.954*GDPPP + 0.852*IP + 0.677*EP + 0.807*ER + 0.456*IR + 0.527*CPI + e (formula inaccuracy)   (1)

Assumption: all collected data are correct and the formula inaccuracy e is given and constant.
Rotated Invested Systematic Risk Index (Competition of invested financial markets) (df)
= Academic Economic Institute Score Factor + Economic Trade Factor + Economic Profit Factor + Economic Reserves Factor + Economic Consumer Price Index Factor + Economic Exchange Rate Factor
= (BERIFOR, BERIPR, BERIOR, WEFCBC, WEFGE, WEFGR, GDPPC, EIUER, IMDCEP, BERIER and WEFIF) + (GDPPP, IP and EP) + (IPY, IPG and EG) + (NRFEG) + (CPI) + (ER) + e (formula inaccuracy)
= 0.954*BERIFOR + 0.931*BERIPR + 0.908*BERIOR + 0.893*WEFCBC + 0.886*WEFGE + 0.861*WEFGR + 0.843*GDPPC + 0.835*EIUER + 0.799*IMDCEP + 0.796*WEFIF + 0.778*BERIER + 0.959*GDPPP + 0.984*IP + 0.688*EP + 0.889*IPG + 0.948*IP + 0.751*EG + 0.91*NRFEG + 0.884*CPI + 0.972*ER + e (formula inaccuracy)   (2)

Hence, the Invested Systematic Risk Index (ISRI), the Rotated Invested Systematic Risk Index (RISRI), the Growth Rating of the Invested Systematic Risk Index, and the Growth Rating of the Rotated Invested Systematic Risk Index of each of the ten industrial regions are presented in Table 1. As a further step, the five-year beta priority numbers of the twelve stock markets (USA New York, USA NASDAQ, Japan, Taiwan, Singapore, Korea, Hong Kong, Brazil, Russia, India, China Shanghai and China Shenzhen) from the ten industrial regions (USA, Japan, Taiwan, Singapore, Korea, Hong Kong, Brazil, Russia, India and China) are calculated through the CAPM (E(R_Stock) = R_f + β_Stock × [E(R_M) − R_f]) of portfolio theory and the two equations: the macroeconomic model (1) and the rotated macroeconomic model (2).
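A sketch of how an index of the form in formulation (1) can be evaluated: each indicator (assumed already standardized) is multiplied by its factor loading and the products are summed. The loadings below are a small subset taken from formulation (1); the standardized indicator values are hypothetical.

```python
# Sketch: evaluating a linear index from factor loadings, as in
# formulation (1). Only four loadings from the formulation are shown;
# the standardized indicator values are hypothetical.

loadings = {
    "WEFCBC": 0.957,  # loading from formulation (1)
    "WEFGE": 0.951,
    "GDPPP": 0.954,
    "ER": 0.807,
}

def index_value(indicators, loadings, error_term=0.0):
    """Weighted sum of standardized indicators plus the error term e."""
    return sum(loadings[k] * indicators[k] for k in loadings) + error_term

standardized = {"WEFCBC": 1.0, "WEFGE": 0.5, "GDPPP": -0.2, "ER": 0.1}
print(round(index_value(standardized, loadings), 4))  # 1.3224
```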
Table 1 Factor Analysis of Invested Performance in the Twelve Stock Markets of the Ten Industrial Regions from 2004 to 2008
(ISRI = Invested Systematic Risk Index; RISRI = Rotated Invested Systematic Risk Index.)

Region/Year          ISRI       RISRI      ISRI Growth Rating  RISRI Growth Rating
USA 2004           47939.05   47452.32     5.69%               5.99%
USA 2005           50829.85   50476.59     5.69%               5.99%
USA 2006           53582.36   53399.25     5.14%               5.47%
USA 2007           55452.83   55332.69     3.37%               3.49%
USA 2008           57553.24   57523.32     3.65%               3.81%
JAPAN 2004         30578.81   29678.99     5.24%               5.34%
JAPAN 2005         32269.18   31352.22     5.24%               5.34%
JAPAN 2006         34259      33336.81     5.81%               5.95%
JAPAN 2007         35928.98   35016.01     4.65%               4.80%
JAPAN 2008         37330.76   36426.45     3.76%               3.87%
TAIWAN 2004        22895.45   21885.17     6.80%               6.97%
TAIWAN 2005        24564.83   23524.94     6.80%               6.97%
TAIWAN 2006        26429.42   25309.2      7.05%               7.05%
TAIWAN 2007        28424.24   27234.41     7.02%               7.07%
TAIWAN 2008        29692.29   28442.72     4.27%               4.25%
SINGAPORE 2004     36420.57   34631.95     7.75%               7.81%
SINGAPORE 2005     39479.08   37566.93     7.75%               7.81%
SINGAPORE 2006     42671.31   40631.67     7.48%               7.54%
SINGAPORE 2007     45346.12   43236.08     5.90%               6.02%
SINGAPORE 2008     47350.66   45142.68     4.23%               4.22%
KOREA 2004         20210.04   19636.25     6.21%               6.20%
KOREA 2005         21547.26   20934.21     6.21%               6.20%
KOREA 2006         23266.19   22606.01     7.39%               7.40%
KOREA 2007         24958.46   24273.97     6.78%               6.87%
KOREA 2008         26395.33   25685.36     5.44%               5.49%
HONG KONG 2004     29752.28   28453.7      9.16%               9.23%
HONG KONG 2005     32750.99   31345.82     9.16%               9.23%
HONG KONG 2006     35714.21   34170.09     8.30%               8.27%
HONG KONG 2007     38808.21   37158.84     7.97%               8.04%
HONG KONG 2008     40751.45   39027.63     4.77%               4.79%
BRAZIL 2004         8456.545   8129.662    6.50%               6.63%
BRAZIL 2005         9044.654   8707.226    6.50%               6.63%
BRAZIL 2006         9296.606   8964.129    2.71%               2.87%
BRAZIL 2007         9921.351   9582.526    6.30%               6.45%
BRAZIL 2008        11010.61   10675.02     9.89%              10.23%
RUSSIA 2004        11265.09   10840.45     9.40%               9.52%
RUSSIA 2005        12433.47   11980.63     9.40%               9.52%
RUSSIA 2006        13918.89   13445.71    10.67%              10.90%
RUSSIA 2007        15635.57   15157.75    10.98%              11.29%
RUSSIA 2008        17398.09   16928.12    10.13%              10.46%
INDIA 2004          5955.555   5917.939    7.64%               7.80%
INDIA 2005          6448.421   6418.842    7.64%               7.80%
INDIA 2006          7086.197   7069.151    9.00%               9.20%
INDIA 2007          8015.897   8065.254   11.60%              12.35%
INDIA 2008          6055.779   6207.333  -32.37%             -29.93%
CHINA 2004         10686.48   11005.24    12.84%              13.67%
CHINA 2005         12260.32   12748       12.84%              13.67%
CHINA 2006         14676.83   15261.61    16.46%              16.47%
CHINA 2007         16902.61   17641.32    13.17%              13.49%
CHINA 2008         15107.45   16007.7    -11.88%             -10.21%
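The growth ratings in Table 1 appear to follow (x_t − x_{t−1}) / x_t, i.e., the year-on-year change relative to the current value; this is an inference from the tabulated numbers, not a formula stated in the text. A quick check against the USA row:

```python
# Inferred growth-rating formula, checked against Table 1's USA ISRI values:
# growth_t = (x_t - x_{t-1}) / x_t  (change relative to the current value).

def growth_rating(current, previous):
    return (current - previous) / current

# USA Invested Systematic Risk Index, 2004 -> 2005 (values from Table 1):
g = growth_rating(current=50829.85, previous=47939.05)
print(f"{g:.2%}")  # 5.69%, matching the tabulated growth rating for USA 2005
```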
5 Concluding Remarks After the measurements of this research, the five research questions are resolved in detail, yielding notable findings for global investors who desire to invest in these twelve stock markets from the ten industrial regions. Further, global investors are able to forecast the variation of fluctuating systematic risk by measuring the ISRI and RISRI through the combined utilization of the CAPM (E(R_Stock) = R_f + β_Stock × [E(R_M) − R_f]), the macroeconomic model and the rotated macroeconomic model. In terms of research limitations, despite the significance of all the results, this study comes with some limitations, as expected. The most apparent of these is the generalizability of the findings. The sample consisted of 5 years of economic indicator data for the ten industrial regions from 2004 to 2008 with the related macroeconomic factors. Further, these macroeconomic factors were collected from the official government statistics departments and four main academic economic institutions. For all that, the conclusions of this research cannot take into full consideration other macroeconomic sectors (e.g., political, legal, technological). This will require additional data collection, greater discussion and further investigation. Notwithstanding its limitations, this research on reducing the invested systematic risk in twelve stock markets from ten industrial regions through portfolio theory and the macroeconomic model makes some contributions to the literature and to future directions. Further, looking back, this research spans ten industrial regions by analyzing the five-year beta priority numbers (beta coefficients) of twelve stock markets (USA New York, USA NASDAQ, Japan Tokyo, Taiwan, Singapore, Korea, Hong Kong, Brazil, Russia, India, China Shanghai and China Shenzhen) from 2004 to 2008.
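The beta coefficients discussed above are conventionally estimated as the covariance of stock and market returns divided by the variance of market returns. A sketch with hypothetical return series (not data from this study):

```python
# Conventional beta estimate: beta = Cov(R_stock, R_market) / Var(R_market).
# The return series below are hypothetical, not data from this study.

def beta(stock_returns, market_returns):
    n = len(market_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_returns, market_returns)) / n
    var = sum((m - mean_m) ** 2 for m in market_returns) / n
    return cov / var

stock = [0.02, -0.01, 0.03, 0.01, -0.02]    # hypothetical stock returns
market = [0.01, -0.01, 0.02, 0.01, -0.01]   # hypothetical market returns
print(round(beta(stock, market), 3))  # 1.5
```

A beta above 1 indicates the stock amplifies market movements, i.e., it carries more systematic risk than the market portfolio.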
Under the pressure of public opinion and the requirements of Taiwan's domestic financial institutions, the Taiwan Government started to allow financial institutions to set up offices and branches (in limited numbers) in Mainland China in 2001. The Taiwan financial institutions established 14 sub-banks, 200 branches and 242 offices in 25 cities in Mainland China. This also included 115 financial institutions directly authorized to run the monetary business of Renminbi. In the stock market, under the protection of the Mainland Government's financial policies, only 8 overseas stock investment institutions are authorized to establish direct joint-stock companies, and about 24 overseas fund investment institutions are allowed to establish direct joint venture fund management companies. After the candidate of the Kuomintang Party, Ma Ying-jeou, won the competitive presidential election in March 2008, the party's main objective was to restart a direct connection channel with Mainland China on a governmental level and to address the concept of the "Cross-Strait Market". It is the first time that the Taiwan Government has seriously considered the concept of a "developing hinterland", which means there would be numerous economic and financial benefits if Taiwan (Taipei) regarded Mainland China (Shanghai) as a cooperative development area. The international funds (hot money) that flow through Taiwan (Taipei) for subsequent investment in Mainland China (Shanghai) will bring an incalculable advantage for Taiwan if the "direct three-point transportation policy" can be effectively implemented, which will restart the high-growth boom in the Taiwan economy.
References [1] Markowitz, H.M.: The early history of portfolio theory: 1600-1960. Financial Analysts Journal 55(4), 12–23 (1999) [2] Sharpe, W.F.: Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance 19(3), 425–442 (1964) [3] Ross, S.A.: The Capital Asset Pricing Model (CAPM), Short-sale Restrictions and Related Issues. Journal of Finance 32, 177 (1977) [4] David, B., Calomiris, C.W.: Statement of Charles W. Calomiris Before a joint meeting of the Subcommittee on Housing and Community Opportunity and the Subcommittee on Financial Institutions and Consumer Credit of the Committee on Financial Services. U.S. House of Representatives, 1–34 [5] Hicks, J.R.: The Theory of Uncertainty and Profit. Economica 32, 170–189 (1931) [6] Hicks, J.R.: Value and Capital. Economic Journal 72, 87–102 (1939) [7] Burmeister, E.W., Kent, D.: The arbitrage pricing theory and macroeconomic factor measures. Financial Review 21(1), 1–20 (1986) [8] Bautista, R.M.: Macroeconomic Models for East Asian Developing Countries. Asian-Pacific Economic Literature 2(2), 15–25 (1988) [9] Bautista, C.C.: The PCPS Annual Macroeconomic Model (1993) (manuscript) [10] Challen, D.W., Hagger, A.J.: Macroeconometric Systems: Construction, Validations and Applications. Macmillan Press Ltd, London (1983) [11] Lai, C.-S., et al.: Utilize GRA to Evaluate the Investment Circumstance of the Ten Industrial Regions after the Globally Financial Crisis. The Journal of Grey System 13(4) (2010) [12] Kidane, A., Kocklaeuner, G.: A Macroeconometric Model for Ethiopia: Specification, Estimation, Forecast and Control. Eastern Africa Economic Review 1, 1–12 (1985) [13] Mallick, S.K., Mohsin, M.: On the Effects of Inflation Shocks in a Small Open Economy. Australian Economic Review. The University of Melbourne, Melbourne Institute of Applied Economic and Social Research 40(3), 253–266 (2007) [14] McKinnon, R.: Money and Capital in Economic Development.
The Brookings Institution, Washington, D.C. (1973) [15] Arestis, P., et al.: Financial globalization: the need for a single currency and a global central bank. Journal of Post Keynesian Economics 27(3), 507–531 (2005) [16] Hsieh, M.-Y., et al.: An Macroeconomic Analysis in Stock Market of the Ten Industrial and Emerging Regions by Utilizing the Portfolio Theory. In: International Conference in Business and Information, Kitakyushu, Japan (2010)
A DEMATEL Based Network Process for Deriving Factors Influencing the Acceptance of Tablet Personal Computers Chi-Yo Huang, Yi-Fan Lin, and Gwo-Hshiung Tzeng
Chi-Yo Huang · Yi-Fan Lin
Department of Industrial Education, National Taiwan Normal University, No. 162, Hoping East Road I, Taipei 106, Taiwan
e-mail: [email protected]

Gwo-Hshiung Tzeng
Department of Business and Entrepreneurial Administration, Kainan University, No. 1, Kainan Road, Luchu, Taoyuan County 338, Taiwan
Institute of Management of Technology, National Chiao Tung University, Ta-Hsueh Road, Hsinchu 300, Taiwan
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 355–365.
springerlink.com © Springer-Verlag Berlin Heidelberg 2011

Abstract. The tablet personal computers (Tablet PCs) emerged recently as one of the most popular consumer electronics devices. Consequently, analyzing and predicting the consumer purchasing behaviors of Tablet PCs in order to fulfill customers' needs has become an indispensable task for marketing managers of IT (information technology) firms. However, the predictions are not easy. Consumer electronics technology evolves rapidly. Market leaders including Apple, ASUS, Acer, etc. are also competing in the same segment by providing similar products, which further complicates the competitive situation. How the consumers' acceptance of novel Tablet PCs can be analyzed and predicted has become an important but difficult task. In order to accurately analyze the factors influencing consumers' acceptance of Tablet PCs and predict the consumer behavior, the Technology Acceptance Model (TAM) and the Lead User Method will be introduced. Further, the differences in the factors recognized by lead users and by mass customers will be compared. The possible customers' needs will first be collected and summarized by reviewing the literature on the TAM. Then, the causal relationships between the factors influencing the consumer behaviors recognized by the lead users and by the mass customers will be derived by the DEMATEL based network process (DNP) and Structural Equation Modeling (SEM), respectively. An empirical study based on Taiwanese Tablet PC users will be leveraged for comparing the results derived by the DNP and the SEM. Based on the DNP based lead user method, the perceived usefulness, perceived ease of use, attitude and
behavioral intention are perceived as the most important factors for influencing the users’ acceptance of Tablet PCs. The research results can serve as a basis for IT marketing managers’ strategy definitions. The proposed methodology can be used for analyzing and predicting customers’ preferences and acceptances of high technology products in the future. Keywords: Technology Acceptance Model (TAM), Lead User Method, DEMATEL based Network Process (DNP), Structural Equation Modeling (SEM), Multiple Criteria Decision Making (MCDM), Tablet Personal Computer (Tablet PC).
1 Introduction During the past decades, social and personality psychologists have attempted to study human behaviors. However, considering their complexity, the explanation and prediction of human behaviors is a difficult task. Concepts referring to behavioral intentions, such as social attitude, personal intention and personality traits, have played important roles in the prediction of human behaviors (Ajzen 1989; Bagozzi 1981). The tablet personal computers (Tablet PCs), portable PCs equipped with a touchscreen as the input device (Beck et al. 2009), emerged recently as one of the most popular consumer electronics devices. Analyzing and predicting the consumer behaviors of Tablet PCs in order to fulfill customers' needs has become an indispensable task for marketing managers. However, the predictions are not easy. Consumer electronics technology evolves rapidly, which shortens product life cycles. Market leaders including Apple, ASUS, Acer, etc. are competing in the same segment by providing similar products. Further, various alternatives, including notebook computers, large-screen smart phones, etc., may replace the Tablet PCs. The above phenomena further complicate the competitive situation. How the consumers' acceptance of novel Tablet PCs can be analyzed and predicted has become an important issue for marketing managers of Tablet PC providers. However, the analysis and prediction of consumer behaviors are not easy, given the above-mentioned aspects of the product life cycle (PLC), competition, and alternative products. In order to accurately derive the factors influencing consumers' acceptance of Tablet PCs and predict the purchase behaviors, the Technology Acceptance Model (TAM) and the Lead User Method (LUM) will be introduced as the theoretical basis of this analysis. The TAM can be used for illustrating the factors influencing the consumers' acceptance of future Tablet PCs.
Meanwhile, the LUM is more suitable for evaluating future high-technology innovations which are disruptive in nature. To demonstrate the differences between the results derived from the mass customers' opinions and the results based on the lead users', the mass customers' opinions will also be surveyed. Consequently, a novel DEMATEL based network process (DNP) based multiple criteria decision making (MCDM) framework will be proposed for deriving the acceptance intention of Tablet PCs. The criteria for evaluating the factors influencing the acceptance of the Tablet PCs will first be summarized by the
literature review. After that, the causal structure corresponding to the acceptance intention of Tablet PCs will be derived by using the DNP and the SEM methods. The structure versus each criterion from the lead users' perspective will be established by using the DNP. Then, the criteria weights of the lead users will be calculated by the DNP method, and the traditional SEM based statistical techniques will be used to derive the causal relationships from the opinions of the lead users and the mass customers, respectively. Finally, the analytic results based on both the lead users' and the mass customers' perspectives will be compared. A pilot study on the feasibility of the DNP based LUM for Tablet PCs' acceptance predictions and the SEM based derivations of the TAM will be based on the opinions of four lead users now serving in the world's leading IT firms and thirty Taiwanese consumers with the intention to purchase Tablet PCs. The empirical study results can serve as the basis for marketers' understanding of consumers' intentions of accepting the Tablet PCs. Meanwhile, such knowledge can serve as the foundation for future marketing strategy definitions. The remainder of this article is organized as follows. The related literature regarding technology acceptance theories and the TAM will be reviewed in Section 2. The analytic framework based on the DNP framework and the SEM method will be introduced in Section 3. Then, in Section 4, an empirical study on Tablet PC acceptance based on the proposed DNP and SEM based TAM framework follows. Managerial implications as well as discussion will be presented in Section 5. Finally, the whole article will be concluded in Section 6.
2 Human Behavior Theory – Concepts and Models During the past decades, scholars have tried to develop concepts and models for formulating and predicting human behaviors. Further, to explain and predict the factors which influence users' acceptance of a new product, theories and models including the TRA (Theory of Reasoned Action), the TPB (Theory of Planned Behavior), and the TAM have been developed. In this Section, the related theories, concepts and models will be reviewed to serve as a basis for the analytic framework development in this research. The TRA, a prediction model rooted in social psychology, attempts to derive the requirements of intended behaviors regarding user acceptance. The relationship versus each variable in the TRA is presented in Fig. 1(a) (Davis 1986; Davis et al. 1989; Fishbein and Ajzen 1974; Hu and Bentler 1999). The TRA is a general theory, which demonstrates that beliefs are operative for human behavior and user acceptance. Therefore, researchers applying the TRA should identify whether a belief is a prominent variable for users regarding the product acceptance behavior (Davis et al. 1989). The Theory of Planned Behavior (TPB) was developed by Ajzen (1987) to improve the prediction capability of the TRA. The TPB framework intends to deal with the complicated social behaviors of users toward a specific product (Ajzen and Fishbein 1980). In order to improve upon and overcome the shortcomings of the TRA, Ajzen concentrated on cognitive self-regulation as an important aspect of human behavior. The relationships between those influencing factors are demonstrated in Fig. 1(b).
The TAM, an adaptation of the TRA, was proposed by Davis (1986) especially for predicting the users' acceptance of information systems (Davis et al. 1989). The TAM is, in general, capable of explaining user behaviors across a broad range of end-user computing technologies and user populations, while at the same time being both parsimonious and theoretically justified (Davis 1986; Davis et al. 1989). The TAM posits that two particular beliefs, perceived usefulness and perceived ease of use, are of primary relevance for computer acceptance behaviors (Fig. 1(c)).
Fig. 1 (a) TRA (Source: Fishbein & Ajzen 1974), (b) TPB (Source: Ajzen 1985), and (c) TAM (Source: Davis 1986)
In general, the early adopters of a novel product behave differently from the mass consumers. However, the bulk of the users will follow the early adopters when the market matures (Rogers 1962). Consequently, the LUM, a market analysis technique, is applied to the development of new products and services (Urban and Hippel 1988). The methodology is composed of four major steps based on the work of Urban and Hippel (1988): (1) specify lead user indicators, (2) identify the lead user group, (3) generate concepts (products) with lead users and (4) test the lead user concepts (products). Further details can be found in the earlier work by Urban and Hippel (1988).
3 Analytic Framework for Deriving the TAM In order to build the analytical framework for comparing the factors influencing the acceptance of the Tablet PCs from the aspects of lead users and mass customers, the DNP based MCDM framework and the SEM will be introduced. At first, the criteria suitable for measuring the users' acceptance of Tablet PCs will be derived based on a literature review. The factors identified by Davis in the TAM will be introduced. The DNP will then be introduced for deriving the causal relationships and the weights versus each criterion from the lead users' aspect. The SEM will be introduced for deriving the causal structure between the factors from the viewpoint of the mass customers at the same time. Finally, the results from the mass customers and the lead users will be compared. In summary, the evaluation framework consists of four main steps: (1) deriving the factors influencing customers' acceptance of the Tablet PCs by literature review; (2) evaluating the determinants of mass customers' acceptance by using the SEM; (3) evaluating the
determinants, causal relationships and criteria weights of the lead users' acceptance by applying the DNP; and finally, (4) comparing the results of the mass customers and the lead users.
3.1 The DNP
The DNP is an MCDM framework consisting of the DEMATEL and the ANP. The DEMATEL technique was developed by the Battelle Geneva Institute to analyze complex "real world problems" dealing mainly with interactive map-model techniques (Gabus and Fontela 1972) and to evaluate qualitative and factor-linked aspects of societal problems. The technique was developed in the belief that the pioneering and proper use of scientific research methods could help illuminate specific and intertwined phenomena and contribute to the recognition of practical solutions through a hierarchical structure. The ANP is the general form of the analytic hierarchy process (AHP) (Saaty 1980), which has been used in MCDM problems; the ANP releases the restriction of the hierarchical structure and the assumption of independence between criteria. Combining the DEMATEL and the ANP as reviewed in this Section, the steps of the method can be summarized based on the work of Prof. Gwo-Hshiung Tzeng (Wei et al. 2010):
Step 1: Calculate the direct-influence matrix by scores. Based on experts' opinions, the relationships between criteria are derived from their mutual influences. The scale ranges from 0 to 4, representing "no influence" (0), "low influence" (1), "medium influence" (2), "high influence" (3), and "very high influence" (4). Respondents are requested to indicate the direct influence of a factor i on a factor j, denoted d_ij. The direct-influence matrix D can thus be derived.
Step 2: Normalize the direct-influence matrix. Based on the direct-influence matrix D, the normalized direct-influence matrix N can be derived as

N = vD,  v = min{ 1 / max_i Σ_{j=1}^{n} d_ij , 1 / max_j Σ_{i=1}^{n} d_ij },  i, j ∈ {1, 2, ..., n}.
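Steps 1 and 2 can be sketched in a few lines of pure Python. The matrix D below is a hypothetical 3-criterion example (scores on the 0-4 scale, e.g. averaged over experts), not data from the paper.

```python
# DEMATEL Steps 1-2: build the direct-influence matrix D from expert scores,
# then normalize it with v = min(1 / max row sum, 1 / max column sum).

def normalize_direct_influence(D):
    """Return N = v * D as defined in Step 2."""
    n = len(D)
    max_row = max(sum(row) for row in D)
    max_col = max(sum(D[i][j] for i in range(n)) for j in range(n))
    v = min(1.0 / max_row, 1.0 / max_col)
    return [[v * D[i][j] for j in range(n)] for i in range(n)]

# Hypothetical pairwise influence scores for three criteria.
D = [[0, 3, 2],
     [1, 0, 4],
     [2, 1, 0]]
N = normalize_direct_influence(D)
```

By construction every row sum and column sum of N lies in [0, 1], which is what makes the power series in Step 3 converge.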
Step 3: Attain the total-influence matrix T. Once the normalized direct-influence matrix N is obtained, the total-influence matrix T of the NRM can further be derived as

T = N + N^2 + ... + N^k = N(I - N)^{-1},  k → ∞,

where I is the identity matrix and N = [x_ij]_{n×n} is the normalized direct-influence matrix. The term lim_{k→∞}(N^2 + ... + N^k) stands for the indirect-influence matrix. Here 0 ≤ Σ_{j=1}^{n} x_ij < 1 or 0 ≤ Σ_{i=1}^{n} x_ij < 1, and only one row sum Σ_{j=1}^{n} x_ij or column sum Σ_{i=1}^{n} x_ij equals 1 for any i, j; hence lim_{k→∞} N^k = [0]_{n×n}. The element t_ij of the matrix T denotes the direct and indirect influences of the factor i on the factor j.
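Because N^k → 0, the total-influence matrix can be approximated by summing matrix powers directly, which avoids an explicit inversion of (I - N). A minimal pure-Python sketch, with a hypothetical normalized matrix N:

```python
# Step 3: approximate T = N + N^2 + ... = N(I - N)^{-1} by summing powers
# of N until the increments vanish (valid because N^k -> 0).

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def total_influence(N, tol=1e-12, max_iter=1000):
    T = [row[:] for row in N]   # running sum, starts at N
    P = [row[:] for row in N]   # current power N^k
    n = len(N)
    for _ in range(max_iter):
        P = mat_mul(P, N)       # next power of N
        if max(abs(x) for row in P for x in row) < tol:
            break
        T = [[T[i][j] + P[i][j] for j in range(n)] for i in range(n)]
    return T

N = [[0.0, 0.5, 0.2],
     [0.1, 0.0, 0.4],
     [0.3, 0.1, 0.0]]
T = total_influence(N)
```

A useful self-check is the identity T = N + N·T, which follows directly from the series definition.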
360
C.-Y. Huang, Y.-F. Lin, and G.-H. Tzeng
Step 4: Analyze the result. In this stage, the row and column sums of the total-relation matrix T = [t_ij], i, j ∈ {1, 2, ..., n}, are denoted r and c:

r = [r_i]_{n×1} = [Σ_{j=1}^{n} t_ij]_{n×1},  c = [c_j]_{1×n} = [Σ_{i=1}^{n} t_ij]_{1×n},

where the r and c vectors denote the sums of the rows and columns, respectively. The row sum r_i of the i-th row of T is the sum of the influences dispatched from factor i to the other factors, both directly and indirectly; the column sum c_j of the j-th column of T is the sum of the influences that factor j receives from the other factors. When i = j, (r_i + c_i) is an index of the strength of influence both dispatched and received, i.e., the degree of the central role factor i plays in the problem. If (r_i - c_i) is positive, factor i primarily dispatches influence to the other factors; if (r_i - c_i) is negative, factor i primarily receives influence from the other factors. Therefore, a causal graph can be achieved by mapping the dataset of (r_i + c_i, r_i - c_i), providing a valuable approach for decision making (Chiu et al. 2006; Huang and Tzeng 2007; Tamura et al. 2002; Huang et al. 2011; Tzeng and Huang 2011).
Let the total-influence matrix of criteria be T_C = [t_ij]. The matrix T_D = [t_ij^{D_ij}]_{m×m} can first be derived from T_C based on the dimensions (or clusters). Then, the weights of each criterion can be derived by using the ANP based on the influence matrix T_D, with the row sums

d_i = Σ_{j=1}^{m} t_ij^{D_ij},  i = 1, ..., m,

where t_ij^{D_ij} is the (i, j) block-level influence in T_D.
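The analysis in Step 4 can be sketched as follows; T below is a hypothetical 3-factor total-influence matrix, not data from the paper.

```python
# Step 4: row/column sums of the total-influence matrix T and the
# (r_i + c_i, r_i - c_i) coordinates used to draw the causal graph.

T = [[0.10, 0.42, 0.31],
     [0.25, 0.08, 0.37],
     [0.33, 0.20, 0.05]]

n = len(T)
r = [sum(T[i][j] for j in range(n)) for i in range(n)]  # dispatched influence
c = [sum(T[i][j] for i in range(n)) for j in range(n)]  # received influence

for i in range(n):
    prominence = r[i] + c[i]   # strength of the factor's central role
    relation = r[i] - c[i]     # > 0: net dispatcher, < 0: net receiver
    print(f"factor {i}: (r+c, r-c) = ({prominence:.2f}, {relation:.2f})")
```

Plotting the factors at coordinates (r_i + c_i, r_i - c_i) reproduces the causal graph described above.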
Step 5: Obtain the original supermatrix of eigenvectors from the total-influence matrix T = [t_ij]. For example, a threshold D can be applied to the values of the clusters in matrix T_D: if t_ij < D, then t_ij^D = 0; else t_ij^D = t_ij, where t_ij is taken from the total-influence matrix T. The total-influence matrix T_D is then normalized by dividing each entry by its row sum d_i:

α_ij^{D_ij} = t_ij^{D_ij} / d_i,  so that  T_D^α = [α_ij^{D_ij}]_{m×m}.

This research adopts the normalized total-influence matrix T_D^α (hereafter abbreviated to "the normalized matrix") and derives the weighted supermatrix W* from the unweighted supermatrix W by the following equation: the (i, j) block of W* is

W*_{ij} = α_ji^{D_ji} × W_ij,

where W_ij is the corresponding block of the unweighted supermatrix. The unweighted supermatrix thus serves as the basis for deriving the weighted supermatrix.
Step 6: Limit the weighted supermatrix by raising it to a sufficiently large power k, or lim k →∞ (W * ) k , until the supermatrix has converged and become a long-term stable supermatrix. The global priority vectors or the ANP weights can thus be derived.
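Step 6 can be sketched with repeated squaring, which converges to the same limit as raising the matrix to successive powers. W below is a small hypothetical column-stochastic 3×3 weighted supermatrix, not one derived in the paper.

```python
# Step 6: raise the weighted supermatrix to a large power until it converges;
# any column of the stable limit gives the global priority vector (ANP weights).

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def limit_supermatrix(W, tol=1e-10, max_iter=100):
    prev = W
    for _ in range(max_iter):
        nxt = mat_mul(prev, prev)   # repeated squaring: W, W^2, W^4, ...
        diff = max(abs(nxt[i][j] - prev[i][j])
                   for i in range(len(W)) for j in range(len(W)))
        if diff < tol:
            return nxt
        prev = nxt
    return prev

W = [[0.2, 0.5, 0.3],
     [0.5, 0.1, 0.4],
     [0.3, 0.4, 0.3]]
L = limit_supermatrix(W)
weights = [row[0] for row in L]   # stationary priorities (any column works)
```

For this particular example W happens to be doubly stochastic, so the weights converge to 1/3 each; in general the limit columns give the criteria priorities.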
3.2 The SEM
The primary aim of the SEM technique is to analyze latent variables and derive the causal relations between latent constructs in order to verify a theory. To develop research based on the SEM method, the related multivariate methods and procedures can be summarized based on the work of Schumacker and Lomax (1996): (1) theoretical framework development: the model is derived and developed through a literature review; (2) model specification: the hypothetical model is constructed and the observed and latent variables are defined; (3) model identification: the most appropriate number of criteria is estimated; (4) sample and measurement: user opinions are collected by questionnaires; (5) parameter estimation: the relationships between criteria are estimated by multiple regression analysis, path analysis, and factor analysis; (6) fitness assessment: the fit between the criteria and the model is confirmed by calculating goodness-of-fit measures; (7) model modification: poorly fitting criteria are deleted or modified if the goodness of fit is poor; (8) result discussion: the managerial implications are discussed.
4 Empirical Study
In order to verify the framework described in Section 3 and demonstrate the efficiency of the LUM in differentiating the causal relationships derived from lead users versus mass customers, an empirical study was performed based on pilot research results: four experts now serving in the world's leading IT firms providing Tablet PC related products and thirty mass customers interested in purchasing Tablet PCs were surveyed. The empirical study consists of five stages: (1) deriving the factors influencing the acceptance of technology by literature review; (2) deriving the causal relationships between the requirements of lead users by using the DNP; (3) selecting the significant factors based on the degree of central role and the criteria weights in the DNP method; (4) deriving the causal relationships between the requirements of mass customers by using the SEM; (5) selecting the significant factors based on the total effects in the SEM method. First, the factors influencing the acceptance of Tablet PCs were collected from the literature. These factors include (1) the perceived usefulness (PU), (2) the perceived ease of use (PEU), (3) the subjective norm (SN), (4) the perceived behavioral control (PBC), (5) the attitude (ATT), (6) the behavioral intention (BI), and (7) the actual system use (B), which refers to the actual behaviors of users (Davis 1986). After the literature review, the causal relationships and structure between the requirements (criteria) from the lead users' perspective were derived by using the DNP. Then, the SEM was applied to derive the relationships and structure from the viewpoint of the mass customers. In order to derive the causal structure based on the opinions of the lead users, four experts now serving in the world's leading IT firms were invited, and the causal structure was derived by using the DNP.
After the derivation of the total-influence matrix, the causal relationships can be derived by setting 0.599 as the threshold. Based on the empirical study results, the PU, PEU, BI, ATT, PBC, SN and B can serve as the factors for predicting the acceptance of Tablet PCs. However, there is no influence between the PBC and the other criteria; consequently, the actual purchase can't be predicted by the PBC. The causal structure derived by using the DNP is demonstrated in Fig. 2(a). Further, the DNP is applied to derive the weights of each factor, which are 0.143, 0.155, 0.144, 0.149, 0.133, 0.126, and 0.151 for the PU, PEU, ATT, BI, SN, PBC, and B, respectively. According to the derived weights, the ATT, PU, PEU and BI are important factors from the viewpoint of the lead users, while the PBC is regarded as non-vital. For deriving the mass customers' viewpoints by using the SEM, this research first regards the SNs and the PBCs as external variables in the TAM. In a good-fit model, the p-value should be greater than 0.05 while the RMSEA should be smaller than 0.05. However, in this pilot study these criteria were not met; thus, the assumption could not confirm the TAM theory from the mass customers' perspective. This research therefore deleted the subjective norms and perceived behavioral controls. The fit statistics, chi-square (74.66), p-value (1.00) and RMSEA (0.00), were all indicators of a good fit after the reduction. On the other hand, the path coefficients refer to the causal relationship between two latent variables, including a
dependent variable and an independent variable, and can be derived by using multiple regression. Namely, the dependent variable can be predicted by the independent variable and the path coefficient. The stronger the influence and the causal relationship, the higher the path coefficient. The analytic results derived by using the SEM are demonstrated in Fig. 2(b). The path coefficients of each latent variable (referring to the influence) are demonstrated in Fig. 2(c). According to the analytic results, the PU and the ATT are significantly influenced by the PEU. The BI also influences the actual behavior most significantly.
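The thresholding applied to the total-influence matrix before drawing the causal structure can be sketched as follows; entries below the cutoff (0.599 in the text) are dropped so that only the stronger influences appear in the causal graph. The matrix values here are hypothetical.

```python
# Keep only total-influence entries at or above the chosen cutoff.

def threshold_matrix(T, cutoff=0.599):
    return [[t if t >= cutoff else 0.0 for t in row] for row in T]

T = [[0.61, 0.45],
     [0.72, 0.60]]
print(threshold_matrix(T))   # [[0.61, 0.0], [0.72, 0.6]]
```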
Fig. 2 (a) The Causal Relationship Derived by the DNP, (b) The Causal Relationship Derived by the SEM, (c) The Path Coefficients Derived by the SEM
5 Discussion
This research aims to establish an analytic procedure for deriving the factors influencing consumers' intention to accept future Tablet PCs by using the DNP-based LUM. Differences between the causal structures derived from the opinions of lead users and those of mass customers are examined, and managerial implications and advances in research methods are discussed in this Section. Regarding the managerial implications, both the causal relationship structure and the strength of influences between the factors differ between the lead users and the mass customers. From the perspective of the lead users, the PU, PEU, ATT, and BI per se are recognized as significant factors influencing customers' acceptance of Tablet PCs. From the perspective of the mass customers, there is no causal relationship between the SN, the PBC and the other criteria; therefore, the actual usage of Tablet PCs can't be predicted by the SN and the PBC. On the other hand, the BI influences the actual usage significantly, and the PU and PEU also influence the actual usage indirectly. Further, the SN and the PBC are unsuitable for predicting the acceptance of Tablet PCs. Regarding the importance of factors for predicting acceptance, the PEU and the BI influence the other factors the most. Consequently,
comparing the analytic results derived for both the lead users and the mass customers, the BI was recognized as important by both groups. The PU was recognized as important by the lead users only, while the PEU was recognized as important by the mass customers only. Regarding the advances in research methods, the SEM and the DNP have been verified and compared in this research. The strength of the causal relationships can be derived by the SEM, whereas both the strength of the causal relationships and the causal structure can be derived by the DNP. Both the SEM and the DNP can be used to derive the causal relationships and the corresponding weight of each criterion. Nevertheless, for the SEM the causal path of each criterion must be set according to social science theory (i.e., management, consumer behavior and marketing); the strength of the causal relationships is then derived by the SEM. In contrast, both the causal structure and the strength of influence can be derived by the DNP. Namely, the SEM can be applied to problems with a pre-defined causal structure, while the DNP can be applied to problems without a pre-defined causal structure.
6 Conclusions
Predicting consumer behavior toward Tablet PCs is a difficult yet indispensable task due to the fast-emerging technology and severe competition among vendors. This research attempted to predict and compare consumer behaviors based on the acceptance of mass customers and lead users by using the SEM and the novel DNP method proposed by Prof. Gwo-Hshiung Tzeng. According to the analytic results, both the preferences and the causal structures diverge between lead users and mass customers. On one hand, the perceived ease of use and behavioral intention are important for predicting the acceptance of mass customers. On the other hand, the perceived usefulness and behavioral intention influence the acceptance of lead users. Last but not least, the feasibility of the SEM and the DNP has been verified in this research. These two methods can be applied to different problems: the SEM can be applied to problems with a pre-defined causal structure, and the DNP to problems without a pre-defined causal structure.
References
Ajzen, I.: Attitudes, traits, and actions: Dispositional prediction of behavior in personality and social psychology. Advances in Experimental Social Psychology 20, 1–63 (1987)
Ajzen, I.: Attitudes, personality, and behavior. Open University Press, Milton Keynes (1989)
Ajzen, I., Fishbein, M.: Understanding Attitudes and Predicting Social Behavior. Prentice-Hall, Englewood Cliffs (1980)
Bagozzi, R.P.: Attitudes, intentions, and behavior: A test of some key hypotheses. Journal of Personality and Social Psychology 41(4), 607–627 (1981)
Beck, H., Mylonas, A., Rasmussen, R.: Business Communication and Technologies in a Changing World. Macmillan Education Australia (2009)
Chiu, Y.J., Chen, H.C., Shyu, J.Z., Tzeng, G.H.: Marketing strategy based on customer behavior for the LCD-TV. International Journal of Management and Decision Making 7(2/3), 143–165 (2006)
Davis, F.D.: A Technology Acceptance Model for Empirically Testing New End-User Information Systems: Theory and Results. Doctoral dissertation (1986)
Davis, F.D., Bagozzi, R.P., Warshaw, P.R.: User acceptance of computer technology: a comparison of two theoretical models. Management Science 35(8), 982–1003 (1989)
Fishbein, M., Ajzen, I.: Attitudes toward objects as predictors of single and multiple behavioral criteria. Psychological Review 81(1), 59–74 (1974)
Gabus, A., Fontela, E.: World Problems, an Invitation to Further Thought Within the Framework of DEMATEL. Battelle Geneva Research Center, Switzerland (1972)
Hu, L.T., Bentler, P.M.: Cutoff criteria for fit indexes in covariance structure analysis. Structural Equation Modeling 6(1), 1–55 (1999)
Huang, C.Y., Tzeng, G.H.: Reconfiguring the Innovation Policy Portfolios for Taiwan's SIP Mall Industry. Technovation 27(12), 744–765 (2007)
Huang, C.Y., Hong, Y.H., Tzeng, G.H.: Assessment of the Appropriate Fuel Cell Technology for the Next Generation Hybrid Power Automobiles. Journal of Advanced Computational Intelligence and Intelligent Informatics (2011) (forthcoming)
Rogers, E.M.: Diffusion of Innovations. Free Press, New York (1962)
Saaty, T.L.: The Analytic Hierarchy Process. McGraw-Hill, New York (1980)
Schumacker, R.E., Lomax, R.G.: A Beginner's Guide to Structural Equation Modeling. Lawrence Erlbaum Associates, Mahwah (1996)
Tamura, H., Akazawa, K., Nagata, H.: Structural modeling of uneasy factors for creating safe, secure and reliable society. Paper presented at the SICE System Integration Division Annual Conference (2002)
Tzeng, G.H., Huang, C.Y.: Combined DEMATEL technique with hybrid MCDM methods for creating the aspired intelligent global manufacturing & logistics systems.
Annals of Operations Research, 1–32 (2011)
Urban, G., von Hippel, E.: Lead User Analyses for the Development of New Industrial Products. Management Science 34(5), 569–582 (1988)
Wei, P.L., Huang, J.H., Tzeng, G.H., Wu, S.I.: Causal modeling of Web-Advertising Effects by improving SEM based on DEMATEL technique. International Journal of Information Technology & Decision Making 9(5), 799–829 (2010)
A Map Information Sharing System among Refugees in Disaster Areas, on the Basis of Ad-Hoc Networks Koichi Asakura, Takuya Chiba, and Toyohide Watanabe
Abstract. In disaster areas, some roads cannot be passed through because of road destruction or rubble from collapsed buildings. Thus, information on safe roads that can be used for evacuation is very important and has to be shared among refugees. In this paper, we propose a map information sharing system for refugees in disaster areas. The system stores the roads passed by a refugee as map information. When another refugee comes close, the two refugees exchange their map information with each other in an ad-hoc network manner. In this exchange, in order to reduce the communication frequency, the Minimum Bounding Rectangle (MBR) of the map information is calculated for comparing the similarity of map information. Experimental results show that the quantity of map information increases through data exchange between refugees and that the frequency of communication is reduced by using the MBR. Keywords: information sharing, ad-hoc network, disaster area, minimum bounding rectangle. Koichi Asakura Department of Information Systems, School of Informatics, Daido University, 10-3 Takiharu-cho, Minami-ku, Nagoya 457-8530, Japan e-mail:
[email protected] Takuya Chiba Department of Information Systems, School of Informatics, Daido University, 10-3 Takiharu-cho, Minami-ku, Nagoya 457-8530, Japan e-mail:
[email protected] Toyohide Watanabe Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan e-mail:
[email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 367–376. c Springer-Verlag Berlin Heidelberg 2011 springerlink.com
368
K. Asakura, T. Chiba, and T. Watanabe
1 Introduction
For communication systems in disaster situations such as a big earthquake, mobile ad-hoc network (MANET) technologies have attracted great attention recently[1, 2]. In such situations, mobile phones and wireless LAN networks cannot be used, since communication infrastructures such as base stations and WiFi access points may be broken or malfunctioning. A MANET is defined as an autonomous network formed by a collection of nodes that communicate with each other without any communication infrastructure[3, 4, 5]. Thus, the MANET is suitable for communication systems in disaster areas since it does not require any communication infrastructure. In this paper, we propose a map information sharing system among refugees in disaster areas which is based on MANET technologies. In order for refugees to evacuate to shelters quickly, correct and up-to-the-minute map information is essential. Namely, information on road conditions, that is, which roads can be passed through safely, which roads should be selected for quick evacuation and so on, is very important for refugees. Our proposed system stores the roads a refugee has passed as map information. When a refugee comes close to another refugee, their systems exchange map information with each other. This exchange is performed in an ad-hoc network manner. Thus, the map information of refugees is merged, and information on roads that can be passed safely in the disaster situation is collected without any communication infrastructure. The rest of this paper is organized as follows. Section 2 describes related work. Section 3 describes our proposed system in detail. Section 4 explains our experiments. Finally, Section 5 concludes this paper and gives our future work.
2 Related Work
Many map information systems have been developed so far. Roughly, map information systems can be categorized into two types: static map information systems and networked map information systems. In static map information systems[6, 7], map information such as road networks, building information and so on is installed in users' computers before use. Thus, such a system can be used quickly everywhere, even if there is no equipment for network connection. However, the installed map information is not updated in a real-time manner, which decreases the correctness and timeliness of the information. On the other hand, many products for networked map information systems, such as Google Maps[8], Yahoo Local Maps[9] and so on, have been developed. In this type of map information system, map information is stored on the server computers of a service provider and users acquire map information through the Internet. Thus, a ubiquitous network environment must be provided. As mentioned later, providing a network connection environment is very difficult in disaster areas. In some navigation systems, the optimal route is provided by using real-time traffic information. For example, the Vehicle Information and Communication System (VICS) provides traffic information such as traffic jams, traffic accidents and so on,
A Map Information Sharing System among Refugees in Disaster Areas
369
in real time[10, 11]. However, such traffic information is gathered by static sensors. Namely, information is measured by static sensors in many different locations and gathered to central servers over static networks. Thus, infrastructure equipment must be provided and operated correctly. However, such static equipment may be broken or malfunctioning in a disaster situation, so refugees cannot use these systems then. Communication systems based on MANET technologies in disaster areas have also been proposed[12, 13]. However, these systems are mainly used by government rescuers, and thus do not focus on real-time information sharing among refugees.
3 Map Information Sharing among Refugees
3.1 System Overview
In a disaster area such as that of a big earthquake, the most important factors with respect to information are correctness and timeliness. In such an area, the situation is different from that in normal times. Furthermore, it changes from moment to moment: for example, roads become unusable because of fire spread, rubble from collapsed buildings, and so on. Thus, for developing effective information systems for refugees, correctness and timeliness have to be taken into account. In order to maintain the correctness and timeliness of information in a disaster situation, we must not rely on static infrastructure equipment such as sensors and networks, because such static equipment cannot be used in a disaster situation. Thus, we have to develop an information sharing mechanism that works with client terminals only, namely without any central servers. Figure 1 shows our proposed map information sharing system for refugees in disaster areas. This system provides refugees with information on safe roads that can be used for evacuation to shelters. Our proposed system consists only of client terminals that have static map data for map matching, ad-hoc network communication mechanisms for communicating with neighboring terminals, and the Global Positioning System (GPS) for acquiring the current positions of refugees. When a refugee moves to a shelter for evacuation, the system stores the passed roads as map information. This map information states that the roads stored in the system can be passed through for evacuation. Such timely map information is very useful not only for other refugees but also for rescuers, since information on whether a road can be used or not can be acquired only on the spot. When another refugee comes close, the refugees' terminals communicate with each other in the ad-hoc network manner and exchange their map information.
By this system, we can collect and share map information in real time, which enables refugees to move to shelters by using safe and available roads.
Fig. 1 System overview: trajectories of refugees A, B and C; sharing map information by ad-hoc network; safety road map in the disaster area
Fig. 2 Map matching: (a) history of positions, (b) extracted road segments
3.2 Map Information
Map information stored in the system represents a history of positions of a refugee. In order to reduce the data stored as map information, we introduce a map matching method[14, 15, 16] for extracting the road segments passed by a refugee. The road segments used by a refugee are determined based on the sequence of position information acquired by GPS and the road network data in the static map data. Extracted road segments are stored in the system as map information.
Definition 1 (Road segment). A road segment r_ij is defined as a two-tuple of intersections p_i and p_j: r_ij = (p_i, p_j).
Definition 2 (Map information). Map information M is defined as a sequence of road segments which are passed by a refugee: M = <r_ij, r_jk, ..., r_mn>.
Figure 2 shows an example of map information. Figure 2(a) represents road network data and a history of positions of a refugee captured by GPS. By using the map matching method, intersections p_1, ..., p_4 are extracted as points passed by the refugee, as shown in Figure 2(b). Then, road segments r_12, r_23 and r_34 are stored as the map information of the refugee.
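Definitions 1 and 2 can be rendered as simple data structures. The sketch below follows the Fig. 2 example (intersections p1 through p4); the coordinates are invented for illustration.

```python
# Definition 1: a road segment is a pair of intersections (p_i, p_j).
# Definition 2: map information is the sequence of segments a refugee passed.

from typing import NamedTuple

class Intersection(NamedTuple):
    name: str
    x: float
    y: float

class RoadSegment(NamedTuple):
    start: Intersection   # p_i
    end: Intersection     # p_j

p1, p2 = Intersection("p1", 0.0, 0.0), Intersection("p2", 1.0, 0.0)
p3, p4 = Intersection("p3", 1.0, 1.0), Intersection("p4", 2.0, 1.0)

# Map information M = <r_12, r_23, r_34> from the Fig. 2 example.
M = [RoadSegment(p1, p2), RoadSegment(p2, p3), RoadSegment(p3, p4)]
```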
Fig. 3 A minimum bounding rectangle (MBR)
3.3 Exchanging Map Information
When a refugee comes close to another refugee, they exchange their map information. This exchange is performed by ad-hoc network communication, which enables refugees to share map information without any communication infrastructure such as the Internet. The processing flow is as follows.
1. A refugee's terminal periodically sends a beacon packet, which notifies surrounding refugees' terminals of the refugee's existence. This beacon packet is called the HELLO packet.
2. When a terminal receives a HELLO packet from another terminal, it checks whether an exchange of map information is required based on the information in the HELLO packet (a detailed algorithm is described later). If the exchange is required, the terminal sends back a packet as a reply. This packet is called the REPLY packet.
3. The terminal which receives the REPLY packet sends a packet containing map information. This packet is called the MAP packet.
4. The terminal which receives the MAP packet also sends back a MAP packet.
In order to achieve effective information exchange and to reduce the power consumption of terminals, we have to reduce the frequency of communication for the exchange of map information, because the size of MAP packets containing map information is relatively large. If two refugees have almost the same map information, they can omit the data exchange. In order to represent the similarity of map information, we introduce the Minimum Bounding Rectangle (MBR) of the road segments in map information. The MBR is a rectangle surrounding all the road segments in the map information. Figure 3 shows an example of an MBR. If the ratio of the overlapped area of two MBRs is high, the similarity of the map information is also regarded as high, and thus the two terminals omit sending MAP packets to each other.
HELLO Packet
The HELLO packet consists of the following attributes.
source ID:
This shows a unique identifier of the sender terminal.
ALGORITHM: SIMILARITY
Input: MBRs. Output: Similarity.
BEGIN
  Calculate the minimum bounding rectangle for the refugee's own map information: MBRr.
  MBRov := overlapped rectangle between MBRs and MBRr.
  Sov := area of MBRov.
  S := max(area of MBRs, area of MBRr).
  Similarity := Sov / S.
  return Similarity.
END
Fig. 4 An algorithm for calculating the similarity between two MBRs
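A pure-Python rendering of the Fig. 4 algorithm is sketched below. Rectangles are represented as tuples (x1, y1, x2, y2) with x1 < x2 and y1 < y2; the coordinates in the example are hypothetical.

```python
# SIMILARITY (Fig. 4): ratio of the overlap of the sender's MBR (mbr_s)
# and the receiver's own MBR (mbr_r) to the larger of the two areas.

def area(r):
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def similarity(mbr_s, mbr_r):
    """Return Sov / max(area_s, area_r)."""
    ov = (max(mbr_s[0], mbr_r[0]), max(mbr_s[1], mbr_r[1]),
          min(mbr_s[2], mbr_r[2]), min(mbr_s[3], mbr_r[3]))
    if ov[0] >= ov[2] or ov[1] >= ov[3]:   # rectangles do not overlap
        return 0.0
    return area(ov) / max(area(mbr_s), area(mbr_r))

# Two partially overlapping MBRs: the overlap is 1x1, the larger area is 4.
s = similarity((0, 0, 2, 2), (1, 1, 3, 3))   # -> 0.25
```

A receiving terminal would send a REPLY packet only when this value falls below the chosen threshold, i.e., when the two maps are sufficiently different to be worth exchanging.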
MBR: This shows the MBR of the sender's map information. It consists of the x, y coordinates of the upper-left and bottom-right corners of the MBR.
The HELLO packet has no destination ID; namely, the HELLO packet is received by all neighboring terminals within the communication range of the sender terminal.
REPLY Packet
When terminals receive a HELLO packet, they calculate the similarity of map information using the MBR in the HELLO packet. Figure 4 shows the algorithm for calculating the similarity. The MBR of the sender of the HELLO packet is denoted MBRs. First, the receiver terminal calculates the MBR of its own map information, MBRr. Then MBRov, the overlapped part of the two MBRs, is calculated. If the ratio of the area of MBRov to the larger area of the two MBRs is lower than a threshold, the similarity of map information is regarded as low, and thus a REPLY packet is sent back to the sender of the HELLO packet. The REPLY packet consists of the following attributes.
source ID: This shows a unique identifier of the sender terminal.
destination ID: This specifies the destination terminal.
MAP Packet
The MAP packet is used for exchanging map information between two terminals. The MAP packet consists of the following attributes.
source ID: This shows a unique identifier of the sender terminal.
destination ID: This specifies the destination terminal.
size: This describes the number of road segments in this packet.
Fig. 5 Simulation area
road segments: As shown in Section 3.2, map information consists of a sequence of road segments; thus, this attribute holds the sequence of road segments. One road segment is expressed as a two-tuple of x, y coordinates.
It is clear that the size of the MAP packet is variable and relatively large in comparison with the other packets. Thus, we have to reduce the number of MAP packets by using the MBR described above.
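The three packet types above can be sketched as plain records; the field names mirror the attributes listed in the text, while the concrete encoding (dicts here) is a hypothetical choice for illustration.

```python
# Hypothetical encodings of the HELLO, REPLY and MAP packets of Section 3.3.

def hello_packet(source_id, mbr):
    # mbr as (x1, y1, x2, y2): upper-left and bottom-right corners.
    # No destination: a HELLO is received by all terminals in range.
    return {"type": "HELLO", "source": source_id, "mbr": mbr}

def reply_packet(source_id, dest_id):
    return {"type": "REPLY", "source": source_id, "dest": dest_id}

def map_packet(source_id, dest_id, segments):
    # segments: sequence of road segments as pairs of (x, y) coordinates.
    return {"type": "MAP", "source": source_id, "dest": dest_id,
            "size": len(segments), "segments": segments}

pkt = map_packet("A", "B", [((0, 0), (1, 0)), ((1, 0), (1, 1))])
```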
4 Experiments
We conducted a simulation experiment to evaluate the proposed information sharing system for refugees. This section describes the experimental results.
4.1 Simulation Settings In the experiments, we provided a virtual disaster area 2.0 kilometers wide and 1.5 kilometers high. Figure 5 shows the simulation area. In the experiments, 300 refugees were deployed on the road randomly and moved randomly. Refugees are denoted as dots on roads with ID numbers in Figure 5. The communication range was set to 100 [m]. We have two parameters for the experiments: a sending interval of the HELLO packet and a threshold of the similarity of two MBRs. In order to evaluate the system, we measured the following two indicators. The frequency of communication: The number of packets was measured by varying the threshold of the similarity of MBRs in order to evaluate the method using MBRs.
K. Asakura, T. Chiba, and T. Watanabe
The degree of information sharing: The number of road segments in map information that were acquired from other refugees by the system was measured by varying the threshold of the similarity of MBRs. For each case, we conducted five experiments and calculated the average values.
4.2 Experimental Results
Figure 6, Figure 7 and Figure 8 show the experimental results when the sending intervals of the HELLO packets are 10 [sec], 30 [sec] and 60 [sec], respectively. Figures 6(a), 7(a) and 8(a) show the number of each packet. Figures 6(b), 7(b) and 8(b) show the number of road segments in map information. “Own” describes the number of road segments that are captured by refugees' own GPS devices, and “Others” describes the number of road segments that are acquired
Fig. 6 Experimental results when the sending interval is 10 seconds: (a) the number of packets (HELLO, REPLY, MAP) and (b) the number of road segments (Own, Others), plotted against the threshold of the similarity of MBRs (0.1–1.0)
Fig. 7 Experimental results when the sending interval is 30 seconds: (a) the number of packets and (b) the number of road segments, plotted against the threshold of the similarity of MBRs
Fig. 8 Experimental results when the sending interval is 60 seconds: (a) the number of packets and (b) the number of road segments, plotted against the threshold of the similarity of MBRs
by data exchange with other refugees. From these experimental results, we can observe the following.
• The number of packets depends on the threshold of the similarity between two MBRs. However, the number of packets is almost the same when the threshold value is lower than or equal to 70%, whereas it increases exponentially when the threshold value is higher than 70%. This property is independent of the sending interval of the packet.
• Refugees can acquire a large amount of map information from other refugees through the proposed system. The number of road segments acquired by exchanging map information is not influenced by the threshold values, although it is highly influenced by the communication frequency. Namely, when the threshold value is lower than 80%, data exchange is achieved effectively with a lower communication frequency. This property is also independent of the sending interval of the packet.
From these results, we can conclude that our proposed system increases the quantity of map information and that the communication frequency can be controlled effectively by using the similarity of two MBRs.
5 Conclusion
In this paper, we proposed a map information sharing system for refugees in disaster areas. In this system, refugees record their position histories as map information and share it with neighboring refugees in an ad-hoc network manner. By sharing map information, refugees can acquire correct road safety information in disaster areas in a timely fashion without any central server. Experimental results show that the proposed ad-hoc-network-based sharing method increases the quantity of map information and that the frequency of communication is reduced appropriately by using the MBR.
A problem with this way of comparing map information is that MBRs cannot represent the area of map information accurately; they represent it only approximately. Thus, for future work, we plan to introduce the convex hull [17] of the road segments in map information. Furthermore, we have to evaluate the system in practice, not only in computer simulation.
References
1. Midkiff, S.F., Bostian, C.W.: Rapidly-Deployable Broadband Wireless Networks for Disaster and Emergency Response. In: The 1st IEEE Workshop on Disaster Recovery Networks, DIREN 2002 (2002)
2. Meissner, A., Luckenbach, T., Risse, T., Kirste, T., Kirchner, H.: Design Challenges for an Integrated Disaster Management Communication and Information System. In: The 1st IEEE Workshop on Disaster Recovery Networks, DIREN 2002 (2002)
3. Toh, C.-K.: Ad Hoc Mobile Wireless Networks: Protocols and Systems. Prentice-Hall, Englewood Cliffs (2001)
4. Murthy, C.S.R., Manoj, B.S.: Ad Hoc Wireless Networks: Architectures and Protocols. Prentice Hall, Englewood Cliffs (2004)
5. Lang, D.: Routing Protocols for Mobile Ad Hoc Networks: Classification, Evaluation and Challenges. VDM Verlag (2008)
6. Esri: ArcGIS, http://www.esri.com/software/arcgis/
7. Fudemame: Pro Atlas SV6, http://fudemame.net/products/map/pasv6/ (in Japanese)
8. Google: Google Maps, http://maps.google.com/
9. Yahoo: Yahoo Local Maps, http://maps.yahoo.com/
10. Sugimoto, T.: Current Status of ITS and its International Cooperation. In: International Conference on Intelligent Transportation Systems, p. 462 (1999)
11. Nagaoka, K.: Travel Time System by Using Vehicle Information and Communication System (VICS). In: International Conference on Intelligent Transportation Systems, p. 816 (1999)
12. Mase, K.: Communications Supported by Ad Hoc Networks in Disasters. Journal of the Institute of Electronics, Information, and Communication Engineers 89(9), 796–800 (2006)
13. Umedu, T., Urabe, H., Tsukamoto, J., Sato, K., Higashino, T.: A MANET Protocol for Information Gathering from Disaster Victims. In: The 2nd IEEE PerCom Workshop on Pervasive Wireless Networking, pp. 447–451 (2006)
14. Quddus, M.A., Ochieng, W.Y., Zhao, L., Noland, R.B.: A General Map Matching Algorithm for Transport Telematics Applications. GPS Solutions 7(3), 157–167 (2003)
15. Brakatsoulas, S., Pfoser, D., Salas, R., Wenk, C.: On Map-matching Vehicle Tracking Data. In: The 31st International Conference on VLDB, pp. 853–864 (2005)
16. Quddus, M.A., Ochieng, W.Y., Zhao, L., Noland, R.B.: Current Map-matching Algorithms for Transport Applications: State-of-the-art and Future Research Directions. Transportation Research Part C 15, 312–328 (2007)
17. Berg, M.D., Cheong, O., Kreveld, M.V., Overmars, M.: Computational Geometry: Algorithms and Applications. Springer, Heidelberg (2008)
A Study on a Multi-period Inventory Model with Quantity Discounts Based on the Previous Order Sungmook Lim
Abstract. Lee [Lee J-Y (2008). Quantity discounts based on the previous order in a two-period inventory model with demand uncertainty. Journal of Operational Research Society 59: 1004-1011] previously examined quantity discount contracts between a manufacturer and a retailer in a stochastic, two-period inventory model in which quantity discounts are provided on the basis of the previous order size. In this paper, we extend the above two-period model to a k-period one (where k > 2) and propose a stochastic nonlinear mixed binary integer program for it. With the k-period model developed herein, we suggest a solution procedure of receding horizon control style to solve n-period (n > k) order decision problems.
1 Introduction
This paper deals with a single-item, stochastic, multi-period inventory model, in which the retailer places an order with the manufacturer in each of the periods to fulfill stochastic demand. In particular, the manufacturer offers a QDP (quantity discounts based on the previous order) contract, under which the retailer receives a price discount on purchases in the next period in excess of the present-period order quantity. This type of quantity discount scheme is generally referred to as an incremental QDP. We intend to construct a mathematical programming formulation of the model and propose a method to solve the problem. For the past few decades, a number of studies have been conducted to develop decision-making models for various supply chain management problems, and a great deal of specific attention has been paid to inventory models with quantity discounts. The majority of quantity discount models have been studied using deterministic settings. Hadley and Whitin (1963), Rubin et al. (1983), and Sethi (1984) studied the problem of determining the economic order quantity for the buyer, given a quantity discount schedule established by the supplier. Monahan Sungmook Lim Dept. of Business Administration, Korea University, Chungnam 339-700, Republic of Korea e-mail:
[email protected] J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 377–387. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
(1984) discussed a quantity discount policy which maximizes the supplier's profit while not increasing the buyer's cost. Lal and Staelin (1984) presented a fixed-order quantity decision model that assumed special discount pricing structure forms involving multiple buyers and constant demands. Lee and Rosenblatt (1986) generalized Monahan’s model to increase suppliers’ profits by incorporating constraints imposed on the discount rate and relaxing the assumption of a lot-for-lot supplier policy. Weng and Wong (1993) developed a general all-unit quantity discount model for a single buyer or for multiple buyers to determine the optimal pricing and replenishment policy. Weng (1995) later presented models for determining optimal all-unit and incremental quantity discount policies, and evaluated the effects of quantity discounts on increasing demand and ensuring Pareto-efficient transactions under general price-sensitive demand functions. Hoffmann (2000) also analyzed the impact of all-unit quantity discounts on channel coordination in a system comprising one supplier and a group of heterogeneous buyers. Chang and Chang (2001) described a mixed integer optimization approach for the inventory problem with variable lead time, crashing costs, and price-quantity discounts. Yang (2004) recently proposed an optimal pricing and ordering policy for a deteriorating item with price-sensitive demand. In addition to these studies, Dada and Srikanth (1987), Corbett and de Groote (2000), and Viswanathan and Wang (2003) evaluated quantity discount pricing models in EOQ settings from the viewpoint of the supplier. Chung et al. (1987) and Sohn and Hwang (1987) studied dynamic lot-sizing problems with quantity discounts. Bregman (1991) and Bregman and Silver (1993) examined a variety of lot-sizing methods for purchased materials in MRP environments when discounts are available from suppliers.
Tsai (2007) solved a nonlinear SCM model capable of simultaneously treating various quantity discount functions, including linear, single breakpoint, step, and multiple breakpoint functions, in which a nonlinear model is approximated by a linear mixed 0-1 program that can be solved to determine a global optimum. Recently, several researchers have studied quantity discount models involving demand uncertainty. Jucker and Rosenblatt (1985) examined quantity discounts in the context of the single-period inventory model with demand uncertainty (also referred to as the newsvendor model). Based on a marginal analysis, they provided a solution procedure for determining the optimal order quantity for the buyer. Weng (2004) developed a generalized newsvendor model, in which the buyer can place a second order at the end of the single selling period to satisfy unmet demand, and provided quantity discount policies for the manufacturer to induce the buyer to order the quantity that maximizes the channel profit. Su and Shi (2002) and Shi and Su (2004) examined returns-quantity discounts contracts between a manufacturer and a retailer using a single-period inventory model, in which the retailer is offered a quantity discount schedule and is allowed to return unsold goods. Lee (2008), whose study motivated our own, developed a single-item, stochastic, two-period inventory model to determine the optimal order quantity in each period, wherein the manufacturer offers a QDP contract, under which the retailer receives a price discount on purchases in the next period in excess of the present-period order quantity. It can be asserted that Lee's work extended the literature in two directions: firstly, it studied quantity discounts in a two-period inventory
model with demand uncertainty; and secondly, it evaluated quantity discounts based on the previous order. He obtained three main results. First, under the (incremental and all-units) QDP contract, the retailer’s optimal ordering decision in the second period depends on the sum of the initial inventory (i.e. inventory at the beginning of the second period) and the first-period order quantity (i.e. price break point). Second, under the QDP contract, the retailer orders less in the first period and more in the second period, as compared with the order quantities under the wholesale-price-only contract. However, the total order quantity may not increase significantly. Third, the QDP contract always increases the retailer's profit, but will increase the manufacturer’s profit only in cases in which the wholesale margin is large relative to the retail margin. He derived an analytical formula for the optimal second-period order quantity but, due to the complicated structure of the profit function, could propose only a simple search method enumerating all possible values to determine the optimal first-period order quantity. Therefore, his model can be viewed as a one-period model. This study (i) extends Lee's work to develop a k-period model, and (ii) proposes a mathematical programming approach to the model. This paper extends the literature in two directions. First, this is the first work to deal with a k-period (k > 2) inventory model with demand uncertainty and quantity discounts on the basis of the previous order quantity. Second, whereas Lee (2008) did not provide an efficient solution method to determine the optimal first-period order quantity, we develop a mathematical optimization model involving all periods under consideration and propose an efficient procedure for solving the model.
2 Model
We consider a single-item, stochastic, k-period inventory model involving a single manufacturer and a single retailer. The retailer places replenishment orders with the manufacturer to satisfy stochastic demand from the customer, and the customer's demand in each period is distributed independently and identically. The manufacturer provides the retailer with a QDP contract, under which the manufacturer provides the retailer with a price discount on purchases in the next period in excess of the present-period order quantity. Under this contract, the retailer places replenishment orders with the manufacturer at the beginning of each period, and it is assumed that the orders are delivered immediately, with no lead time. While unmet demands are backordered, those in the k-th period are assumed to be lost. Leftovers left unsold at the end of the k-th period are assumed to have no value, although this assumption can be readily relaxed by adjusting the inventory holding cost for the k-th period. The retailer incurs linear inventory holding and shortage costs, and the manufacturer produces the item under a lot-for-lot system. The following symbols and notations will be used hereafter:
D_i : demand in the i-th period, a discrete random variable with mean μ
f(·), F(·) : probability function and cumulative distribution function of demand
I_i : inventory level at the beginning of the i-th period
q_i : order quantity in the i-th period
p : retail price per unit
w : wholesale price per unit
c : production cost per unit
d : price discount per unit
h : inventory holding cost per unit per period for the retailer
s_i : shortage cost per unit per period for the retailer in the i-th period
π_i : retailer's profit in the i-th period
S_i : retailer's sales in the i-th period
C_i : retailer's purchasing cost in the i-th period
H_i : retailer's inventory holding cost in the i-th period
B_i : retailer's shortage cost in the i-th period
With the exception of shortage costs, all price and cost parameters are fixed over periods. The following figure diagrams the inventory levels and order quantities over the i-th period and the (i+1)-th period.
Here the value of I_i + q_i − D_i, which is the inventory level at the end of the i-th period, coincides with the value of I_{i+1}, which is the inventory level at the beginning of the (i+1)-th period. The retailer's profit in the i-th period is the sales revenue minus the purchasing cost and the inventory holding and shortage costs in the same period. Writing the components as S_i (1), C_i (2), H_i (3), and B_i (4), this can be expressed as

    π_i = S_i − C_i − H_i − B_i.

Then the retailer's total profit is

    Π = Σ_{i=1}^{k} π_i.   (5)

A mathematical programming model to maximize the retailer's profit over k periods can be formulated as follows:
A Study on a Multi-period Inventory Model (P) max s. t.
381
∑ 0, 0, 0, 1,
, 0,
0,
1,
, ,
1, , , 1, , , 1, , , 1,
,
(6)
is the order quantity in the 0-th period, and if it is not known or given, it where can assume a sufficiently large number, which means that no price discount is offered in the first period. Since is a random variable, it is impossible to solve the above problem (P) directly. Robust optimization is an approach to solving optimization problems with uncertain parameters such as (P). Robust optimization models can be classified into two categories, depending on how the uncertainty of parameters is incorporated into models; stochastic robust optimization and worst-case robust optimization. To illustrate the two models, consider the following optimization problem: , ,
min s. t.
0,
1,
,
,
(7)
is the decision variable, the function is the objective where , 1, , , are the constraint functions, and function, the functions is an uncertain parameter vector. In stochastic robust optimization models, the parameter vector is modeled as a random variable with a known distribution, and we work with the expected values of the constraints and objective functions, as follows: , ,
min s. t.
0,
1,
,
,
8
where the expectation is with respect to . In worst-case robust optimization models, we are given a set in which is known to lie, and we work with the worst-case values of the constraints and objective functions as follows: min sup s. t.
sup
, ,
0,
1,
,
(9)
In this study, we assume that the customer's demand in each period ( ) has a truncated Poisson distribution, and solve the problem (P) using stochastic robust can assume is , and the expected value optimization. The largest value that of is denoted by . If we take the expectation of the objective function and the constraint functions with respect to , we obtain the following problem (P1):
(P1)  max   E[Π] = Σ_{i=1}^{k} E[π_i]
      s.t.   the expectations of the constraints of (P),   q_i ≥ 0,   i = 1, …, k.   (10)

Now, let us evaluate the expected values E[H_i] and E[B_i] involved in the constraint functions. First of all, it can be readily shown that the following equation holds:

      I_{i+1} = I_1 + Σ_{j=1}^{i} q_j − Σ_{j=1}^{i} D_j.   (11)

If we introduce the two additional sets of variables u_i = Σ_{j=1}^{i} q_j and v_i = Σ_{j=1}^{i} D_j, it holds that I_{i+1} = I_1 + u_i − v_i. Here, v_i has a truncated Poisson distribution with mean iμ, and the largest value that it can take on, denoted v̄_i, is i·D_max. The probability function of v_i, denoted g_i, is

      g_i(x) = ((iμ)^x / x!) / (Σ_{j=0}^{v̄_i} (iμ)^j / j!),   x = 0, 1, …, v̄_i.

Then E[H_i] and E[B_i] can be evaluated as the expectations of the positive parts of the end-of-period inventory and of the backordered quantity, respectively:

      E[H_i] = h · Σ_{x=0}^{v̄_i} max(I_1 + u_i − x, 0) · g_i(x),   (12)

      E[B_i] = s_i · Σ_{x=0}^{v̄_i} max(x − I_1 − u_i, 0) · g_i(x),   (13)

where G_i(·) denotes the cumulative distribution function of v_i. Substituting (12) and (13) for the corresponding terms, the mathematical programming problem (P1) can be transformed to
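The truncated Poisson quantities discussed above can be evaluated numerically along the following lines. This is only a sketch under generic, assumed names (lam for the mean iμ, vmax for the truncation point v̄_i, base for I_1 + u_i, and h, s for the unit holding and shortage costs), not the paper's implementation:

```python
# Sketch: truncated Poisson pmf and the expected holding/shortage costs.
# lam = mean of the untruncated Poisson, vmax = truncation point,
# base = inventory position I_1 + u_i before demand is realized.
from math import factorial

def truncated_poisson_pmf(lam, vmax):
    raw = [lam**x / factorial(x) for x in range(vmax + 1)]
    total = sum(raw)
    return [r / total for r in raw]  # renormalized over 0..vmax

def expected_holding_shortage(base, lam, vmax, h, s):
    pmf = truncated_poisson_pmf(lam, vmax)
    # E[H] = h * E[(base - v)^+],  E[B] = s * E[(v - base)^+]
    eh = h * sum((base - x) * p for x, p in enumerate(pmf) if x < base)
    eb = s * sum((x - base) * p for x, p in enumerate(pmf) if x > base)
    return eh, eb
```

In the model the two expectations would be embedded in the constraints of (P2) rather than evaluated pointwise, but the arithmetic is the same.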
(P2)  max   Σ_{i=1}^{k} E[π_i]
      s.t.   the constraints of (P1), with E[H_i] and E[B_i] replaced by the expressions (12) and (13), the binary variables introduced for the discount conditions, and q_i ≥ 0,   i = 1, …, k,   (14)
which is a nonlinear mixed integer optimization problem. Although the objective function is linear, the complicated structure of the constraint functions renders the problem non-convex. Therefore, it is not an easy task to find the global optimal solution to the problem. In order to make the problem more tractable, we propose a technique for the piecewise linearization of nonlinear functions, which will be described in detail in the following section.
3 Solution Procedure
In this section, we develop a linear approximation-based solution procedure for the nonlinear mixed integer optimization problem (P2) derived in the previous section. Furthermore, we propose an algorithm based on the concept of receding horizon control (Kwon and Han, 2005) for solving n-period models using the solutions of k-period models (n > k).
First of all, a nonlinear product term z = y·x of a binary variable y and a bounded continuous variable x can be linearized by the following four inequalities:

      z ≤ M·y,   z ≤ x,   z ≥ x − M·(1 − y),   z ≥ 0,   (15)

where M is a sufficiently large number. Similarly, the second such product term in (P2) can be linearized, using its own binary variable, by the analogous four inequalities.   (16)
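The effect of the four inequalities in (15) can be checked numerically. The following brute-force sketch (hypothetical names, grid search over candidate z values) confirms that they force z = y·x for binary y:

```python
# Brute-force check of the big-M product linearization (15):
# for binary y and continuous x in [0, M], the constraints
#   z <= M*y,  z <= x,  z >= x - M*(1 - y),  z >= 0
# leave z = y * x as the only feasible value.

def feasible_z(y, x, M):
    # Candidate z values on a 0.5-spaced grid over [0, M].
    zs = [v * 0.5 for v in range(int(2 * M) + 1)]
    return [z for z in zs
            if z <= M * y and z <= x and z >= x - M * (1 - y) and z >= 0]
```

For y = 1 the last two constraints pin z to x; for y = 0 the first and fourth pin it to 0.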
On the other hand, as it is difficult to linearize the nonlinear expected-cost term exactly, we approximate it using piecewise linearization. Let us say that the function φ(u) can be linearized by the following piecewise linear function over l intervals (l ≥ 1 and integer):

      φ̃(u) = a_j u + b_j,   L_j ≤ u ≤ U_j,   j = 1, …, l,   (17)

where a_j and b_j (j = 1, …, l) are the slopes and the y-intercepts of the piecewise linear function in each interval, respectively, and L_j and U_j are the lower and upper bounds of each interval, respectively, with L_1 = 0 and U_j = L_{j+1} (j = 1, …, l − 1). Then φ̃ can be expressed using binary interval-selection variables δ_j (j = 1, …, l) as follows:

      L_j δ_j ≤ u_j ≤ U_j δ_j,   u = Σ_{j=1}^{l} u_j,   φ̃(u) = Σ_{j=1}^{l} (a_j u_j + b_j δ_j),   Σ_{j=1}^{l} δ_j = 1.   (18)
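A minimal sketch of constructing piecewise-linear coefficients in the spirit of (17), assuming equal-width intervals and a generic function phi; the names are illustrative, and the paper's own fitting procedure (an evolutionary algorithm with linear regression, per the concluding remarks) may differ:

```python
# Sketch: chord-based piecewise-linear approximation of phi over [0, U]
# with l equal-width pieces, and evaluation of the approximation.

def piecewise_coeffs(phi, U, l):
    """Return per-interval (slope, intercept, lo, hi) tuples."""
    pieces = []
    for j in range(l):
        lo, hi = U * j / l, U * (j + 1) / l
        a = (phi(hi) - phi(lo)) / (hi - lo)   # chord slope on [lo, hi]
        b = phi(lo) - a * lo                  # intercept so the chord meets phi at lo
        pieces.append((a, b, lo, hi))
    return pieces

def phi_tilde(pieces, u):
    # Evaluate the piecewise-linear approximation at u.
    for a, b, lo, hi in pieces:
        if lo <= u <= hi:
            return a * u + b
    raise ValueError("u outside the linearization range")
```

In (18) the interval selection is performed by the binary variables δ_j inside the optimization model rather than by this explicit lookup.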
Based on the above linearization, we can finally obtain the following mathematical programming model:

(P3)  max   Σ_{i=1}^{k} E[π_i]
      s.t.   the constraints of (P2), with the nonlinear terms replaced by the linear constraints (15), (16) and (18), the introduced variables restricted to be binary, and q_i ≥ 0,   i = 1, …, k.   (19)
Up to now, we have discussed the development of a k-period model and its solution procedure. With increasing k values, however, the numbers of constraints and variables introduced for linearization increase significantly, as does the
computation time of the solution procedure. Consequently, there are some practical difficulties associated with the application of the k-period model to real decision-making situations with a long planning horizon. In order to resolve this difficulty, we propose an online algorithm of receding horizon control style for the solution of an n-period (n > k) model using the solutions of k-period models. The algorithm begins with the solution of a k-period model with an initial condition to determine the optimal first-period order quantity, q_1. As a time period passes, the customer's demand in the first period is realized and thus the inventory level at the beginning of the second period is determined. Then, the optimal second-period order quantity, q_2, is determined by solving another k-period model with the input of the initial inventory level and previous order quantity. These steps are repeated over time. The algorithm can be described, generally, as follows:
< Algorithm : The solution procedure for n-period problems >
Step 0: Set j = 1.
Step 1: Set v = min(k, n − j + 1).
Step 2: Solve the v-period model (P3) with the input of the initial inventory level I_j and the previous order quantity q_{j−1} to determine the optimal order quantities over periods [j, j + v − 1].
Step 3: Place an order of size q_j at the beginning of the j-th period.
Step 4: As one time period passes, the customer's demand in the j-th period, D_j, is realized, and thus the inventory level at the beginning of the (j+1)-th period, I_{j+1}, is determined.
Step 5: If j = n, then stop. Otherwise, set j = j + 1 and return to Step 1.
When v equals 1, that is, when the solution of a one-period model is required, the method proposed by Lee (2008) can be used. Lee's method determines the optimal second-period order quantity given an initial inventory at the beginning of the second period and a previous order quantity. The optimal first-period order quantity is obtained via a simple enumeration of all possible values.
Therefore, Lee's two-period model, by its nature, can be seen as a one-period model, as previously noted in Section 1.
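The receding-horizon loop above can be sketched as follows; solve_k_period and sample_demand are hypothetical stand-ins for the (P3) solver and for the realization of one period's demand:

```python
# Sketch of the receding horizon procedure for an n-period problem.
# solve_k_period(I, q_prev, v) is assumed to return a list of v optimal
# order quantities; sample_demand() realizes one period's demand.

def receding_horizon(n, k, I1, q0, solve_k_period, sample_demand):
    I, q_prev = I1, q0
    placed = []
    for j in range(1, n + 1):                 # Steps 0 and 5: loop over periods
        v = min(k, n - j + 1)                 # Step 1: shrink the horizon near the end
        plan = solve_k_period(I, q_prev, v)   # Step 2: solve the v-period model
        q = plan[0]                           # Step 3: commit only the first order
        d = sample_demand()                   # Step 4: demand is realized
        I = I + q - d                         # backorders allowed: I may go negative
        placed.append(q)
        q_prev = q
    return placed
```

Only the first order of each v-period plan is executed; the rest of the plan is discarded and re-optimized once the new inventory level is known, which is the defining feature of receding horizon control.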
4 Concluding Remarks
This study established an inventory model with price discounts based on the previous order quantity that determines the optimal order quantities over multiple periods, and developed a solution procedure for the model. This work makes two major contributions to the relevant literature. Firstly, this is the first study to deal with a single-item, stochastic, multi-period inventory model with price discounts on the basis of the previous order quantity. Secondly, this study developed a mathematical programming model that simultaneously determines the optimal order quantities for all periods under consideration, whereas Lee (2008) did not develop any efficient method for the determination of the optimal first-period order quantity.
The proposed k-period inventory model for determining the optimal order quantities was formulated as a mixed integer nonlinear programming problem, and a piecewise linearization technique based on an evolutionary algorithm and linear regression was suggested for the transformation of the nonlinear problem into a linear one. A solution procedure of receding horizon control style was also developed for the solution of n-period problems using the solutions of k-period problems (n > k). Although this study extends the literature meaningfully, it has some inherent limitations, owing principally to the fact that the proposed method is a kind of approximate approach, based on a linearization of nonlinear functions. Considering the practical importance of multi-period inventory models with QDP, the development of optimal algorithms can be considered worthwhile.
References
1. Bregman, R.L.: An experimental comparison of MRP purchase discount methods. Journal of the Operational Research Society 42, 235–245 (1991)
2. Bregman, R.L., Silver, E.A.: A modification of the Silver-Meal heuristic to handle MRP purchase discount situations. Journal of the Operational Research Society 44, 717–723 (1993)
3. Bowerman, P.N., Nolty, R.G., Scheuer, E.M.: Calculation of the Poisson cumulative distribution function. IEEE Transactions on Reliability 39, 158–161 (1990)
4. Chang, C.T., Chang, S.C.: On the inventory model with variable lead time and price-quantity discount. Journal of the Operational Research Society 52, 1151–1158 (2001)
5. Chung, C.-S., Chiang, D.T., Lu, C.-Y.: An optimal algorithm for the quantity discount problem. Journal of Operations Management 7, 165–177 (1987)
6. Corbett, C.J., de Groote, X.: A supplier's optimal quantity discount policy under asymmetric information. Management Science 46, 444–450 (2000)
7. Dada, M., Srikanth, K.N.: Pricing policies for quantity discounts. Management Science 33, 1247–1252 (1987)
8. Hadley, G., Whitin, T.M.: Analysis of Inventory Systems. Prentice-Hall, Englewood Cliffs (1963)
9. Hoffmann, C.: Supplier's pricing policy in a Just-in-Time environment. Computers and Operations Research 27, 1357–1373 (2000)
10. Jucker, J.V., Rosenblatt, M.J.: Single-period inventory models with demand uncertainty and quantity discounts: Behavioral implications and a new solution procedure. Naval Research Logistics Quarterly 32, 537–550 (1985)
11. Knuth, D.E.: Seminumerical Algorithms. The Art of Computer Programming, 2nd edn. Addison-Wesley, Reading (1969)
12. Kesner, I.F., Walters, R.: Class—or mass? Harvard Business Review 83, 35–45 (2005)
13. Kwon, W.H., Han, S.: Receding Horizon Control: Model Predictive Control for State Models. Springer, Heidelberg (2005)
14. Lal, R., Staelin, R.: An approach for developing an optimal discount pricing policy. Management Science 30, 1524–1539 (1984)
15. Lee, J.-Y.: Quantity discounts based on the previous order in a two-period inventory model with demand uncertainty. Journal of the Operational Research Society 59, 1004–1011 (2008)
16. Lee, H.L., Rosenblatt, J.: A generalized quantity discount pricing model to increase supplier's profits. Management Science 33, 1167–1185 (1986)
17. Monahan, J.P.: A quantity pricing model to increase vendor profits. Management Science 30, 720–726 (1984)
18. Rubin, P.A., Dilts, D.M., Barron, B.A.: Economic order quantities with quantity discounts: Grandma does it best. Decision Sciences 14, 270–281 (1983)
19. Sethi, S.P.: A quantity discount lot size model with disposal. International Journal of Production Research 22, 31–39 (1984)
20. Shi, C.-S., Su, C.-T.: Integrated inventory model of returns-quantity discounts contract. Journal of the Operational Research Society 55, 240–246 (2004)
21. Sohn, K.I., Hwang, H.: A dynamic quantity discount lot size model with resales. European Journal of Operational Research 28, 293–297 (1987)
22. Su, C.-T., Shi, C.-S.: A manufacturer's optimal quantity discount strategy and return policy through game-theoretic approach. Journal of the Operational Research Society 53, 922–926 (2002)
23. Tsai, J.-F.: An optimization approach for supply chain management models with quantity discount policy. European Journal of Operational Research 177, 982–994 (2007)
24. Viswanathan, S., Wang, Q.: Discount pricing decisions in distribution channels with price-sensitive demand. European Journal of Operational Research 149, 571–587 (2003)
25. Weng, Z.K.: Coordinating order quantities between the manufacturer and the buyer: A generalized newsvendor model. European Journal of Operational Research 156, 148–161 (2004)
26. Weng, Z.K.: Modeling quantity discounts under general price-sensitive demand functions: Optimal policies and relationships. European Journal of Operational Research 86, 300–314 (1995)
27. Weng, Z.K., Wong, R.T.: General models for the supplier's all-unit quantity discount policy. Naval Research Logistics 40, 971–991 (1993)
28. Yang, P.C.: Pricing strategy for deteriorating items using quantity discount when demand is price sensitive. European Journal of Operational Research 157, 389–397 (2004)
A Study on the ECOAccountancy through Analytical Network Process Measurement Chaang-Yung Kung, Chien-Jung Lai, Wen-Ming Wu, You-Shyang Chen, and Yu-Kuang Cheng *
Abstract. Enterprises have often sacrificed human and environmental interests in order to pursue the maximization of company profits. With growing awareness of environmental protection and Corporate Social Responsibility (CSR), enterprises must begin to take on their potential duty toward people and the environment, beyond the going-concern assumption and the achievement of profit maximization for stockholders. Further, in the face of pressing environmental issues (unstable variation of climate, huge fluctuations of the Earth's crust, abnormal rises of the sea, and so on), traditional accounting principles and assumptions no longer conform to contemporary social requirements. Therefore, based on the full disclosure principle, enterprises are further supposed to disclose the full financial cost of complying with environmental regulations (regarding the dystrophication resulting from operations, remediation of contamination, environmental protection policies, and so on), so that enterprises are still able to achieve “economy-efficiency” for the environment under the going-concern assumption. Hence, an innovative accounting theory (the ECOAccountancy Theory) is created to establish new accounting principles and regulations for confronting these issues. This paper consolidates three essential relations that are evaluated by nineteen assessable sub-criteria under four evaluation criteria through pairwise comparison in the Analytical Network Process (ANP). Specifically, the specific feature of the three-approach models is to calculate the priority-vector weights of each assessable characteristic, criterion and sub-criterion by pairwise comparison matrices. Furthermore, the analytical hierarchical relations are expressed in four levels among the criteria, Chaang-Yung Kung Department of International Business, National Taichung University of Education *
Chien-Jung Lai · Wen-Ming Wu Department of Distribution Management, National Chin-Yi University of Technology You-Shyang Chen Department of Applied English, National Chin-Yi University of Technology Yu-Kuang Cheng Department of English, National Taichung University of Education J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 389–397. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
which enables enterprises to choose the potential role of ECOAccountancy in CSR in a thriving, hypercompetitive commercial environment.

Keywords: ECOAccountancy, Analytical Network Process (ANP).
1 Introduction

Beyond pursuing maximum profits and developing giant organizations, enterprises around the world have devoted themselves to collecting social resources consisting of manpower, environmental materials, political support and so forth, which results in direct environmental sacrifices including air pollution, water dystrophication, land contamination and social utilitarianism effects. Hence, in an era of public concentration on environmental concepts, the traditional environmental actions or policies of these enterprises are not enough to deal with the various new competitive environmental challenges. Enterprises immediately have to introspect on their Corporate Social Responsibilities ("CSR") and, under competitive pressure, positively adapt and form an effective and comprehensive environmental strategy. However, in terms of the final goals of all enterprises, financial profits are still the most critical achievement to chase; therefore, the most complete reflective reports are the financial statements, comprising the income statement, balance sheet, statement of stockholders' equity, statement of retained earnings and cash-flow statement. Further, according to the concept of [1], in terms of the corporate social responsibility development phases, the initial CSR development of enterprises focuses on setting up domestic CSR footholds. As the enterprises grow, they then concentrate on establishing central national CSR centers in order to set up a useful accounting department for the hypercompetitive commercial environment. Most enterprises employ multiple-national CSR strategies to achieve the most beneficial social responsibility effectiveness.
In terms of general manufacturing organization development, there are currently three main organizational structures suited to multi-national off-shoring CSR strategies: (1) Integrated Device Manufacturer (IDM), (2) Original Design Manufacturer (ODM), and (3) Original Equipment Manufacturer (OEM) [2]. Therefore, as long as environmental concepts are considered in the design of products or production procedures, pollution can be minimized. However, many of these multi-national enterprises have reconsidered their worldwide CSR strategies by analyzing the significant cost of indirect outsourcing and off-shoring of CSR. Further, in a hypercompetitive and lower-profit environment, enterprises are faced with the decision of how to cut the operational expenditure of their Economy Accountancy ("ECOAccountancy") through three principal accounting strategies: traditional accounting with passively green accountancy, traditional accounting with actively green accountancy, and effectively diversified ECOAccountancy. In terms of developmental geography, enterprises are supposed to institute a complete and competitive global CSR accounting system that deals with pollution by sharing cross-industry knowledge and high-contamination technologies with each other
in order to integrate their capacity for the highest benefits of achieving the lowest-expenditure strategy of the operational accountancy system ("GAS"), as shown in Figure 1. Up to the present, the innovation of a successful ECOAccountancy in global enterprises spreads wealth far beyond the enterprise in the lead position, which bears primary responsibility for conceiving, coordinating, and marketing new products through effective and efficient global accounting in order to create the most beneficial synergy. While the enterprise positioned at the lead level and its shareholders are the main intended beneficiaries of the enterprise's accounting strategic planning, other beneficiaries, including partners in the enterprise's accounting and firms that offer complementary products or services, may also benefit [3].
[Figure 1 plots the CSR expenditure, the operational expenditure of the ECOAccountancy, and profit centralization against four stages of CSR strategy development: Domestic Accounting (the domestic CSR strategy, with independent resources for CSR and ECOAccountancy), National Accounting (the national CSR strategy, with central resources), Multiple-national Accounting (the multiple-national CSR strategies, struggling to build decentralized and outsourced CSR and ECOAccountancy beachheads), and Global Accounting (the global competitive CSR strategies, instituting global ECOAccountancy in order to capture the highest CSR synergy with the lowest extra environmental expenditure).]

Fig. 1 The CSR and ECOAccountancy Development Trend
2 Methodologies

2.1 Literature Review on the CSR Performance Analysis

A large number of qualitative and quantitative papers have studied CSR performance analysis. [4] focuses on an alternative performance measure of the operational expenditure of the ECOAccountancy in new product development ("NPD"), considering the "acceleration trap" effect over a series of time periods to analyze the financial performance of CSR. [5] addresses the top eleven measured metrics out of 33 assessable metrics in his
technology value pyramid, built through evaluating 165 industrial companies. The analytical model is based on output-oriented, top-down analytical steps. [1] surveys 150 questionnaires to present the difference between enterprises' focus points of CSR performance and academic research points: enterprises pay more attention to the basic expenditure, time in need and product quality of CSR, whereas academic research concentrates on customer-related measures when designing and developing studies.
2.2 Literature Review on the Analytical Network Process (ANP)

The initial theory and idea of the analytical network process ("ANP") was published by Thomas L. Saaty [6], professor at the University of Pittsburgh, and is utilized for handling more complex research questions that cannot be solved by the analytic hierarchy process ("AHP"). Because the original decision hypothesis of AHP is restricted to "independence", AHP is challenged on its fundamental theory by some scholars and decision leaders, since the relationships between characteristics, criteria, sub-criteria and selected candidates are not necessarily independent. [7] develops a new research methodology, the positive reciprocal matrix and supermatrix, to pierce this limiting hypothesis in order to implement more complicated hierarchical analysis. More scholars have further combined the AHP model with additional analytical approaches to inductively create the ANP. Afterwards, more research has integrated other analytical methods such as factor analysis to infer more assessable and accurate methods such as data envelopment analysis ("DEA") and quality function deployment ("QFD").
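The supermatrix mechanism mentioned above can be sketched briefly: a column-stochastic supermatrix is raised to successive powers until its columns stabilize, and the stable columns give the limit priorities that account for interdependence. This is a minimal illustration under that assumption, not the paper's implementation; the 3x3 matrix is hypothetical.

```python
import numpy as np

def limit_supermatrix(W, tol=1e-9, max_pow=1000):
    """Raise a column-stochastic supermatrix to successive powers until
    the entries converge; the stable columns give the limit priorities."""
    M = W.copy()
    for _ in range(max_pow):
        N = M @ W
        if np.max(np.abs(N - M)) < tol:
            return N
        M = N
    return M

# Hypothetical 3x3 column-stochastic supermatrix with interdependence
W = np.array([[0.0, 0.5, 0.4],
              [0.6, 0.0, 0.6],
              [0.4, 0.5, 0.0]])
L = limit_supermatrix(W)
priorities = L[:, 0]  # every column of the limit matrix is (nearly) identical
```

For this toy matrix the limit priorities work out to roughly (0.3125, 0.375, 0.3125), i.e., the stationary distribution of the column-stochastic matrix.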
3 Research Design and Measurement

In terms of assessing the complexity and uncertainty challenges surrounding the ANP model, a compilation of experts' opinions was analyzed along with an empirical survey in order to achieve a retrospective cross-sectional analysis of the accounting relationship between enterprises and accounting partners for diminishing the operational expenditure of the ECOAccountancy. This section not only characterizes the overall research design, the analytical research specification and the research methodology, but is also designed for comparing each assessable criterion of the relationships among characteristics, criteria, sub-criteria and selected candidates.
3.1 Research Design

The research design framework of this research is presented in Figure 2 and contains four main research design steps: identifying, selecting, utilizing and integrating. The overall research steps include identifying the research motive, developing the research model and measuring framework, selecting the research methodology, investigating procedures, analyzing empirically collected data, assessing the overall analytical criteria through the Delphi method, and comparing the empirical analysis in order to draw a comprehensive conclusion.
1. Identify the research motive in order to define a clear research purpose.
2. Select the research methodology.
3. Utilize the research methodology to analyze the empirical data.
4. Integrate the overall analysis to inductively draw conclusions.

Fig. 2 The Research Design Framework [8]
In terms of the representativeness of the efficient ANP model through transitivity, the comparing-weights principle, evaluated criteria, the positive reciprocal matrix and the supermatrix, the research data source must collectively and statistically contain all impacted experts' opinions related to each assessable criterion. Based on the assessment of the ANP model, the pairwise comparisons of the evaluation characteristics, criteria and attributes at each level are evaluated with respect to the related interdependence and importance, from equally important (1) to extremely important (9), as expressed in Figure 3.
[Figure 3 shows the assessment scales: each pair of characteristics, criteria, attributes and selected candidates of the ECOAccountancy is compared on a 0-9 scale running from "equally important" to "extremely important".]

Fig. 3 The Research Assessable Criteria
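The pairwise judgments collected on these scales form positive reciprocal matrices from which priority weights and a consistency ratio can be derived. A minimal sketch, assuming the standard eigenvector method; the 3x3 matrix and Saaty's random indices are illustrative, not taken from the paper's survey data:

```python
import numpy as np

def priority_weights(M):
    """Principal eigenvector of a pairwise comparison matrix,
    normalized to sum to 1 (the priority weight vector w)."""
    vals, vecs = np.linalg.eig(M)
    k = np.argmax(vals.real)                 # principal eigenvalue lambda_max
    w = np.abs(vecs[:, k].real)
    return w / w.sum(), vals[k].real

def consistency_ratio(M):
    """C.R. = C.I. / R.I., where C.I. = (lambda_max - n) / (n - 1)
    and R.I. is Saaty's random index for matrices of order n."""
    RI = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45]  # n = 1..9
    n = M.shape[0]
    _, lam = priority_weights(M)
    ci = (lam - n) / (n - 1)
    return ci / RI[n - 1]

# Illustrative 3x3 pairwise matrix on the 1-9 scale (hypothetical values)
M = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
w, lam = priority_weights(M)
cr = consistency_ratio(M)
```

A nearly consistent matrix like this one yields a small C.R., so the judgments would pass the acceptability check described below.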
Based on the principle of the consistency ratio, a pairwise comparison matrix is acceptable when the C.R. value is equal to or smaller than 0.01. Further, the research data source in this research is derived from scholars and experts who understand the performance measurement of the operational expenditure of the ECOAccountancy and the ANP model, and who are employed or serve in Taiwan and Mainland China. Additionally, according to the fundamental characteristics of CSR management and the ANP, following the concepts of [9] and [10], three basic performance measurements of the operational expenditure of the ECOAccountancy have been considered as the characteristics of CSR management: cost-down policy, strategic demand and business development. Further, based on the collected expert opinions, this research is organized around the following five assessable criteria: productive cost, productive technology, human resource, products marketing and company profits, together with their homologous
sub-criteria, which are expressed in Figure 4. These criteria are then used in this research to test and analyze the consistency of three kinds of accounting strategies: traditional accounting with passively green accountancy, traditional accounting with actively green accountancy, and effectively diversified ECOAccountancy.
(1) Productive Cost. For an overall reflection of the performance evaluation of the operational expenditure of the ECOAccountancy in production across the three characteristics of CSR management, three principal assessable sub-criteria are considered in the criterion of the financial perspective: direct cost ("DC"), indirect cost ("IC") and manufacture expense ("ME").
(2) Productive Technology. In terms of ensuring rising manufacturing technology after accounting, two assessable sub-criteria, based on expert opinion, are considered in the criterion of qualitative and quantitative review: yield rate ("YR") and automation rate ("AR").
(3) Human Resource. In order to realize the effect of the accounting strategy on human resources, two major sub-criteria, according to effectiveness and efficiency concepts and the experts' discussion, are considered in the criterion of human resource: productive rate of human resource ("PR-HR") and capacity-growing rate of human resource ("CGR-HR").
(4) Products Marketing. In terms of evaluating products marketing after accounting, the experts surveyed in this research considered two chief evaluated sub-criteria in this criterion: market share rate ("MSR") and customer satisfaction ("CS").
(5) Company Profits. Based on the experts' discussion, the three principal and crucial evaluated sub-criteria for evaluating profits are return on assets ("ROA"), gross profit rate ("GPR") and net income after total expenditure ("NI-ATE").
[Figure 4 depicts the four-level hierarchy: the goal (best accounting strategy with the lowest CSR expenditure); the characteristics of CSR management for ECOAccountancy (cost-down policy, strategic demand, business development); the criteria of assessment with their sub-criteria (productive cost: DC, IC, ME; productive technology: YR, AR; human resource: PR-HR, CGR-HR; products marketing: MSR, CS; company profits: ROA, GPR, NI-ATE); and the selected potential CSR strategies for ECOAccountancy (traditional accounting with passively green accountancy, traditional accounting with actively green accountancy, and the effectively diversified ECOAccountancy).]

Fig. 4 The Relationship Among Assessable Attitudes, Criteria, Sub-criteria, and Candidates [11]
4 Empirical Analysis and Results

In the hierarchical relations at the last level, each potential accounting partner has to match each assessable sub-criterion in each evaluated criterion through the pairwise-compared performance of each potential accounting strategy. In order to reflect the comparative scores of the three kinds of accounting strategies, Equation (1) is applied to compute the comprehensively comparative related priority weights w (eigenvectors) in the matrix. Consequently, the appropriate accounting partner is selected by calculating the "accounting comparative index" D_i [11], which is defined by:

D_i = Σ_{j=1}^{s} Σ_{k=1}^{k_j} P_j T_{kj} R_{ikj}    (1)

where P_j is the importance-related priority weight w (eigenvector) for assessable criterion j, T_{kj} is the importance-related priority weight w (eigenvector) for assessable attribute k of criterion j, and R_{ikj} is the rating of potential accounting partner i on attribute k of criterion j. Additionally, after processing Equation (1), the ultimate evaluation step is to combine the overall outcome of the complete importance-related priority weights w (eigenvectors), as in Table 1.

Table 1 ECOAccountancy Comparative Indexes (Productive cost / Productive technology / Human resource / Products marketing)

  Traditional accounting with passively ECOAccountancy:  0.5115
  Traditional accounting with actively ECOAccountancy:   0.2833
  Effectively diversified ECOAccountancy:                0.2069
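Equation (1) can be illustrated with a short sketch; all weight and rating values below are hypothetical placeholders (two criteria with two attributes each), not the paper's survey data:

```python
# Hypothetical ANP weights for two criteria with two attributes each.
P = [0.6, 0.4]                      # criterion weights P_j
T = [[0.7, 0.3], [0.5, 0.5]]        # attribute weights T_kj within criterion j
# R[i][j][k]: rating of strategy i on attribute k of criterion j
R = [
    [[0.50, 0.55], [0.45, 0.60]],   # strategy 0
    [[0.30, 0.25], [0.35, 0.20]],   # strategy 1
    [[0.20, 0.20], [0.20, 0.20]],   # strategy 2
]

def comparative_index(i):
    """D_i = sum over j and k of P_j * T_kj * R_ikj  (Equation 1)."""
    return sum(P[j] * T[j][k] * R[i][j][k]
               for j in range(len(P)) for k in range(len(T[j])))

D = [comparative_index(i) for i in range(len(R))]
best = max(range(len(D)), key=lambda i: D[i])   # highest index wins
```

With these placeholder values, strategy 0 obtains the largest index, mirroring how the passively green strategy's 0.5115 dominates in Table 1.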
First, the highest evaluated score of 0.1842 appears in the sub-criterion of direct cost (DC) of the assessable criterion of productive cost when implementing the traditional accounting with passively green accountancy strategy. The highest evaluated score of 0.77 likewise appears in the sub-criterion of direct cost (DC) of the assessable criterion of productive cost when practicing the traditional accounting with actively green accountancy strategy. However, the highest evaluated score of 0.0717 appears in the sub-criterion of automation rate (AR) of productive technology when applying the effectively diversified ECOAccountancy strategy. Consequently, the highest accounting comparative index, 0.5115, belongs to the traditional accounting with passively green accountancy strategy, which means that the best accounting strategy for minimizing the operational expenditure of the ECOAccountancy for enterprises is traditional accounting with passively green accountancy.
5 Concluding Remarks

This study has motivated global enterprises to undertake their fundamental CSR activities through accounting alliances with competing companies (local and foreign). Specifically, as a result of this research, traditional accounting with passively green accountancy is the best competitive strategy with the lowest operational expenditure of the ECOAccountancy, as evaluated over the characteristics, assessable criteria and sub-criteria under the current business environment, owing to the still-incomplete ECOAccountancy practice (such as unannounced accounting systems from global accounting boards, the lack of impartial third-party auditing, and enterprises' incomplete financial disclosures). Our contention therefore not only focuses on the central concept of the three kinds of accounting strategies but also concentrates on diminishing the operational expenditure of the ECOAccountancy during the selection of the best potential accounting strategy through a new financial perspective and a novel approach (the ANP model). The ANP model is used not only to clearly establish comprehensive hierarchical relations between each assessable criterion but also to assist the decision-maker in selecting the best potential strategy, traditional accounting with passively green accountancy, with the lowest operational expenditure of the ECOAccountancy, through the academic Delphi method and expert surveys. Five main assessable criteria cover the key points of evaluating this innovative shortcut in competitive accounting strategy. The next step beyond this research is to focus on analyzing additional influences of the operational expenditure of the ECOAccountancy created by the accounting strategy through further measurement and assessment.
As these comprehensive versions are respected, enterprises will be able to obtain more competitiveness under a lower operational expenditure of the ECOAccountancy through the traditional accounting with passively green accountancy strategy, and thus survive in this complex, highly competitive and lower-profit era.
References

[1] Driva, H., et al.: Measuring product development performance in manufacturing organizations. International Journal of Production Economics 66, 147–159 (2000)
[2] Lewis, J.D.: The new power of strategic accountings. Planning Review 20(5), 45–46 (1992)
[3] Kottolli, A.: Globalization of CSR. CSR Management 36(2), 21–23 (2005)
[4] Curtis, C.C.: Nonfinancial performance measures in new product development. Journal of Cost Management 1, 18–26 (1994)
[5] Curtis, C.C.: Balanced scorecards for new product development. Journal of Cost Management 1, 12–18 (2002)
[6] Saaty, T.L.: Decision Making with Dependence and Feedback: The Analytic Network Process. RWS Publications, Pittsburgh (1996)
[7] Saaty, T.L.: Multi-criteria Decision Making: The Analytic Hierarchy Process. RWS Publications, Pittsburgh (1998)
[8] Hsieh, M.-Y., et al.: Management Perspective on the Evaluation of the ECOAccountancy in the Corporate Social Responsibility. Electronic Trend Publications, ETP (2010) [9] Bidault, Cummings: CSR strategic challenges for multinational corporations. CSR Management 21(3), 35–41 (1997) [10] Ernst, D., Bleeke, J.: Collaborating to Compete: Using Strategic Accountings and Acquisitions in the Global Marketplace. Wiley, New York (1993) [11] Hsieh, M.-Y., et al.: Management Perspective on the Evaluation of the ECOAccountancy in the Corporate Social Responsibility. In: 2010 International Conference on Management Science and Engineering, Wuhan, China, pp. 393–396 (2010)
Attribute Coding for the Rough Set Theory Based Rule Simplifications by Using the Particle Swarm Optimization Algorithm Jieh-Ren Chang, Yow-Hao Jheng, Chi-Hsiang Lo, and Betty Chang
Abstract. The attribute coding approach has been used in Rough Set Theory (RST) based classification problems. Attribute coding defines ranges of the attribute values by multiple thresholds. If the attribute value ranges can be defined appropriately, an appropriate number of rules will be generated. Attribute coding for RST based rule derivation significantly reduces unnecessary rules and simplifies the classification results. Therefore, how appropriate attribute value ranges can be defined is critical for rule derivation using the RST. In this study, the authors introduce the particle swarm optimization (PSO) algorithm to adjust the attribute setting scopes as an optimization problem, so as to derive the most appropriate attribute values in a complex information system. Finally, the efficiency of the proposed method is benchmarked against other algorithms using Fisher's iris data set. Based on the benchmark results, simpler rules can be generated and better classification performance can be achieved by using the PSO based attribute coding method.

Keywords: Particle Swarm Optimization (PSO); Rough Set Theory (RST); Attribute Coding; optimization.

Jieh-Ren Chang
Department of Electronic Engineering, National Ilan University
No. 1, Sec. 1, Shen-Lung Road, I-Lan, 260, Taiwan, R.O.C
e-mail:
[email protected] Yow-Hao Jheng Department of Business and Entrepreneurial Administration, Kainan University No. 1, Sec. 1, Shen-Lung Road, I-Lan, 260, Taiwan, R.O.C e-mail:
[email protected] Chi-Hsiang Lo Institute of Management of Technology, National Chiao Tung University No. 1, Sec. 1, Shen-Lung Road, I-Lan, 260, Taiwan, R.O.C e-mail:
[email protected] Betty Chang Graduate Institute of Architecture and Sustainable Planning, National Ilan University No. 1, Sec. 1, Shen-Lung Road, I-Lan, 260, Taiwan, R.O.C e-mail:
[email protected] J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 399 – 407. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
1 Introduction
In recent years, many expert systems have been established for deriving appropriate responses and answers based on knowledge based systems. But in real life, knowledge is filled with ambiguity and uncertainty, and conflicting and unnecessary rules can usually be found in a knowledge based system. The goal of this research is to avoid excessive system operations through simplification and correct description of the inputs for the knowledge based systems. In addition, the overall efficiency of classification and computation can be enhanced by reasonable analysis. The Rough Set Theory (RST) [15, 16, 19] was proposed by Zdzisław Pawlak in 1982. The RST has been widely applied in various fields such as the prediction of business outcomes [4], road maintenance [8], insurance market analysis [18], consumer behavior analysis [13], material property identification [7] and so on. According to the procedures of the RST, redundant attributes can be removed automatically and patterns can be identified directly. Pattern classification is a longstanding problem in various engineering fields, such as radar detection, control engineering, speech identification, image recognition, biomedical diagnostics, etc. Despite the huge progress in artificial intelligence research during the past decades, the gap between artificial intelligence based pattern recognition and human recognition is still significant. Thus, novel methods have been proposed for improving the performance of pattern classification problems. The RST is one of the best such methods and can be manipulated easily. However, too many classification rules are generated by RST based approaches, which wastes computation time and database space. The attribute coding technique can be introduced as the initial step of the RST based pattern classification approach, so that the original information is encoded at the beginning of the RST based procedure.
Appropriate attribute code scopes can be defined based on the whole numeric range of the information data. An appropriate definition of the attribute code scopes is helpful for generating an appropriate number of rules and reducing unnecessary rules. The Particle Swarm Optimization (PSO) [11] algorithm was proposed by Kennedy and Eberhart in 1995, based on simulating the social behavior of birds' foraging. The PSO starts with a population of candidate solutions, or particles. These particles are then moved around the search space according to simple mathematical formulae. Since the procedures of the PSO are simple and the parameters can easily be adjusted, the PSO has been widely applied in medicine [10], stock markets [12], energy [14], construction [9] and so on. The PSO based method is suitable for finding optimal solutions in a wide search space. Therefore, the PSO is appropriate for searching the attribute code scopes in the RST. In this study, a method is proposed for attribute coding in the RST based rule simplification by using the PSO algorithm. In the next Section, the basic concepts of the RST and the PSO algorithm are introduced. The overall architecture and the procedures of using the PSO algorithm to improve the RST based pattern classification are demonstrated in the third Section. In addition, the details of
the initial conditions, termination criteria and modified steps of the proposed PSO algorithm are also described in Section three. Performance evaluation and comparison of the analytic results with those derived by other methods on the Iris data set are demonstrated in the fourth Section. Finally, concluding remarks of this study are presented in Section five.
2 Related Work and Preliminaries

2.1 The RST
The aim of this Section is to introduce the basic concepts of the RST [15, 16, 19]. Before the RST based analytic procedure, the collected data can be viewed as an information system IS = (U, A), where U = {x_i | i = 1, 2, …, R} is the original data set with R data objects from a target system, and A = {a_j | j = 1, 2, …, n} is the attribute set with n attributes. Each data object can be represented by x_i = (v_{i1}, …, v_{ij}, …, v_{in}), where v_{ij} is usually a real number in mathematical terminology. For the sake of simplification of the RST rules, we would like to transform the numerical data into special codes. First, we should define the attribute code scopes for each attribute. For example, if there are three codes for an attribute, we should define three scopes for this attribute; that means four margin values confine these three scopes. Since the data have their own maximum and minimum values for each attribute, only two boundary values need to be decided. After the boundary value decision, all data objects will be represented by the transformed attribute coding. After attribute coding, the information system is represented by IS = (U, A, V, f), where U = {x_i | i = 1, 2, …, R} and x_i = (b_{i1}, …, b_{ij}, …, b_{in}), j = 1, 2, …, n. For each attribute a ∈ A, the information function is defined as f_a: U → V_a, where V_a denotes the set of codes for a and is called the domain of attribute a. The RST method consists of the following procedures:

(a) transform the data objects of the target system by attribute coding,
(b) establish the construction of elementary sets,
(c) check the difference between the lower approximation and the upper approximation of the set which represents a class, to select good data for classification rules,
(d) evaluate the core and reducts of the attributes,
(e) generate the decision table and rules.
Through the above steps, we can find the reducts and cores of attributes for IS = (U, A). A reduct is an essential part of an IS = (U, A) which can discern all objects discernible by the original IS = (U, A). The core is the common part of all reducts. Finally, the decision table and classification rules can be established from the reducts and cores of the attributes.
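Steps (b) and (c) of the procedure above can be sketched concisely: objects sharing identical codes on the chosen attributes form elementary sets, and the lower/upper approximations of a class follow by set inclusion and intersection. The toy coded information system below is hypothetical.

```python
from collections import defaultdict

def elementary_sets(objects, attrs):
    """Group objects that are indiscernible on the chosen attributes."""
    blocks = defaultdict(set)
    for name, values in objects.items():
        key = tuple(values[a] for a in attrs)
        blocks[key].add(name)
    return list(blocks.values())

def approximations(objects, attrs, target):
    """Lower approximation: elementary sets fully inside the target class.
    Upper approximation: elementary sets that intersect the target class."""
    lower, upper = set(), set()
    for block in elementary_sets(objects, attrs):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper

# Toy coded information system: attribute codes per object (hypothetical)
objects = {
    "x1": {"a1": 1, "a2": 2},
    "x2": {"a1": 1, "a2": 2},
    "x3": {"a1": 2, "a2": 1},
    "x4": {"a1": 2, "a2": 1},
}
target = {"x1", "x2", "x3"}          # objects labelled with the class of interest
lower, upper = approximations(objects, ["a1", "a2"], target)
```

Here x3 and x4 share a code pattern but only x3 is in the class, so the block {x3, x4} lies in the boundary region: it joins the upper approximation but not the lower one.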
2.2 The PSO
Particle swarm optimization [11] is initialized with a population of random solutions of the objective function. All particles have their own position vector and speed vector at any moment. For the i-th particle, the position vector and speed vector are represented by X_i = (x_{i,1}, x_{i,2}, …, x_{i,D}) and V_i = (v_{i,1}, v_{i,2}, …, v_{i,D}) respectively, where D is the dimension of the space. Each particle moves by the regulation of Equation (1) and Equation (2), where v_{i,d}(t) is the velocity at current time t, P_i is the fittest solution which particle i has achieved so far, G is the global optimum solution at the same time, w is the inertia weight, c_1 and c_2 are parameters, and r_1 and r_2 are random numbers with 0 ≤ r_1 ≤ 1 and 0 ≤ r_2 ≤ 1:

v_{i,d}(t+1) = w v_{i,d}(t) + c_1 r_1 (p_{i,d} − x_{i,d}(t)) + c_2 r_2 (g_d − x_{i,d}(t))    (1)

x_{i,d}(t+1) = x_{i,d}(t) + v_{i,d}(t+1)    (2)

The PSO algorithm procedures can be described as follows:

(a) A set of solutions {X_i | i = 1, 2, …, N} and a set of velocities {V_i | i = 1, 2, …, N} are initialized randomly, where N is the total number of particles, and t = 0.
(b) The fitness function value is calculated for each particle.
(c) If the fitness function value calculated in step (b) is better than the local optimal solution P_i, then the current local optimal solution P_i is updated.
(d) Considering the fitness function values of all particles, the best one is selected. If it is better than the global optimum G, then the current global optimal solution G is updated.
(e) The particle's velocity and position are changed by Equations (1) and (2).
(f) If the stopping criteria are satisfied, the repeated steps stop. Otherwise, the algorithm goes back to step (b).
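The update rules (1)-(2) and steps (a)-(f) can be condensed into a short sketch. The parameter values and the sphere test function are illustrative choices, not the paper's settings:

```python
import random

random.seed(42)  # reproducible demo run

def pso(fitness, dim, bounds, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5):
    """Minimize `fitness` using the velocity/position updates of Eqs. (1)-(2)."""
    lo, hi = bounds
    X = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]          # speed initially zero
    P = [x[:] for x in X]                                  # local bests P_i
    pbest = [fitness(x) for x in X]
    g = P[min(range(n_particles), key=lambda i: pbest[i])][:]
    gbest = min(pbest)                                     # global best G
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (P[i][d] - X[i][d])
                           + c2 * r2 * (g[d] - X[i][d]))
                X[i][d] = X[i][d] + V[i][d]
            f = fitness(X[i])
            if f < pbest[i]:                               # update local best
                pbest[i], P[i] = f, X[i][:]
                if f < gbest:                              # update global best
                    gbest, g = f, X[i][:]
    return g, gbest

# Toy objective: sphere function, minimum 0 at the origin
best_x, best_f = pso(lambda x: sum(v * v for v in x), dim=3, bounds=(-5, 5))
```

On such a smooth unimodal objective the swarm typically contracts to near the origin within the 100 iterations shown.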
3 Research Methodology

3.1 Methodology Structure
Attribute coding is the key issue for building a classifier based on the RST. Therefore, how to define the boundary values of each attribute code can be seen as a multi-dimensional optimization problem. In this study, we use the PSO algorithm to solve the attribute coding problem in the RST classification algorithm. The purpose of this method is to reduce and simplify the RST classification rules, especially when the original input data have many attributes or the system data are continuous real numbers or widespread numerical values. The flow chart of the proposed algorithm is shown in Figure 1.
Fig. 1 Flow chart of the proposed algorithm
3.2 Particle Swarm Initialization
Assume there are R data objects, M output categories and n attributes in the information system, as shown in Figure 2. If we encode c_j + 1 values for each attribute a_j, then c_j cutting values should be decided for the attribute a_j. The number of classification rules in the RST algorithm can be increased or decreased by changing the boundary values of any attribute code. Finding the exact place of every boundary value of the attribute codes for each attribute is an optimization problem; therefore, this study uses the PSO to solve it. First, the total number of boundary values is defined as D = Σ_{j=1}^{n} c_j, which is the number of dimensions for each particle in the PSO algorithm. Then, the position values of all particles, which have D dimensions, are randomly initialized. The
position value is confined between maximum and minimum border values of each attribute. The speed is initially set as zero. The whole particle swarm ,1 , data can be represented in Figure 3.2, where 1 , , … , and , , … , . , , , , , ,
Fig. 2 The structures of input data and particle swarm data.
3.3 Fitness Function for the RST Procedure
Formally, let fit(·) be the fitness or cost function which must be minimized. The function takes a candidate solution as argument and produces a real number as output which indicates the fitness of the given candidate solution. The goal is to find a solution x* for which fit(x*) ≤ fit(x) for all x in the search-space, which would mean x* is the global minimum. For our problem, the fitness function is constructed to minimize the classifier error rate and the number of classification rules in the RST procedure. The fitness function is described by the following pseudocode:

    fitness function = fit(x, IS) {
        Number_of_rules = RST(x, IS);
        Error_rate = Correct_Check(Number_of_rules, IS);
        if (Error_rate != 0)
            return (1 - Error_rate);
        else
            return (Number_of_rules - Number_of_input_data);
    }
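A direct Python rendering of this pseudocode is sketched below; since the RST rule generator and the rule checker are not given in the paper, they are passed in as stand-in callables, and all names are illustrative.

```python
def fit(particle, info_system, rst, correct_check, n_input_data):
    """Mirror of the paper's fitness pseudocode.

    rst(particle, info_system)          -> number of rules (stand-in)
    correct_check(n_rules, info_system) -> error rate in [0, 1] (stand-in)
    """
    n_rules = rst(particle, info_system)
    error_rate = correct_check(n_rules, info_system)
    if error_rate != 0:
        return 1 - error_rate          # positive value while errors remain
    # error-free: negative value, smaller when fewer rules are produced
    return n_rules - n_input_data
```

Under minimization, any error-free particle (negative value) beats any particle with errors (positive value), and among error-free particles the one producing fewer rules wins.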
The beginning step in the RST analysis is to find the attribute code scopes for each particle x_i. The total of D dimensions is divided into groups based on the number of attributes. Suppose the number of dimensions is k_j for the j-th group, which corresponds to the attribute a_j; this means that k_j boundary values of a_j should be decided and optimized. The structure of the dimensions of particle x_i is shown in Figure 3. According to Equation (3), each original information datum v(i,j), 1 ≤ j ≤ n, will be encoded as a new split attribute code c(i,j). Based on the new attribute codes, the decision table and classification rules can be generated by the RST-based analysis.
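A hedged sketch of this encoding step: each continuous attribute value is replaced by the index of the interval, between the sorted boundary values of that attribute, into which it falls, so k boundary values yield k + 1 codes. The function and argument names are illustrative.

```python
import bisect

def encode(value, boundaries):
    """Return the attribute code of `value`: the index of the interval
    delimited by the sorted boundary values, i.e. a code in 0..len(boundaries)."""
    return bisect.bisect_right(sorted(boundaries), value)
```

With two boundary values there are three codes: `encode(4.0, [4.9, 6.3])` gives 0, `encode(5.1, [4.9, 6.3])` gives 1, and `encode(7.0, [4.9, 6.3])` gives 2.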
c(i,j) = m    if    b(j,m) ≤ v(i,j) < b(j,m+1),    m = 0, 1, …, k_j        (3)

where b(j,1) ≤ b(j,2) ≤ … ≤ b(j,k_j) are the boundary values held in the j-th group of the particle, and b(j,0) and b(j,k_j+1) are taken as the minimum and maximum border values of attribute a_j.

Fig. 3 The structure of dimensions for particle x_i

3.4 Modification and Stopping Criteria
During the PSO operation, the algorithm records the regional (local) optimal solution of each particle and the global optimal solution among all particles; they are iteratively modified by Equations (1) and (2) to minimize the fitness function value until the stopping criteria are satisfied. In the normal condition, the solution values of each particle should satisfy Equation (4). When the location values of a particle fall out of the order required by Equation (4), order can be restored with the bubble sorting method. In this study, the stopping criteria are set as follows: (1) the fitness function values of all particles are identical, and they are smaller than or equal to zero; (2) the iteration counter is greater than a preset maximum number of iterations, which is set to avoid an unlimited iteration condition in the PSO process.
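A sketch of this reordering step (names illustrative): the flat particle vector is split into one group of boundary values per attribute, and each group is bubble-sorted back into ascending order, as the text prescribes.

```python
def restore_order(position, cuts_per_attr):
    """Bubble-sort each attribute's group of boundary values back into
    ascending order, in place, and return the repaired position vector."""
    start = 0
    for k in cuts_per_attr:
        group = position[start:start + k]
        for i in range(k - 1):                 # plain bubble sort, as in the text
            for j in range(k - 1 - i):
                if group[j] > group[j + 1]:
                    group[j], group[j + 1] = group[j + 1], group[j]
        position[start:start + k] = group
        start += k
    return position
```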
x(i, s_j + 1) ≤ x(i, s_j + 2) ≤ … ≤ x(i, s_j + k_j),    s_j = k_1 + … + k_(j-1),    j = 1, …, n        (4)

i.e., within each attribute's group of dimensions, the boundary values of every particle are kept in ascending order.
4 Experiment Results
We use the Iris database to test the effectiveness of the proposed method. The Iris database is divided into three categories: Setosa, Versicolor, and Virginica. Each flower is identified by four attributes: sepal length, sepal width, petal length, and petal
width. There are 150 records in the Iris database. We employed the Iris database to generate the critical values of the attribute codes and the rules, and then obtained accuracy rates from the testing data. We programmed in MATLAB R2009b; 90% of the data were used for training and 10% for testing. The critical values of the attribute codes were obtained with the PSO algorithm. In this study, we used a total of 500 particles to find the optimal solution. As the output is divided into 3 categories, each of the 4 attributes needs 2 boundary values, so there are eight dimensions for each particle. The related parameters of the proposed algorithm include 4, 2, and a maximum of 10,000 iterations; 100 simulations were processed, and the resulting values were averaged. The authors compare the results of this study with those from other research; the average accuracy rates are listed in Table 1. The average accuracy rate of this study is 97.75%, which is higher than the others; in other words, the proposed PSO algorithm is more effective.

Table 1 Comparison of average accuracy rate with 90% training and 10% testing data.

Research Method          Average Accuracy Rate
The Proposed Algorithm   97.75%
Chang-and-Jheng [2]      96.72%
Hong-and-Chen [6]        96.67%
Hirsh [5]                95.78%
Aha-and-Kibler [1]       94.87%
Dasarathy [3]            94.67%
Quinlan [17]             93.89%
The 90% training data were used to generate rules with both the original RST method and the proposed method. The numbers of rules are compared in Table 2.

Table 2 Comparison of number of rules.

Method                   Number of rules
The Proposed Algorithm   17
RST by original data     31

5 Conclusion
The original RST has a weakness when searching for the critical values of a large number of attributes. In this study, we combined the PSO algorithm with RST to overcome this weakness, and used the proposed method to classify the Iris data. With the advantage of this combination, the overall performance was greatly improved. In conclusion, this hybrid method is able to find the critical values for each attribute division scope and performs well in classification. In addition, it also significantly reduces the number of output rules.
References

[1] Aha, D.W., Kibler, D.: Detecting and removing noisy instances from concept descriptions. Technical Report 88-12, University of California, Irvine (1989)
[2] Chang, J.R., Jheng, Y.H.: Optimization of α-cut value by using genetic algorithm for fuzzy-based rules extraction. In: The 18th National Conference on Fuzzy and Its Applications, pp. 678–683 (2010)
[3] Dasarathy, B.V.: Nosing around the neighborhood: A new system structure and classification rule for recognition in partially exposed environments. PAMI 2-1, 67–71 (1980)
[4] Dimitras, A.I., Slowinski, R., Susmaga, R., Zopounidis, C.: Business failure prediction using rough sets. European Journal of Operational Research 114(2), 263–280 (1999)
[5] Hirsh, H.: Incremental version-space merging: a general framework for concept learning. Ph.D. Thesis, Stanford University (1990)
[6] Hong, T.P., Chen, J.B.: Finding relevant attributes and membership functions. Fuzzy Sets and Systems 103(3), 389–404 (1999)
[7] Jackson, A.G., Leclair, S.R., Ohme, M.C., Ziarko, W., Kamhwi, H.A.: Rough sets applied to materials data. ACTA Material 44(11), 4475–4484 (1996)
[8] Jang, J.R., Hung, J.T.: Rough set theory inference to study pavement maintenance. National Science Council Research Project, Mingshin University of Science and Technology, Hsinchu, Taiwan (2005)
[9] Jia, G., Zhang, W.: Using PSO to reliability analysis of PC pipe pile. In: The 3rd International Symposium on Computational Intelligence, vol. 1, pp. 68–71 (2010)
[10] Jin, J., Wang, Y., Wang, Q., Yang, B.Q.: The VNP-PSO method for medical image registration. In: The 29th Chinese Control Conference, pp. 5203–5205 (2010)
[11] Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, pp. 12–13 (1995)
[12] Khamsawang, S., Wannakarn, P., Jiriwibhakorn, S.: Hybrid PSO-DE for solving the economic dispatch problem with generator constraints. In: 2010 2nd International Conference on Computer and Automation Engineering, vol. 5, pp. 135–139 (2010)
[13] Kim, D.J., Ferrin, D.L., RaghavRao, H.: A study of the effect of consumer trust on consumer expectations and satisfaction: The Korean experience. In: Proceedings of the 5th International Conference on Electronic Commerce, Pittsburgh. ACM International Conference Proceeding Series, pp. 310–315 (2003)
[14] Kondo, Y., Phimmasone, V., Ono, Y., Miyatake, M.: Verification of efficacy of PSO-based MPPT for photovoltaics. In: 2010 International Conference on Electrical Machines and Systems, pp. 593–596 (2010)
[15] Pawlak, Z.: Rough sets. International Journal of Parallel Programming 11(5), 341–356 (1982)
[16] Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishing, Dordrecht (1991)
[17] Quinlan, J.R., Compton, P.J., Horn, K.A., Lazarus, L.A.: Inductive knowledge acquisition: A case study. In: Quinlan, J.R. (ed.) Applications of Expert Systems. Addison-Wesley, Wokingham (1987)
[18] Shyng, J.Y., Wang, F.K., Tzeng, G.H., Wu, K.S.: Rough set theory in analyzing the attributes of combination values for the insurance market. Expert Systems with Applications 32(1), 56–64 (2007)
[19] Walczak, B., Massart, D.L.: Tutorial: Rough sets theory. Chemometrics and Intelligent Laboratory Systems 47, 1–17 (1999)
Building Agents by Assembly Software Components under Organizational Constraints of Multi-Agent System Siam Abderrahim and Maamri Ramdane
Abstract. This paper presents an attempt to provide a framework for building agents by automatic assembly of software components; the assembly is directed by the organization of the MAS to which the agent belongs. In the proposed model, the social dimension of the multi-agent system directs the agent, depending on its location within the organization (according to its roles), to assemble or reassemble components in order to reconfigure itself automatically; the proposed construction of agents is independent of any component model.

Keywords: components, agents, MAS, assemblage, organization.
1 Introduction

Multi-agent systems (MAS) and software components are two important approaches in the world of software development. Both propose structuring software as a composition of software elements, and this structuring eases its development as well as the addition and replacement of elements within it. With their self-organization capacity, MAS push up the level of abstraction; we believe that taking the organization of the MAS as an analytical framework can assist in the use of software components for building agents. Several studies [1], [5], [11], [12], [14] have been interested in building agents by assembling components, indicating the possibility of using the component approach as a framework for the specification of agent behavior. In this paper we propose to build agents by automated assembly of components, where the assembly is directed by the organization of the MAS to which the agent belongs.

Siam Abderrahim
University of Tebessa Algeria, Laboratory LIRE, University Mentouri-Constantine Algeria
e-mail:
[email protected] *
Maamri Ramdane Laboratory LIRE, University Mentouri-Constantine, Algeria e-mail:
[email protected] J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 409–418. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
S. Abderrahim and M. Ramdane
In this work we propose a structure of the MAS in terms of roles grouped into alliances. Agents agree to play roles in the organization, and for this, agents should have certain skills implemented as software components. In this proposition the social dimension of the multi-agent system directs the agent, depending on its position in the organization (its roles), to assemble or reassemble components so as to reconfigure itself automatically and dynamically. In this model we abstract away from the assembly mechanism itself and from the component models; the emphasis is on the social dimension as a framework for the management of component assembly by the agent.
2 The Component Paradigm

The component approach is an advantageous design technique for distributed and open applications; it offers a balance between, on the one hand, the production of custom code, which is long to develop and validate, and, on the other hand, the reuse of existing software. One vision of the definition of a software component [13] is a piece of software "fairly small to create and maintain, and large enough to install and reuse." The component concept is an evolution of the object concept, with practically the same objectives: encapsulation, separation of interface and implementation, reusability, and reduction of complexity. We note the existence of several component models, from which we abstract in this work, as well as from the assembly engine, i.e., how the components are assembled.
3 Agents and Multi-Agent Systems

Several definitions have been attributed to the concept of agent, which has been the subject of several studies; a definition proposed by [8] is that an agent is a computer system situated in an environment and acting autonomously and flexibly to achieve the objectives for which it was designed. The concepts "situated", "autonomous" and "flexible" are defined as follows:

• Situated: the agent can act on its environment from the sensory inputs it receives from that environment. Examples: process control systems, embedded systems, etc.
• Autonomous: the agent is able to act without the intervention of a third party (human or agent) and controls its own actions and its internal state.
• Flexible: the agent in this case is:
  - able to respond in time: the agent must be able to perceive its environment and develop a response within the required time;
  - proactive: the agent must produce a proactive and opportunistic behavior, while being able to take the initiative at the "right" time;
  - social: agents should be able to interact with other agents (software and humans) when the situation requires, to complete their tasks or to help those agents with their own.

This definition is consistent with the vision with which we see the agent in the context of our work. A multi-agent system [3] is a distributed system consisting of a set of agents. In contrast to AI systems, which simulate to some extent the capabilities of human reasoning, MAS are ideally designed and implemented as a set of interacting agents, most often in ways of cooperation, competition or coexistence. A MAS is usually characterized by: a) each agent has limited information and problem solving abilities, so each agent has a partial point of view; b) there is no overall control of the multi-agent system; c) the data are decentralized; d) the computation is asynchronous. Autonomous agents and multi-agent systems represent a good approach for the analysis, design and implementation of complex computer systems. The vision based on the agent entity provides a powerful repertoire of tools, techniques, and metaphors that have the potential to significantly improve software systems. [8] highlights multi-agent systems as a preferred solution for analyzing, designing and building complex software systems.
3.1 Organization of MAS

A key to the design and implementation of multi-agent systems of significant size is to take a social perspective in order to constrain the behavior of agents. In general, the organization [10] is a model that allows agents to coordinate their actions during the resolution of one or more tasks. On the one hand, it defines a structure (e.g., a hierarchy) with a set of roles that should be awarded to agents and a set of communication paths between these roles. On the other hand, it defines a control system (e.g., master/slave) that dictates the social behavior of agents. Finally, it defines coordination processes that determine the decomposition of tasks into subtasks, the allocation of subtasks to agents, and the consistent realization of dependent tasks. Examples of organizational models are AGR [4] and MOISE+ [7], which are among the first attempts at methodologies for analyzing multi-agent systems focused on the concept of organization.
4 Mutual Contributions: Components, Agents and MAS

We can consider the mutual contributions between the two concepts of component and agent. For example, agents can assist the assembly of components through negotiation techniques: an extension of the Ugtaze model in the work of [9] consists in defining adjustment points on the component interfaces as a
way to obtain variability of components; agents mainly intervene to browse the reuse of existing cases and to negotiate the adaptation of an original component, and agents can select the adequate adjustment points. On the other hand, components can help in the construction, integration and deployment of multi-agent systems, as well as in structuring agents, as in several works: the VOLCANO architecture [12], where the decomposition of the agent architecture is based on aspects (environment, organization, ...) and associated treatments (communication, coordination, ...); the MAST architecture [14], an extension of the VOLCANO architecture with a more detailed decomposition; the MALEVA component model [2], whose goal is to enable a modular design of agent behavior as an assembly of elementary behaviors; the MADCAR model [5]; and finally [6], which proposes a component-based agent architecture for home automation systems. Note that each of these component-based agent architectures relies on certain decomposition criteria (decomposition by facets and associated treatments, decomposition by behavior, ...). Although the authors of the majority of these studies emphasize genericity and flexibility, we believe that these models are appropriate only for certain types of agents and that it is difficult to replace or add components in them.
5 An Approach Focused on the Organization for the Construction of Agents by Component Assembly

What interests us primarily in the component approach is the ability to reuse. In this approach the organization is taken as an explicit framework for analysis and design of the MAS; we obtain the needed behaviors from agents through a structure of the social space, so as to facilitate cooperation and interaction between its members. The organization is primarily a matter of supporting group activity, facilitating the collective action of agents in their areas of action. On the other hand, we propose the construction of agents from software components in which the basic know-how is implemented. In this model we propose a structure of the MAS in terms of roles grouped into alliances; agents agree to play roles in the organization, and for that agents should have certain skills implemented as software components. To acquire the skills necessary for engagement in a role, the agent must assemble the components that implement those skills.
5.1 Presentation of the Approach

This model is a way to describe a structural dimension in the organization of multi-agent systems according to which the agent assembles software components.

5.1.1 Role

As in MOISE+ [7], a role ρ is the abstract representation of a function. An agent may have several roles; one role can be played by several agents. The role is the expected behavior of the agent in the organization.
5.1.2 Alliance

We can define an alliance of agents as a group of agents; each agent can be a member of an alliance. The term alliance is used to describe a community of agents who play the same role, and roles are played by agents within an alliance. Formally we can describe an alliance A as:

A = (R, np, cr, cmp)

R: a role;
np: a pair (N × N) giving the (min, max) number of times the role must be played in the alliance;
cr: the role compatibilities within the alliance; ρa ≡ ρb means that an agent playing the role ρa can also play ρb;
cmp: the set of skills (competencies) that an agent should have in order to commit to play the role of the alliance.
5.2 Roles and Components

Components contain the know-how (business logic) necessary for performing an action; when an agent engages in a role, it must have the required competencies, acquired by assembling components.
Fig. 1 General schema: the agent checks the (min, max) number of agents authorized to play the role of the alliance and the number of agents currently playing it, then acquires from the components library the components implementing the competencies required to play the role.
To play a role ρ1, an agent must have certain competencies {Cmp1, Cmp2, ...}, i.e., contain in its composition a set of components C = {c1, c2, c3, ...}, or a combination of components in case a component Cn is compatible with an assembly Ci + Cj + .... All components of C implement the business code corresponding to the competencies {Cmp1, Cmp2, ...}. For example, in robot soccer matches, the specification of a defense alliance, defined by the designer, is specified in three alliances:
Def = ( {ρgoalkeeper, ρgoalkeeper → (1,1), ρLeader ≡ ρgoalkeeper}; {ρback, ρback → (3,3), ρLeader ≡ ρback}; {ρLeader, ρLeader → (0,1), ρLeader ≡ ρback, ρLeader ≡ ρgoalkeeper} ).

This specification indicates that within a defense alliance the roles are goalkeeper, back and leader. The leader and back roles are compatible, meaning that an agent playing the role back can also act as leader; likewise, the roles leader and goalkeeper are compatible. In the goalkeeper alliance a single agent can act as goalkeeper; three agents must commit to play the role of back (min = 3, max = 3) in the back alliance; and at most one agent can play the role of leader, which is compatible with the goalkeeper and back roles. We specify for each role the required competencies:
ρback → (Cmp1: play the offside line, Cmp2: marking a striker, ..., CmpN).

This means that the agent acting as back must have the competencies: play the offside line, marking a striker, etc. For each component of the system, we specify the implemented competencies. Example: [C1 → Cmp1, Cmp4], [C2 → Cmp1, Cmp6], [C3 → Cmp3], ... Thereby the specification of a defense becomes:
Def = ( {ρgoalkeeper, ρgoalkeeper → (1,1), ρLeader ≡ ρgoalkeeper, { }}; {ρback, ρback → (3,3), ρLeader ≡ ρback, {ρback → (Cmp1: play the offside line, Cmp2: marking a striker, ..., Cmpn)}}; {ρLeader, ρLeader → (0,1), ρLeader ≡ ρback, ρLeader ≡ ρgoalkeeper} ).
6 Description of Competencies

Competencies represent know-how described by a name and a set of functions which implement operations (business code):

    Competency: Name_Comp {
        Static data: …….
        Functions:
            Function1 (input parameters) : returned type
            Function2 (input parameters) : returned type
            ……
    }

For example we have this competency:

    Competency: calculate_area {
        Static data: …….
        Functions:
            Calcul_area_circle (R: real) : real
            Calcul_area_rectangle (R, L: real) : real
            ……
    }
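As an illustration of how an agent could test a role commitment against its assembled components: the component-to-competency table below follows the example in Section 5.2, but the code itself is only a sketch, not the paper's implementation.

```python
COMPONENTS = {                      # implemented competencies per component
    "C1": {"Cmp1", "Cmp4"},
    "C2": {"Cmp1", "Cmp6"},
    "C3": {"Cmp3"},
}

def can_play(required, assembled):
    """Return (ok, missing): ok is True when the assembled components
    implement every competency the role requires."""
    acquired = set()
    for c in assembled:
        acquired |= COMPONENTS[c]
    return required <= acquired, required - acquired
```

For instance, `can_play({'Cmp1', 'Cmp3'}, ['C1'])` returns `(False, {'Cmp3'})`: the agent must still acquire Cmp3, e.g. by assembling C3.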
7 Select the Components to Assemble

The process below summarizes how the agent selects components depending on its roles in the organization.
1. Decision to commit to a role → identify the required competencies.
2. From the required competencies, identify those already acquired (components already assembled) and those still to be acquired.
3. Identify the components in which the required competencies are implemented → a set of candidate components.
4. Select components: define the minimum set of components by eliminating the components all of whose competencies are implemented in the other components (minimum cover). The agent can keep a history of the minimal sets; the history can be used if it is recent, otherwise this step and the previous one are repeated, so as not to lose the benefit of new components if they exist.
5. Components assembly: the agent may detach the components it no longer needs.
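The selection in step 4 is a set-cover problem, which is NP-hard in general; the sketch below uses the usual greedy approximation, repeatedly picking the component that implements the most still-missing competencies (all names are illustrative, not from the paper).

```python
def select_components(required, components):
    """Greedy approximation of the minimum component set covering all
    required competencies. components: {name: set of competencies}."""
    missing, chosen = set(required), []
    while missing:
        # pick the component implementing the most still-missing competencies
        best = max(components, key=lambda c: len(components[c] & missing))
        gain = components[best] & missing
        if not gain:
            raise ValueError("competencies not implemented anywhere: %s" % missing)
        chosen.append(best)
        missing -= gain
    return chosen
```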
8 Discussions

Taking a social point of view for specifying multi-agent systems has advantages: handling the organization as an explicitly manipulated entity is a means of overcoming complexity. In Figure 2 the set G represents the behaviors whose implementation corresponds to the satisfaction of the overall objective of the MAS. The set E represents all possible behaviors of agents in the current environment. A specification of an organization, formed for example of roles and alliances, constrains the agents to implement the behaviors of a set S. Thus the organization provides a restriction on the set E, which implies the elimination of a set of behaviors that do not serve to satisfy the overall objective of the MAS.
Fig. 2 Effect of the organization
A problem that seems a real challenge is to ensure, or at least estimate, that an agent who will be asked to play a role will effectively be able to accomplish it. On this point, we can at least ensure that the agent who undertakes to play a role has the required competencies.
In this paper we abstract away from the assembly of components itself; the emphasis is on the organization. We can use, for example, the assembly engine proposed by [5] at first, while planning to develop our own assembly engine. Adding new components to the system does not disturb the agents' behaviors. Implementing roles as components adds, to the benefits of the component approach, the advantage of reorganization and adaptability: if the model is enriched with reorganization mechanisms, an agent that changes position in the organization can do so by detaching and assembling components according to its new roles in the organization; for example, again in robot soccer, an agent who plays defender can become an attacker. The agents' ability to disassemble (detach) components can be very useful in the case where agents are mobile.
9 Conclusion

In this paper we tried to combine the component and agent approaches to build agents, taking the organization as a framework for analysis. This work can be considered a preliminary framework around which to build a more robust and more complete one by introducing the dynamic and functional aspects that make the MAS a truly autonomous system requiring no intervention of external actors; in other words, by enriching the model with the knowledge to identify new components and detect the coherence between them, by building a component assembly engine, and finally by adding to the structural specification a meta-structural specification, that is to say, roles whose role is to identify new roles.
Bibliographies

[1] Brazier, F.M., Jonker, C.M., Treur, J.: Principles of component based design of intelligent agents. Data Knowl. Eng. 1, 1–27 (2002)
[2] Briot, J.P., Meurisse, T., Peschanski, F.: Une expérience de conception et de composition de comportements d'agents à l'aide de composants. L'Objet, Revue des Sciences et Technologies de l'Information 12(4), 11–41 (2006)
[3] Chaib-draa, B., Jarras, I., Moulin, B.: In: Briot, J.P., Demazeau, Y. (eds.) Agent et systèmes multi-agents. Hermès (2001)
[4] Ferber, J., Gutknecht, O.: A meta-model for the analysis and design of organizations in multi-agent systems (1998), http://www.madkit.org
[5] Grondin, N., Bouraqadi, L., Vercouter, L.: Assemblage automatique de composants pour la construction d'agents avec MADCAR. In: Journées Multi-Agents et Composants, JMAC 2006, Nimes, France, March 21 (2006)
[6] Hamoui, F., Huchard, M., Urtado, C., Vauttier, S.: Un système d'agents à base de composants pour les environnements domotiques. In: Actes de la 16ème conférence francophone sur les Langages et Modèles à Objets, Pau, France, pp. 35–49 (March 2010)
[7] Hübner, J.F., Sichman, J.S., Boissier, O.: Spécification structurelle, fonctionnelle et déontique d'organisations dans les SMA. In: JFIADSMA 2002 (2002)
[8] Jennings, N.R.: On agent-based software engineering. Artificial Intelligence Journal (2000)
[9] Lacouture, J., Aniorté, P.: Vers l'adaptation dynamique de services: des composants monitorés par des agents. In: Journées Multi-Agents et Composants, JMAC 2006, Nimes, France, March 21 (2006)
[10] Malville, E.: L'auto-organisation de groupes pour l'allocation de tâches dans les systèmes multi-agents: application à CORBA. Thèse, Université de Savoie, France (1999)
[11] Occello, M., Baeijs, C., Demazeau, Y., Koning, J.-L.: MASK: An AEIO toolbox to develop multi-agent systems. In: Knowledge Engineering and Agent Technology, Amsterdam, The Netherlands. IOS Series on Frontiers in AI and Applications (2002)
[12] Ricordel, P.-M., Demazeau, Y.: La plate-forme VOLCANO: modularité et réutilisabilité pour les systèmes multi-agents. Revue Technique et Science Informatiques (TSI), numéro spécial sur les plates-formes de développement SMA (2002)
[13] Sébastien, L.: Architectures à composants et agents pour la conception d'applications réparties adaptables. Thèse 2006, Université Toulouse III, France
[14] Vercouter, L.: MAST: Un modèle de composants pour la conception de SMA. In: Journées Multi-Agents et Composants, JMAC 2004, Paris, France, November 23 (2004)
Determining an Efficient Parts Layout for Assembly Cell Production by Using GA and Virtual Factory System Hidehiko Yamamoto and Takayoshi Yamada
Abstract. This paper describes a system that can determine an efficient parts layout for assembly cell production before setting up a real cell production line in a factory. This system is called the Virtual Assembly Cell production System (VACS). VACS consists of two modules, a genetic algorithm (GA) for determining the parts layout and a virtual production system. The GA system utilizes a unique crossover method called Twice Transformation Crossover. VACS is applied to a cell production line for assembling a personal computer. An efficient parts layout is generated, which demonstrates the usefulness of VACS. Keywords: Production System, Virtual Factory, Assembly Line, Cell Production, GA.
1 Introduction

Due to the increasing variety of users' needs, flexible manufacturing systems (FMS) have been introduced in recent years. In particular, cell production [1, 2] is useful for the assembly of modern IT products, and several companies have adopted cell production line techniques. However, during cell production development or implementation, trial and error is required to determine an efficient parts layout: the operator places the parts on the shelves and gradually adjusts or changes the layout positions until the overall arrangement is comfortable, and it can take a long period of time to make these adjustments. Trial and error is required because several uncertain factors are involved: a cell production line assembles several product varieties, so each product must be assembled from different parts and at a different production rate. Therefore, the most efficient parts layout cannot be determined beforehand, and a systematic method for finding an efficient parts layout does not yet exist.

Hidehiko Yamamoto · Takayoshi Yamada
Department of Human and Information System, Gifu University, Japan
e-mail:
[email protected],
[email protected] J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 419–428. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
H. Yamamoto and T. Yamada
To solve this problem, our research deals with an assembly cell production line and develops a system consisting of a genetic algorithm (GA) optimizer and a virtual factory system [3] that production engineers can use to plan a cell production line. The GA is used to determine efficient two-dimensional parts layout locations; in particular, it utilizes a characteristic crossover method. The system is applied to the development of a cell production line for assembling personal computers, which demonstrates that the proposed system is useful.
2 Assembly Using a Cell Production Line

This study deals with an assembly cell production line in which an operator assembles a variety of products individually. The operator takes one part at a time from the shelves, moves to the assembly table, and performs the assembly tasks. The characteristics of an assembly cell production line include the following:

{1} Assembly parts (hereafter referred to as "parts") are located on shelves around an operator, who assembles them in front of the assembly table.
{2} Some components of the products use the same parts (i.e., common parts) and some components use different parts.
{3} The production ratio for each product is different.
{4} Depending on the size of each part, an operator can carry either one part or two parts at a time.
{5} One-by-one production is used [4].
3 The VACS System

3.1 Outline of VACS

This paper proposes a system to solve the problems mentioned above. The system is called the Virtual Assembly Cell production System (VACS). VACS is able to solve the problem of determining efficient parts layout locations before an assembly cell production line starts operating. VACS comprises two modules, the Virtual factory module (V-module) and the Parts layout GA module (GA-module), as shown in Fig. 1. Using these two modules, VACS can perform the following four functions.

<1> VACS creates a tentative assembly cell production line (the "tentative line") in the V-module.
<2> VACS calculates two-dimensional data for the shelf location names and distances from the tentative line and sends them to the GA-module.
Determining an Efficient Parts Layout for Assembly Cell Production
421
Fig. 1 VACS: the Parts Layout GA Module and the Virtual Factory Module exchange coordinate data for the working table and shelves, and the decided parts layout is visualized
<3> The GA-module calculates the distances from the two-dimensional data and, using the distance data and the parts names, finds the coordinates of the most efficient parts layout locations.
<4> VACS transfers the found coordinates to the V-module, which generates an updated three-dimensional virtual cell production line (the "final line"). By operating the final line, final visual judgments can be confirmed.
3.2 Two Dimensional Data

In function <1>, a tentative line is generated by designing a virtual assembly cell production line using three-dimensional computer graphics (CG). In particular, engineers design the assembly table and shelves and set them in a three-dimensional virtual factory space. From the generated tentative line, VACS extracts the shelf names and the location coordinates of the shelves. S(n) are the shelf names for n shelves, and Cn = (xn, yn) represents the coordinates of the n shelves. By using the coordinates Cn and the coordinates of the operator's working standing position (a, b), the distances from the operator's working standing position to each shelf are calculated as Ln. The distances are expressed as the set of distances L shown in equation (1).

L = {L1, L2, …, Ln}
  = { √((a − x1)² + (b − y1)²), √((a − x2)² + (b − y2)²), …, √((a − xn)² + (b − yn)²) }   (1)
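A sketch of how the distance set L of equation (1) and the data set Q of equation (2) could be computed from the extracted shelf coordinates; the shelf coordinates and operator position below are hypothetical values, not taken from the paper.

```python
import math

def shelf_distances(shelf_coords, operator_pos):
    """Distances Ln from the operator's standing position (a, b) to each shelf, as in equation (1)."""
    a, b = operator_pos
    return [math.hypot(a - x, b - y) for (x, y) in shelf_coords]

# Hypothetical tentative line: three shelves, operator standing at the origin.
shelves = {"S(1)": (0.0, 1.0), "S(2)": (3.0, 4.0), "S(3)": (6.0, 8.0)}
L = shelf_distances(list(shelves.values()), operator_pos=(0.0, 0.0))
Q = list(zip(shelves.keys(), L))  # the set Q of equation (2): (shelf name, distance) pairs
print(Q)  # [('S(1)', 1.0), ('S(2)', 5.0), ('S(3)', 10.0)]
```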
VACS transfers the two-dimensional data from the tentative line as mentioned above in function <2>. The data are a set Q whose components are the shelf names S(n) and the set of distances L. The data are described by equation (2).

Q = {(S(1), L1), (S(2), L2), …, (S(n), Ln)}   (2)
3.3 Determining the Parts Layout Locations

The GA-module determines the two-dimensional part layout locations in the shelves. The first job of the GA-module is to create the initial "individuals" using both the shelf names S(n) and the parts names. Each "individual" is a set of components combining the shelf names corresponding to part layout locations with the part names in each location. Namely, each component of the set is the combination of a shelf name S(n) of the tentative line and one of the n part names M(j) located in the shelves. The aggregate of components corresponding to individual I is generated by equation (3).

I = {(S(n), M(j)) | n is a sequential integer from 1 to n, j is an integer without repetition from 1 to n}   (3)
Fig. 2 Parent individuals: I(1) = 9 1 4 6 | 5 7 3 8 2 and I(2) = 5 3 8 7 | 1 9 6 2 4, with the crossover point after the fourth gene

Fig. 3 Children individuals: I(1') = 9 1 4 6 | 1 9 6 2 4 and I(2') = 5 3 8 7 | 5 7 3 8 2
The GA-module generates several individuals corresponding to equation (3). After the completion of the initial generation, the GA operates in a cycle [5-8] that includes calculating fitness, crossover, and mutation. The parts layout locations are determined as the final output of the GA-module.

When conventional GA operations are adopted to solve this cell production problem in VACS, a problem occurs: a significant number of lethal genes will be generated if the conventional crossover method [5, 9-11] is used. For example, consider the two parent individuals I(1) and I(2) shown in Fig. 2. If the crossover point is chosen as shown in Fig. 2 and a crossover operation is performed, the next-generation individuals I(1') and I(2') will be generated as shown in Fig. 3. Because of the crossover operation, the component nine appears twice in I(1'). Thus, a next-generation individual that includes a repeated integer component is generated. As equation (3) prohibits repeated components within an individual, I(1') becomes a lethal gene. Thus, when VACS adopts the conventional crossover method, several lethal genes will be generated.

To solve this problem, the following crossover method, called Twice Transfer Crossover (TTC), is proposed; it performs conversions twice. Using the following procedures, TTC performs the two conversion processes that result in the two object individuals having crossover.

Procedure {1}: Create the standard arrangement (SA).
Procedure {2}: Convert the two individuals, I1 and I2, by using SA and express the two converted individuals as two sets that are their replacements. The new sets are T1' and T2'.
Procedure {3}: Perform a crossover between the new sets and create two new sets, T1'' and T2''.
Procedure {4}: Using SA, perform a reverse conversion on the sets T1'' and T2'' to acquire new individual expressions whose components are shelf names and parts names. The acquired expressions become the next-generation individuals.

The SA of Procedure {1} is expressed as a set whose components are an integer and a part name. Equation (4) shows the set whose elements are a sequential number (the "Order") corresponding to the number of the shelf and a part name randomly located on the shelf. The Order is the integer sequence from 1 to n, placed from left to right.

SA = { (Order, M(j')) | j' is an integer randomly selected without repetition from among 1~n }   (4)

In Procedure {2}, the conversion executes the following operations using the individuals and the SA. The initial value of both k and x used in the operations is 1.

Step 2-1: Find the part name M(j) that is the element at locus number x of the individual I1, find the M(j') whose part name is the same as M(j) from among the SA, and find its sequence number, Order(k), in the SA.
Step 2-2: Set Order(k) as the k-th element of the set T1'.
Step 2-3: Renew the SA as follows. Delete M(j') and its Order(k) from among the elements of SA, and move down the Order(k) of each element located behind the deleted element by 1.
Step 2-4: For all elements behind the first location of the individual I1, create the new set T1' by repeating k←k+1, x←x+1 and performing Step 2-1 through Step 2-3.

T1' = { Order(k) | k is an integer from 1~n }   (5)
For the individual I2, the same operations are performed and the new individual I2' is generated. The reverse conversion of Procedure {4} is performed by the following operations. In the operations, the set after the crossover from Procedure {3} is expressed as T1'' = { Order(k') | k' is an integer without repetition from among 1~n }, and the initial value of y is 1.

Step 4-1: Find the element Order(k') at sequential number y from among the set T1''.
Step 4-2: Find the sequential number k' from among SA and create the new set newI1 whose k'-th element consists of the shelf name S(y) and the part name M(k'). The equation for newI1 is expressed as follows.

newI1 = { (S(y), M(k')) | k' is an integer randomly selected without repetition from among 1~n }   (6)

Step 4-3: Renew the SA as follows. First, delete the element M(k') and Order(k') from among the SA. Second, move down the Order(k') of each element located behind the deleted element by 1.
Step 4-4: For all elements behind the first location of the individual T1'', generate the new set newI1 by repeating y←y+1 and performing Step 4-1 through Step 4-3.

The fitness in the GA uses the principle that the shorter the operator's moving distance, the better the fitness. It is calculated by using the distance Ln, one of the elements in equation (2), corresponding to the distance between the operator's working standing position and the shelf location.
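The TTC procedures are not given as code in the paper; the sketch below is one reading of Procedures {1}–{4}, using integer part labels and 0-based indices in place of the paper's 1-based Order(k). Because the code value at position k can only index the shrunken SA, one-point crossover of two valid codes always decodes to a valid permutation, so no lethal genes arise.

```python
def encode(perm, sa):
    """Procedure {2}: express perm as positions in a shrinking copy of the standard arrangement SA."""
    sa = list(sa)
    code = []
    for part in perm:
        k = sa.index(part)   # Order(k), 0-based here
        code.append(k)
        del sa[k]            # Step 2-3: renew (shrink) the SA
    return code

def decode(code, sa):
    """Procedure {4}: reverse conversion from a position code back to a permutation of SA."""
    sa = list(sa)
    return [sa.pop(k) for k in code]

def ttc_crossover(p1, p2, sa, point):
    """Procedure {3}: ordinary one-point crossover performed on the converted codes."""
    t1, t2 = encode(p1, sa), encode(p2, sa)
    c1 = decode(t1[:point] + t2[point:], sa)
    c2 = decode(t2[:point] + t1[point:], sa)
    return c1, c2

sa = list(range(1, 10))                  # standard arrangement for 9 parts (Procedure {1})
p1 = [9, 1, 4, 6, 5, 7, 3, 8, 2]         # the parents of Fig. 2
p2 = [5, 3, 8, 7, 1, 9, 6, 2, 4]
c1, c2 = ttc_crossover(p1, p2, sa, point=4)
assert sorted(c1) == sa and sorted(c2) == sa  # children are always valid permutations
```

Note that the decode step is the inverse of the encode step, so each parent reproduces itself when its own code is decoded unchanged.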
4 Application Simulations

We applied VACS to the cell assembly production of a personal computer. The V-module utilized GP4 from Lexer Research Inc. The computer has 12 parts as shown in Table 1, and the number of product variants is 10. Table 1 shows the part numbers that 5 (P(1)~P(5)) of the 10 variant products require for assembly, and the order number for each product. The parts are located on the 18 shelves behind the assembly table, as shown in Fig. 4. The shelves that surround the operator are named with letters A through R.

In the GA operations, each generation's population consists of 100 individuals. The crossover operation utilizes roulette selection with five individuals as an elite preservation group and a 5% mutation probability. As discussed above, the fitness reflects the distance an operator has to move, where the smaller the distance, the better the fitness.

In the simulation, the shelves and the assembly table were first designed with AutoCAD and placed in the V-module. On the basis of the layout of the shelves and the assembly table, the coordinates of the shelves and the assembly table were automatically acquired and sent to the GA-module, and GA operations were started. Fig. 5 shows one example of the fitness curves. After the two hundredth generation, the fitness values became constant. The individual present in this constant situation corresponds to the most efficient parts layout locations. Table 2 shows the resulting parts layout locations. As shown in the table, the parts whose frequency of use is high are densely located on shelves A, B, and M through R, which are close to the assembly table. These locations are judged to be efficient choices. Sending the parts layout locations acquired in the GA-module back to the V-module, the production line shown in Fig. 6 was generated.
Thereby, after determining the most efficient production line in the V-module, which is a virtual assembly production line, we can confirm the appropriateness of the configuration by visual evaluation.
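The paper does not spell out the fitness formula beyond "the shorter the moving distance, the better the fitness." One plausible form, consistent with that reading and with the small fitness magnitudes in Fig. 5, weights each shelf's distance by how often its part is fetched (compare the "Parts number" column of Table 2) and inverts the total; all names and numbers below are illustrative assumptions.

```python
def fitness(individual, distances, picks):
    """Reciprocal of the total operator walking distance (an assumed form).

    individual: dict shelf -> part located there
    distances:  dict shelf -> distance Ln to the operator
    picks:      dict part -> number of times the part is fetched
    """
    total = sum(picks[part] * distances[shelf] for shelf, part in individual.items())
    return 1.0 / total  # shorter total movement -> larger fitness

layout = {"A": "case", "B": "memory"}   # hypothetical tiny layout
dist = {"A": 1.2, "B": 0.8}
picks = {"case": 80, "memory": 169}
print(fitness(layout, dist, picks))
```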
Table 1 Personal computer parts

Parts           P(1)  P(2)  P(3)  P(4)  P(5)
case             1     1     1     1     1
power source     1     1     1     1     1
mother board     1     1     1     1     1
CPU              1     1     1     1     1
memory           2     4     2     4     1
TV tuner         0     0     0     1     0
fan              1     1     1     0     0
Order number    20    17    12     9     8
Table 2 Acquired locations

Location  Part            Parts number
A         motor case      80
B         case            80
C         fan             35
D         card reader     49
E         sound card      46
F         TV tuner        13
G         other card      31
H         capture board   21
I         LAN card        35
J         other options   8
K         FD drive        18
L         CPU fan         15
M         CPU             82
N         memory          169
O         hard disk       126
P         mother board    80
Q         video card      76
R         CD/DVD          82
Fig. 4 PC assembly cell production: the assembly table surrounded by the 18 shelves A through R

Fig. 5 Fitness curves: best and average fitness (roughly 1.5×10⁻⁶ to 2.2×10⁻⁶) over 200 generations
Fig. 6 Example of a final production cell
5 Conclusions

This study outlines the development of VACS, a system for determining the most efficient two-dimensional locations for parts when planning an assembly cell production line, i.e., before constructing the real production line. VACS incorporates a GA module, which determines the parts layout locations, and a three-dimensional virtual factory module. The GA module adopted a unique crossover procedure that performs the coding of each individual twice. By applying VACS to the development of a cell production line for a personal computer, an efficient parts layout was determined without physically building the cell production line. The usefulness of VACS was thereby demonstrated.
References [1] Zhang, J., Chan, F.T.S., et al.: Investigation of the reconfigurable control system for an agile manufacturing cell. International Journal of Production Research 40(15), 3709–3723 (2002) [2] Solimanpur, M., Vrat, P., Shankar, R.: A multi-objective genetic algorithm approach to the design of cellular manufacturing systems. International Journal of Production Research 42(7), 1419–1441 (2004) [3] Inukai, T., et al.: Simulation Environment Synchronizing Real Equipment for Manufacturing Cell. Journal of Advanced Mechanical Design, Systems, and Manufacturing, The Japan Society of Mechanical Engineers 1(2), 238–249 (2007) [4] Yamamoto, H.: One-by-One Parts Input Method by Off-Line Production Simulator with GA. European Journal of Automation, Hermes Science Publications, Artiba, A. (ed.), 1173–1186 (2000)
[5] Yamamoto, H., Marui, E.: Off-line Simulator to Decide One-by-one Parts Input Sequence of FTL—Method of Keep Production Ratio by Using Recurring Individual Expression. Journal of the Japan Society for Precision Engineering 69(7), 981–986 (2003) [6] Goldberg, D.E.: Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Reading (1989) [7] Yamamoto, H.: One-by-one Production Planning by Knowledge Revised-Type Simulator with GA. Transactions of the Japan Society of Mechanical Engineers, Series C 63(609), 1803–1810 (1997) [8] Ong, S.K., Ding, J., Nee, A.Y.C.: Hybrid GA and SA dynamic set-up planning optimization. International Journal of Production Research 40(18), 4697–4719 (2002) [9] Yamamoto, H., Qudeiri, J.A., Yamada, T., Ramli, R.: Production Layout Design System by GA with One by One Encoding Method. The Journal of Artificial Life and Robotics 13(1), 234–237 (2008) [10] Qiao, L., Wang, X.-Y., Wang, S.-C.: A GA-based approach to machining operation sequencing for prismatic parts. International Journal of Production Research 38(14), 3283–3303 (2000) [11] Fanjoy, D.W., Crossley, W.A.: Topology Design of Planar Cross-Sections with a Genetic Algorithm: Part 1—Overcoming the Obstacles. International Journal of Engineering Optimization 34(1), 1–22 (2002)
Development of a Multi-issue Negotiation System for E-Commerce Bala M. Balachandran, R. Gobbin, and Dharmendra Sharma*
Abstract. Agent-mediated e-commerce is rapidly emerging as a new paradigm for developing business intelligent systems. Such systems are built upon the foundations of agent technology with a strong emphasis on automated negotiation capabilities. In this paper, we address negotiation problems where agreements must resolve several different attributes. We propose a one-to-many multi-issue negotiation model based on Pareto optimal theory. The proposed model is capable of processing agents' preferences and arriving at an optimal solution from a set of alternatives by ranking them according to the score that they achieve. We present our prototype system architecture, together with a discussion of the underlying negotiation framework. We then describe our implementation using the JADE framework and the Eclipse platform. Our concluding remarks and possible further work are presented. Keywords: software agents, e-commerce, Eclipse, JADE, scoring functions, multi-issue negotiation, Pareto-efficient.
1 Introduction

Recent advances in Internet and web technologies have promoted the development of intelligent e-commerce systems [8]. Such systems are built upon the foundations of agent technology with a strong emphasis on agent negotiation capabilities. We define negotiation as a process in which two or more parties with different criteria, constraints and preferences jointly reach an agreement on the terms of a transaction [9]. In most e-commerce situations, what is acceptable to an agent cannot be described in terms of a single parameter. For example, a buyer of a PC will consider the price, the warranty, the speed of the processor, the size of the memory, etc. A buyer of a service, like Internet access, will look at the speed and reliability of the connection, the disk space offered, the quality of customer

Bala M. Balachandran · R. Gobbin · Dharmendra Sharma
Faculty of Information Sciences and Engineering, The University of Canberra, ACT, Australia
e-mail:{bala.balachandran,renzo.gobbin, dharmendra.sharma}@canberra.edu.au

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 429–438. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
430
B.M. Balachandran, R. Gobbin, and D. Sharma
service, the pricing scheme, etc. Agreements in such cases are regions in a multidimensional space that satisfy the sets of constraints of both sides [6]. In this paper, our interest is in the development of a multi-issue based negotiation model for e-commerce. We use three agents in our study: a buyer agent, a seller agent, and a facilitator agent. The seller agent allows a seller to determine his negotiation strategies for selling merchandise. Similarly, the buyer agent allows a buyer to determine his negotiation strategies for buying merchandise. The facilitator agent serves to handle the negotiation strategies for both the buyer and the seller agents. In our approach, agents’ preferences are expressed in fuzzy terms. The application domain for our prototype implementation is buying and selling laptop computers. Our paper is organised as follows. First, we review some related works. We then present our proposed negotiation model, discussing its ability to handle customer preferences based on multiple parameters. We describe the model in terms of the negotiation object, the negotiation protocol and the negotiation strategy. Then we show how the principles of fuzzy logic and scoring functions are used in our model to facilitate multi-issue based negotiation. We then describe details of a prototype system we have developed using JADE [3] within the Eclipse platform [12]. Finally, we present our concluding remarks and discuss possible future work.
2 Related Works

Automated bilateral negotiation has been widely studied by the artificial intelligence and microeconomics communities. AI-oriented research has focused on automated negotiation among agents. Merlat [6] discusses the potential of agent-based multiservice negotiation for e-commerce and demonstrates a decentralized constraint satisfaction algorithm (DCSP) as a means of multiservice negotiation. Badica et al. [1] present a rule-based mechanism for agent price negotiation. Lin et al. [5] present automatic price negotiation using a fuzzy expert system. Sheng [10] presents work that offers customers an online business-to-customer bargaining service. There have also been some efforts in applying automated negotiation to tackle travel planning problems [2, 7]. Several approaches to agent-mediated negotiation on electronic marketplaces have been introduced in the literature. For example, Kurbel et al. [4] present a system called FuzzyMAN: an agent-based electronic marketplace with a multilateral negotiation protocol. They argued that there is no universally best approach or technique for automated negotiation; the negotiation strategies and protocols need to be set according to the situation and application domain. Ragone et al. [10] present an approach that uses fuzzy logic for automated multi-issue negotiation. They use logic to model relations among issues and to allow agents to express their preferences on them.
3 A Multi-issue Bilateral Negotiation Model for E-Commerce

In this section we present a multi-issue negotiation model for e-commerce in which agents autonomously negotiate multi-issue terms of transactions in an e-commerce environment. We have chosen a laptop computer trading scenario for our prototype implementation. The negotiation model chosen for our study is illustrated in Figure 1. In this model, issues within both the buyer's request and the seller's offer can be split into hard constraints and soft constraints. Hard constraints are issues that must be satisfied in the final agreement, whereas soft constraints represent issues that the parties are willing to negotiate on. We utilise a facilitator agent, which collects information from the bargainers and exploits it in order to propose an efficient negotiation outcome.
Fig. 1 One-to-many negotiation scheme: buyers (Buyer 1–3) and sellers (Seller 1–3) negotiate through a facilitator agent
The negotiation module consists of three components: the negotiation object, the decision-making model and the negotiation protocol. The negotiation object is characterised by a number of attributes for which the agents can negotiate. The decision-making model consists of an assessment part, which evaluates a received offer and determines an appropriate action, and an action part, which generates and sends a counter-offer or stops the negotiation. The assessment part is based on the fact that different values of negotiation issues are of different worth to negotiating agents. We model the value of negotiation issues by scoring functions [4]. The bigger the value of a scoring function for a certain value of an issue, the more suitable that value is for the negotiating agent.
3.1 Scoring Functions

The scoring functions represent private information about an agent's preferences regarding the negotiation issues. This information is not given to other participants in the negotiation process. A scoring function is defined by four values of a negotiation issue: the minimal, maximal, optimum minimal and optimum maximal, as illustrated in Figure 2.

Fig. 2 Scoring function for the negotiation issue "number of years warranty": the value rises from 0 at Min to 1 between Opt Min and Opt Max, then falls back to 0 at Max
We also consider the fact that different negotiation issues are of different importance for each participant. To model this situation, we introduce the weighting factor representing the relative importance that a participant assigns to an issue under negotiation. During negotiation, the value of an offer received is calculated using two vectors: a vector-valued offer received by an agent and a vector of relative importance of issues under negotiation. The value of an offer is the sum of the products of the scoring functions for individual negotiation issues multiplied by their relative importance.
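The piecewise-linear shape of Fig. 2 and the weighted sum described above can be sketched as follows; the function and issue names are our assumptions, not the paper's API.

```python
def score(value, lo, opt_lo, opt_hi, hi):
    """Trapezoidal scoring function defined by (Min, Opt Min, Opt Max, Max), as in Fig. 2."""
    if value <= lo or value >= hi:
        return 0.0
    if value < opt_lo:
        return (value - lo) / (opt_lo - lo)   # rising edge
    if value > opt_hi:
        return (hi - value) / (hi - opt_hi)   # falling edge
    return 1.0                                # inside the optimum band

def offer_value(offer, scorers, weights):
    """Value of an offer: sum over issues of weight_i * score_i(value_i)."""
    return sum(weights[i] * scorers[i](offer[i]) for i in offer)

# Hypothetical single-issue preference: warranty of 2-4 years is ideal, 0-6 acceptable.
scorers = {"warranty": lambda y: score(y, 0, 2, 4, 6)}
weights = {"warranty": 1.0}
print(offer_value({"warranty": 3}, scorers, weights))  # 1.0: inside the optimum band
```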
3.2 The Negotiation Protocol

The negotiation facilitator receives the customer's request and registers the customer. Once this is done, the negotiation process can begin with the suppliers. The negotiation facilitator requests the suppliers to provide offers conforming to the restrictions imposed by the customer agent. Note that each restriction has an importance rating (0% to 100%), which means there is some leniency in the restrictions
imposed by the customer. For example, if the customer wants the colour red but provides an importance rating of only 50%, the restriction is quite lenient and the negotiation facilitator will request suppliers to make offers for a range of different colours. The negotiation facilitator and suppliers go through several rounds of negotiation until they reach the maximum number of rounds. Then the best offer (the optimal set) is sent back to the customer agent. The customer agent then displays the results of the negotiation process to the end user, who is ultimately responsible for deciding which item to buy.
3.3 The Negotiation Strategy

The facilitator's strategy is to gather a set of offers from the listed suppliers which satisfy the customer's wishes. Each offer is compared with the last offer by using a Pareto optimality algorithm. The facilitator has a utility algorithm which shows how good a particular offer is; the facilitator may modify the customer's preferences (those which have an importance rating of less than 100%) in order to find other offers which may satisfy the user's needs. Once the set of optimal results is obtained, it is sent back to the customer agent. The seller's strategy aims to maximise profit on the goods sold to customers. Sellers would also like to sell the goods as fast as possible, but at the highest possible price. The supplier does not want any old stock which cannot be sold.
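The paper names a Pareto optimality algorithm but does not list it; a standard dominance filter over scored offers, with hypothetical issue names, might look like this.

```python
def dominates(a, b):
    """Offer a Pareto-dominates b if it is at least as good on every issue and strictly
    better on at least one. Offers are dicts of issue -> score (higher is better)."""
    return all(a[i] >= b[i] for i in a) and any(a[i] > b[i] for i in a)

def pareto_optimal(offers):
    """Keep only offers not dominated by any other offer (the facilitator's filtering step)."""
    return [o for o in offers if not any(dominates(p, o) for p in offers if p is not o)]

offers = [{"price": 0.9, "warranty": 0.2},
          {"price": 0.5, "warranty": 0.8},
          {"price": 0.4, "warranty": 0.1}]   # dominated by the second offer
frontier = pareto_optimal(offers)
print(frontier)  # [{'price': 0.9, 'warranty': 0.2}, {'price': 0.5, 'warranty': 0.8}]
```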
3.4 The Negotiation Process

The negotiation process begins with registered buyers and sellers and a single facilitator. The seller sends a list of all items for sale to the facilitator. These items are registered for sale and are available for all the buyers to bargain on and purchase. The buyer then registers with the facilitator and sends all of their preferences. Once the preferences have been received by the facilitator, the negotiation process can begin between the facilitator and the supplier:

1. The facilitator runs the Pareto optimality algorithm to remove any suboptimal solutions.
2. The facilitator runs the utility function to get the item with the highest utility.
3. The item with the highest utility is selected as the negotiation item and set as the base item with all its properties (price, hard drive space, etc.).
4. The item's properties are changed so that the property with the highest importance factor is improved. If the importance factor of price was highest, the price would be reduced by 10%. If the importance factor of the hard drive space or any other property was highest, then that property would be increased by 10%.
5. This offer is sent to the supplier to see if they agree with the properties.
6. The counter offer is received by the supplier, who has a negotiable threshold amount (set to 10% by default) by which they are willing to negotiate on the item's properties.
   a. If the negotiable threshold is not crossed, the counter offer is agreed to and sent back to the facilitator.
   b. If the negotiable threshold has been crossed, then check by how much. This difference is added to the price. If the threshold is crossed by 5%, then the price is increased by 5% and sent back to the facilitator.
7. When the facilitator receives this offer, it calculates the utility of the offer and, if it is greater, the offer becomes the new base item. The next round of bargaining begins (back to step 4).
8. The bargaining process runs for a fixed number of rounds, 4 by default.
4 System Development and Evaluation

In this section, we present our implementation efforts in automating multi-issue negotiations. As described before, there are three different agents, namely the buyer, the seller and the facilitator. Although there can be more than one instance of the buyer and the seller, only one instance of the facilitator can run at any one time. This is a limitation imposed on the system to reduce the complexity of the application. The main focus in this implementation has been the negotiation component, which implements the multi-issue bargaining model described in the previous sections. The architecture of the system consists of three modules: a communication module, a module for interaction with the agents' owners, and a negotiation module. The communication module manages the exchange of messages between agents. The interaction module is responsible for communication between an agent and its owner. A user can manage his or her agent through a graphical user interface, and an agent can communicate with its owner in certain situations through e-mail. For example, to initialise a buyer's agent, the user has to specify attributes such as price, memory size, hard disk, warranty, etc., as shown in Figure 3. The user can also specify the tactic the agent is supposed to employ in the negotiation process, as well as the scoring functions and weights for the negotiation issues.
Fig. 3 One sample screen shot used by a buyer
4.1 The Buyer Agent The buyer agent is designed to get the preferences from the user, register with the facilitator and then receive the results of the negotiation process. From the point the user clicks on search, there is no interaction between this agent and the end user, until the negotiation results are returned. The end user specifies their preference values on various negotiation issues and their relative importance factors. This information is used by the facilitator during the bargaining process.
4.2 The Negotiation Object

The negotiation object in this case is the item being negotiated upon. An item has several properties, and each property has a name and a value. For the purposes of this project, the name is a string and the value is a whole number (integer). These item details are read in by the agents, and the properties are manipulated during the negotiation process. Figure 4 shows an example XML document for a sale item and its properties.
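A minimal sale-item document of the shape described (string property names, integer values) and the way an agent might read it; the element and attribute names here are assumptions, not the project's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sale-item document (names are illustrative assumptions).
ITEM_XML = """
<item id="laptop-1">
  <property name="price" value="1200"/>
  <property name="hddSpace" value="500"/>
  <property name="warranty" value="2"/>
</item>
"""

def read_item(xml_text):
    """Parse an item's properties into a dict of name -> integer value."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): int(p.get("value")) for p in root.findall("property")}

props = read_item(ITEM_XML)
print(props)  # {'price': 1200, 'hddSpace': 500, 'warranty': 2}
```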
Fig. 4 A sale item representation in XML.
4.3 The Facilitator Agent

The facilitator agent receives registration requests from both the buyer and the seller and then processes each request (either accepting or denying it). The most important functionalities of the facilitator agent are the management of other agents, the pre-selection of agents for a negotiation, and overseeing the negotiation process. During the pre-selection phase, the facilitator selects and ranks those agents of the marketplace that will start a specific negotiation over certain issues with a given agent. Once the maximum number of negotiation rounds has been completed, the facilitator sends the best offer back to the buyer.
4.4 The Seller Agent The seller agent is responsible for registering with the facilitator and sending a list of sale items which are available. This agent also manages the counter offers received from the facilitator. The agent has a threshold limit as to how much it is able to negotiate. All offers where it needs to negotiate more incur an increase in the price of the good.
4.5 JADE Implementation

The proposed multi-issue negotiation system was implemented using the JADE framework [3] with the Eclipse platform [12] as our programming environment. Figure 5 shows a screen dump of the development environment. The system provides graphical user interfaces for users (buyers and sellers) to define scoring functions, weighting factors and negotiation tactics. It also has a customer management system for the system administrator. We have carried out an initial evaluation of user satisfaction with the proposed system. The performance of the system is very promising, and we are currently investigating further improvements.
Fig. 5 The Development environment using Eclipse and JADE
5 Conclusions and Further Research

In this paper, we have attempted to model multi-issue negotiation in the e-commerce environment. We showed how a one-to-many negotiation can be handled by co-ordinating a number of agents via a facilitator agent. The facilitator agent utilises scoring functions, relative importance factors and Pareto optimality principles for the evaluation and ranking of offers. We have also demonstrated a prototype implementation using JADE [3] within the Eclipse environment. The facilitator agent currently uses a simple strategy, providing a proof of concept. In the future, we plan to extend the approach using more expressive logics, namely fuzzy logic [2], to increase the expressiveness of supply/demand descriptions. We
are also investigating other negotiation protocols, without the presence of a facilitator, that allow an agreement to be reached in a reasonable number of communication rounds.
References [1] Badica, C., Badita, A., Ganzha, M.: Implementing rule-based mechanisms for agentbased price negotiation, pp. 96–100. ACM, New York (2006) [2] Cao, Y., Li, Y.: An intelligent fuzzy-based recommendation system for consumer electronic products. Expert Systems with Applications 33, 230–240 (2007) [3] Bellifemine, F., Caire, G., Greenwood, D.: Developing Multi-Agent Systems with JADE. John Wiley & Sons, UK (2007) [4] Kurbel, K., Loutchko, I., Teuteberg, F.: FuzzyMAN: An agent-based electronic marketplace with a multilateral negotiation protocol. In: Lindemann, G., Denzinger, J., Timm, I.J., Unland, R. (eds.) MATES 2004. LNCS (LNAI), vol. 3187, pp. 126– 140. Springer, Heidelberg (2004) [5] Lin, C., Chen, S., Chu, Y.: Automatic price negotiation on the web: An agent-based web application using fuzzy expert system. Expert Systems with Applications, 142 (2010), doi:10, 1016/j.eswa.2010.09 [6] Merlat, W.: An Agent-Based Multiservice Negotiation for ECommerce. BT Technical Journal 17(4), 168–175 [7] Ndumu, D.T., Collis, J.C., Nwana, H.S.: Towards desktop personal travel agents. BT Technol. J. 16(3), 69–78 (200x); Nwana, H.S, et al.: Agent-Mediated Electronic Commerce: Issues, Challenges and Some Viewpoints. In: Autonomous Agents 1998, MN, USA (1998) [8] Nwana, H.S., et al.: Agent-Mediated Electronic Commerce: Issues, Challenges and Some Viewpoints. In: Autonomous Agents 1998, MN, USA (1998) [9] Paprzycki, M., et al.: Implementing Agents Capable of Dynamic Negotiations. In: Petcu, D., et al. (eds.) Proceedings of SYNASC 2004: Symbolic and Numeric Algorithms for Scientific Computing, pp. 369–380. Mirton Press, Timisoara (2004) [10] Ragone, A., Straccia, U., Di Noia, T., Di Sciascio, E., Donini, F.M.: Towards a fuzzy logic for automated multi-issue negotiation. In: Hartmann, S., Kern-Isberner, G. (eds.) FoIKS 2008. LNCS, vol. 4932, pp. 381–396. 
Springer, Heidelberg (2008)
[11] Ito, T., Hattori, H., Klein, M.: Multi-Issue Negotiation Protocol for Agents: Exploring Nonlinear Utility Spaces. In: IJCAI 2007, pp. 1347–1352 (2007)
[12] The Eclipse Platform, http://www.eclipse.org/
[13] Tsai, H.-C., Hsiao, S.-W.: Evaluation of alternatives for product customization using fuzzy logic. Information Sciences 158, 233–262 (2004)
Effect of Background Music Tempo and Playing Method on Shopping Website Browsing Chien-Jung Lai, Ya-Ling Wu, Ming-Yuan Hsieh, Chang-Yung Kung, and Yu-Hua Lin
Abstract. Background music is one of the critical factors that affect browsers' behavior on shopping websites. This study used a laboratory experiment to explore the effects of background music tempo and playing method on cognitive response in an online store. The independent variables were background music tempo (fast vs. slow) and playing method (playing the same music continuously, re-playing the same music when browsing a different web page, and playing different music when browsing different web pages). The measures collected were the shifting frequency between web pages, perceived browsing time, and recall accuracy of commodity attributes. Results indicated that participants showed higher shifting frequency and shorter perceived browsing time with fast-tempo music than with slow-tempo music. The effect of music tempo on recall accuracy was not significant. The continuous and re-play methods yielded similar shifting frequency and perceived browsing time, while the different-music method yielded higher shifting frequency, longer perceived time, and lower recall accuracy. Continuous playing produced greater accuracy than the re-play and different-music methods. The findings should help in manipulating background music for online shopping websites. Keywords: Background music, music tempo, playing method.
1 Introduction

Chien-Jung Lai, Department of Distribution Management, National Chin-Yi University of Technology · Ya-Ling Wu, Department of Applied English, National Chin-Yi University of Technology · Ming-Yuan Hsieh and Chang-Yung Kung, Department of International Business, National Taichung University of Education · Yu-Hua Lin, Department of Marketing & Distribution Management, Hsiuping Institute of Technology

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 439–447. springerlink.com © Springer-Verlag Berlin Heidelberg 2011

Online retailing has attracted a great deal of attention in recent years due to the rapid development of the Internet. Some researchers have already begun to call for more systematic research on the nature of this format by using established
retailing and consumer behavior theories. Research on consumer online behavior is therefore more important than ever. As a natural departure from the stimuli present in a traditional retail store, the online retail environment can be manipulated only through visual and auditory cues. In the past, research on online stores focused on the design of website structure and interface as visual stimuli; few studies approached the topic from the auditory side. Recently, many websites place background music on their pages to attract browsers' attention, and some studies have begun to examine the effect of background music on consumer response [20]. These studies focused on the structural properties of music, such as rhythm, tempo, volume, melody, and mode, and on browsers' affective response. Little research has been done on the measurement of cognitive response to background music in an online setting. The present study addresses the tempo and playing method of background music and examines browsers' cognitive response.
2 Research on Store Atmospherics

2.1 Brick-and-Mortar Atmospherics
The impact of atmospherics on the nature and outcomes of shopping has been examined by researchers for some time. To explain the influence of atmospheres on consumers, atmospheric research has focused heavily on the Mehrabian-Russell affect model [13]. Donovan and Rossiter [5] tested the Stimulus-Organism-Response (S-O-R) framework in a retail store environment and examined Mehrabian and Russell's three-dimensional pleasure, arousal, and dominance (PAD) emotional experience as the intervening organism state. Empirical work in the area has examined specific atmospheric cues and their effects on shopper response. As shown throughout the literature, atmospherics can be a means of influencing the consumer in brick-and-mortar environments.
2.2 Web Atmospherics
Dailey [3] defined web atmospherics as "the conscious designing of web environments to create positive effects (e.g., positive affect, positive cognitions, etc.) in users in order to increase favorable consumer responses (e.g., site revisiting, browsing, etc.)." Eroglu et al. [6] proposed a typology that classifies web atmospheric cues into high task-relevant cues (i.e., descriptions of the merchandise, the price, navigation cues, etc.) and low task-relevant cues (i.e., colors, borders and background patterns, typestyles and fonts, music and sounds, etc.). These cues form the atmosphere of a web site. As in brick-and-mortar environments, atmospheric cues have been posited to influence consumers on the web [3][6]. However, research on web atmospherics thus far is limited in its theoretical explanation of why they influence consumers. Dailey [3] indicated that web atmospheric researchers should begin to focus on specific web atmospheric cues (i.e., color cues, navigation cues, etc.) and on theoretical explanations of how and why these cues may influence consumers. The present study addresses this issue by focusing specifically on the design of music cues.
2.3 Music Cues
A number of studies have investigated the effect of background music in physical environments across various research fields, for example advertising, retail/manufacturing, and ergonomics [8][15][17]. Bitner [1] argued that music is a critical ambient condition of the servicescape and that music influences people's emotions, physiological feelings, and mood. Various structural characteristics of music, such as time (rhythm, tempo, and phrasing), pitch (melody, keys, mode, and harmony), and texture (timbre, orchestration, and volume), influence consumer response and behavior [2]. Academic research suggests that music affects how much time shoppers spend in a store; music also appears to affect shoppers' perceptions of the amount of time they spent shopping [22]. However, few studies have reported the effect of music on websites.
2.4 Music Tempo and Playing Method
Music tempo has been considered an essential dimension of music and has received wide attention in previous research [2][12]. In empirical studies, variation in the tempo of background music has been found to have differential effects on listeners' affective responses and shopping behavior. Milliman [14] found that fast-tempo background music speeds up in-store traffic flow and increases daily gross sales volume compared with slow-tempo music. Further, Milliman [15] found that fast-tempo background music shortens restaurant patrons' dining time. Bruner II [2] summarized that faster music can induce a more pleasant affective response than slower music. In a meta-analysis, Garlin and Owen [7] concluded that tempo is the main structural component of music for inducing listeners' arousal. Beyond traditional physical environments, Wu et al. [20], focusing on the effect of music and color on participants' emotional response in an online store setting, found that participants felt more aroused and experienced greater pleasure when exposed to a fast-tempo music website than those who experienced slow-tempo music. Day et al. [4] examined the effects of music tempo and task difficulty on the performance of multi-attribute decision-making. Participants decided more accurately with faster-tempo than slower-tempo music; moreover, faster-tempo music improved the accuracy of harder decision-making only, not easier decision-making. In the traditional retail environment, background music is always played continuously. In an online setting, however, advances in information technology allow music to be played in various ways. The music can be played continuously throughout a browser's visit.
It can also be played in other ways, such as re-playing the same music when the browser moves to a different web page, or playing distinct music for each web page. Different playing methods may attract different amounts of attention and induce different browser responses.
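To make the three playing methods concrete, they can be sketched as rules for selecting a track on each page visit. This is an illustrative model only (the paper does not describe its playback software); the function name and arguments are hypothetical:

```python
def choose_track(method, page_index, playlist, position):
    """Return (track, start_position) for a page visit.

    method     -- 'continuous', 'replay', or 'different'
    page_index -- index of the page being opened (0-based)
    playlist   -- list of track names (one per page for 'different')
    position   -- playback position reached on the previous page (seconds)
    """
    if method == 'continuous':
        # Same music keeps playing across page changes.
        return playlist[0], position
    if method == 'replay':
        # Same music, restarted from the beginning on every new page.
        return playlist[0], 0
    if method == 'different':
        # A distinct track for each page, started from the beginning.
        return playlist[page_index % len(playlist)], 0
    raise ValueError(method)
```

For example, on the third page (index 2) with playlist ['a', 'b', 'c'] and a previous position of 37 s, the three methods return ('a', 37), ('a', 0), and ('c', 0) respectively.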
2.5 Music and Cognitive Response
A number of studies have used affective response to measure shoppers' reactions in retail store environments [3][5][6]. From a different theoretical perspective, background music has been proposed to act on listeners' cognitive processes [11]. Cognitive responses such as shoppers' attention, memory, and perceived time are also critical for assessing atmospheric cues. A possible general explanation for the effect of background music rests on the mediating factor of attention [9][10]. Background music can play at least two different roles in the operation of attention: distractor vs. arousal inducer. From the distractor perspective, both browsing a website and processing background music are cognitive activities that demand attentional resources [9][18]. From the arousal inducer perspective, background music may raise listeners' arousal level, increasing the total amount of momentary mental resources available for browsing and thereby affecting memory recall. Thus the distractor perspective predicts that task performance is impaired by the interruption of background music, while the arousal inducer perspective predicts that it is improved by the additional resources the music stimulates. This contradiction makes the effect of background music hard to predict. Time is an important factor in retailing because retailers strongly believe in a simple correlation between time spent shopping and amount purchased. Milliman's [14] study suggests that music choices affect actual shopping times: in his grocery store study, consumers spent 38% more time in the store when exposed to slow music than when exposed to fast music.
Yalch and Spangenberg [21] showed that actual shopping time was longer in the less familiar background music condition, but perceived shopping time was longer in the more familiar foreground music condition. Further, Yalch and Spangenberg [22] found that individuals reported themselves as shopping longer when exposed to familiar music but actually shopped longer when exposed to unfamiliar music. Shorter actual shopping times in the familiar music condition were related to increased arousal, while longer perceived shopping times in that condition appear related to unmeasured cognitive factors. Although substantial research has examined time perception in traditional retail settings, few studies have reported these effects in online settings. The present study therefore focused on the effect of music on browsers' cognitive activities. The measures of shifting frequency between web pages, perceived browsing time, and recall accuracy of website commodities were collected to explore the relation between attention and memory recall.
3 Research Method
The main purposes of this study were to design background music playing methods and to examine the effects of these methods and of music tempo on browsers' cognitive response. A laboratory experiment was conducted to evaluate these effects.
3.1 Participants
A total of 54 university students (18 male and 36 female) between 19 and 26 years old (M = 21.3, SD = 1.21) were recruited for the experiment. Each participant had at least one year of Internet experience. They were paid a cash reward of NT$100 for their participation.
3.2 Experimental Design
The experiment used a 2 x 3 between-participant factorial design. The first independent variable was music tempo, with two levels: fast vs. slow. Following the definition of North and Hargreaves [16], we considered a music tempo of 80 BPM or less as slow and 120 BPM or more as fast. Two categories of songs from the same music collection were used, one fast and the other slow. The second independent variable was the music playing method, with three levels: playing the same music continuously (Continuous), re-playing the same music when browsing a different web page (Re-play), and playing distinct music when browsing a different web page (Different). All three methods are used as background music on online websites. Participants were randomly assigned to one of the six treatment conditions. The dependent measures were the shifting frequency between web pages, perceived browsing time, and recall accuracy of website commodities. Shifting frequency was the number of times a participant shifted between web pages during the browsing period; it was used to express variations in browsers' attention. Perceived browsing time was the participant's estimate of the amount of time spent browsing the website. Recall accuracy was the percentage of commodity attributes correctly recalled after the browsing task.
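As a minimal illustration (not the authors' materials), the six treatment conditions of the 2 x 3 design and a balanced random assignment of the 54 participants, nine per cell, could be generated as:

```python
import itertools
import random

# The two factor levels of the between-participant design.
tempos = ['fast', 'slow']
methods = ['continuous', 'replay', 'different']

# Cross the factors to get the six treatment conditions.
cells = list(itertools.product(tempos, methods))

# 54 assignment slots, exactly 9 per cell, shuffled into a random order
# so each arriving participant receives the next condition in the list.
assignment = cells * 9
random.shuffle(assignment)
```

Shuffling a balanced list (rather than drawing conditions independently) guarantees the equal cell sizes (n = 9 per cell, n = 27 per tempo, n = 18 per method) reported in the paper's results.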
3.3 Materials
The shopping website used in the experiment had three levels of content. The first level introduced two commodity stores for participants to browse. The second level gave a brief description of the commodities in each store; each store had three commodities. The third level described each commodity in detail. In total, the experimental website had nine pages. To pair music tempo with the method of playing different music on different web pages, nine fast-tempo songs and nine slow-tempo songs were selected from the same music collection. None of the songs had lyrics. The volume was controlled at 60 dB, which is considered a comfortable level for shopping [12].
3.4 Task and Procedure
Participants were tested individually in the experimental setting. Each participant was instructed to browse the experimental shopping website for four minutes as if he/she were planning to purchase from the site. Background music was played from the computer's speakers. After browsing the website, participants estimated the amount of time they had spent on it by circling points for minutes (from 1 to 8) and seconds (from 0 to 60) [19]. A short test of recall of the attributes of each commodity was then given. Participants' shifts between web pages during browsing were recorded automatically by the FaceLab eye-tracking system.
4 Results
Table 1 shows the means and standard deviations of shifting frequency between web pages, perceived browsing time, and recall accuracy of commodity under each level of the independent variables. An analysis of variance (ANOVA) was conducted for each dependent measure. Significant factors were further analyzed using Tukey tests to examine differences among factor levels.

Table 1 Means and standard deviations of shifting frequency between web pages, perceived browsing time, and recall accuracy of commodity under each level of the independent variables.

                                   Shifting frequency      Perceived time (s)     Recall accuracy (%)
Independent      Level        n    Mean   S.D.   Tukey     Mean   S.D.  Tukey     Mean   S.D.  Tukey
Music tempo      Fast        27   33.59  10.46     A        245     51     A      63.1   0.11     A
                 Slow        27   19.64   4.48     B        288     52     B      58.7   0.15     A
Playing method   Continuous  18   25.33   7.76    AB        265     55    AB      71.2   0.09     A
                 Re-play     18   24.06   8.94     A        245     55     A      59.3   0.11     B
                 Different   18   30.44  13.73     B        289     51     B      52.2   0.12     B

Note: Different letters within a Tukey group indicate a significant difference at the 0.05 level.
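The per-measure analyses reported below use the standard between-groups F test followed by Tukey comparisons. As an illustrative sketch only (not the authors' analysis script), a one-way between-participants F statistic can be computed from raw observations as:

```python
def one_way_f(groups):
    """F statistic for a one-way between-participants ANOVA.

    groups -- list of lists of observations, one inner list per factor level.
    """
    k = len(groups)                      # number of factor levels
    n = sum(len(g) for g in groups)      # total number of observations
    grand = sum(sum(g) for g in groups) / n

    # Between-groups sum of squares and degrees of freedom
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    df_between = k - 1

    # Within-groups sum of squares and degrees of freedom
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_within = n - k

    # F = mean square between / mean square within
    return (ss_between / df_between) / (ss_within / df_within)
```

The two-way ANOVA in the paper additionally partitions the between-cells variance into main effects and an interaction, but the same sums-of-squares logic applies; in practice a statistics package would be used for the full design.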
4.1 Shifting Frequency between Web Pages
The analysis of variance for shifting frequency showed that the main effect of music tempo was significant (F(1, 48) = 44.422, p < 0.01). Participants shifted between web pages more often with fast tempo (33.59) than with slow tempo (19.64). There was also a significant main effect of playing method (F(2, 48) = 3.472, p < 0.05). Multiple comparisons using Tukey tests showed that shifting frequency for the re-play method (24.06) was lower than for the different-music method (30.44). However, there was no significant difference between
the continuous method (25.33) and the re-play method (24.06), or between the continuous method (25.33) and the different-music method (30.44). The interaction of music tempo and playing method was not significant for shifting frequency.
4.2 Perceived Browsing Time
The analysis of variance showed that music tempo had a significant effect on perceived browsing time (F(1, 48) = 9.926, p < 0.01). Participants perceived the browsing time as shorter with fast tempo (245 s) than with slow tempo (288 s). There was also a significant main effect of playing method (F(2, 48) = 3.447, p < 0.05). Tukey tests showed that perceived browsing time for the re-play method (245 s) was shorter than for the different-music method (289 s). However, there was no significant difference between the continuous method (265 s) and the re-play method (245 s), or between the continuous method (265 s) and the different-music method (289 s). The interaction of music tempo and playing method was not significant for perceived browsing time.
4.3 Recall Accuracy of Commodity
The analysis of variance showed that playing method had a significant effect on recall accuracy of commodity (F(2, 48) = 15.638, p < 0.01). Tukey tests showed that the continuous method had greater accuracy (71.2%) than the re-play method (59.3%) and the different-music method (52.2%). However, there was no significant difference between the re-play and different-music methods. Music tempo had no significant effect on recall accuracy, and the interaction of music tempo and playing method was also not significant.
5 Discussion and Conclusion
The purpose of this study was to examine the differential effects of music tempo and playing method on cognitive response in an online store. Participants showed higher shifting frequency and shorter perceived browsing time under fast-tempo music than under slow-tempo music. The higher shifting frequency under fast tempo indicates that participants browsed each web page for a shorter time and then changed to another page quickly. These findings support the view that faster-tempo music tends to act as an arousal inducer rather than a distractor. According to Kahneman's capacity model [10], faster tempo arouses the participant, which may increase the processing resources available. Recall accuracy would be expected to improve with these additional mental resources; however, no significant effect of music tempo on recall accuracy was found. This result can be explained by the finding of Day et al. [4] that faster-tempo music improved the accuracy of harder decision-making only, not easier decision-making. In the present study,
participants were asked to browse the same shopping website, containing 3 books and 3 notebooks, in four minutes. The experimental website had only 3 levels and 9 pages covering 6 commodities, and apart from the background music there were no differences between conditions; the browsing task was therefore not hard to complete. With regard to playing method, the continuous and re-play methods produced similar shifting frequency and perceived browsing time, and the continuous method yielded the greatest recall accuracy of commodity. The different-music method produced higher shifting frequency, longer perceived time, and lower recall accuracy. These findings imply that the different-music playing method tends to act as a distractor rather than an arousal inducer: playing distinct music on each web page may have interfered with participants' processing of web content, demanding extra attentional resources for the competing cognitive activities. By contrast, the continuous method may act as an arousal inducer, given its shorter perceived time and higher recall accuracy; its lower shifting frequency suggests that participants allocated more attention to each web page because the background music interfered less. The results support the belief that browsers' behavior is affected by retail environmental factors such as background music, and confirm from a cognitive perspective that music tempo is an important structural component of music related to the arousal response. However, the relationships among the measures were not examined fully in the present study; further research is needed to clarify them. Acknowledgments. This research was funded by the National Science Council of Taiwan, Grant No. NSC 99-2221-E-167-020-MY2.
References
[1] Bitner, M.J.: Servicescapes: The impact of physical surroundings on customers and employees. Journal of Marketing 56, 57–71 (1992)
[2] Bruner II, G.C.: Music, mood, and marketing. Journal of Marketing 54(4), 94–104 (1990)
[3] Dailey, L.: Navigational web atmospherics: Explaining the influence of restrictive navigation cues. Journal of Business Research 57, 795–803 (2004)
[4] Day, R.F., Lin, C.H., Huang, W.H., Chuang, S.H.: Effects of music tempo and task difficulty on multi-attribute decision-making: An eye-tracking approach. Computers in Human Behavior 25, 130–143 (2009)
[5] Donovan, R.J., Rossiter, J.R.: Store atmosphere: An environmental psychology approach. Journal of Retailing 58(1), 34–57 (1982)
[6] Eroglu, S.A., Machleit, K.A., Davis, L.M.: Atmospheric qualities of online retailing: A conceptual model and implications. Journal of Business Research 54(2), 177–184 (2001)
[7] Garlin, F.V., Owen, K.: Setting the tone with the tune: A meta-analytic review of the effects of background music in retail settings. Journal of Business Research 59, 755–764 (2006)
[8] Herrington, J.D., Capella, L.M.: Practical applications of music in service settings. Journal of Services Marketing 8(3), 50–65 (1994)
[9] Jones, D.: The cognitive psychology of auditory distraction: The 1997 BPS Broadbent Lecture. British Journal of Psychology 90(2), 167–187 (1999)
[10] Kahneman, D.: Attention and Effort. Prentice-Hall, NJ (1973)
[11] Kellaris, J.J., Cox, A.D., Cox, D.: The effect of background music on ad processing: A contingency explanation. Journal of Marketing 57, 114–125 (1993)
[12] Kellaris, J.J., Rice, R.C.: The influence of tempo, loudness, and gender of listener on responses to music. Psychology and Marketing 10(1), 15–29 (1993)
[13] Mehrabian, A., Russell, J.A.: An Approach to Environmental Psychology. MIT Press, Cambridge (1974)
[14] Milliman, R.E.: Using background music to affect the behaviors of supermarket shoppers. Journal of Marketing 46(3), 86–91 (1982)
[15] Milliman, R.E.: The influence of background music on the behavior of restaurant patrons. Journal of Consumer Research 13, 286–289 (1986)
[16] North, A., Hargreaves, D.J.: The effects of music on responses to a dining area. Journal of Environmental Psychology 16(1), 55–64 (1996)
[17] Oakes, S.: The influence of the musicscape within service environments. Journal of Services Marketing 14(7), 539–550 (2000)
[18] Payne, J.W., Bettman, J.R., Johnson, E.J.: The adaptive decision maker. Cambridge University Press, New York (1993)
[19] Seawright, K.K., Sampson, S.E.: A video method for empirically studying wait-perception bias. Journal of Operations Management 25(5), 1055–1066 (2007)
[20] Wu, C.S., Cheng, F.F., Yen, D.C.: The atmospheric factors of online storefront environment design: An empirical experiment in Taiwan. Information & Management 45, 493–498 (2008)
[21] Yalch, R.F., Spangenberg, E.: Using store music for retail zoning: A field experiment. In: McAlister, L., Rothschild, M.L. (eds.) Advances in Consumer Research, vol. 20, pp. 632–636.
Association for Consumer Research, Provo (1993) [22] Yalch, R.F., Spangenberg, E.R.: The effect of music in a retail setting on real and perceived shopping times. Journal of Business Research 49, 139–147 (2000)
Forecasting Quarterly Profit Growth Rate Using an Integrated Classifier You-Shyang Chen, Ming-Yuan Hsieh, Ya-Ling Wu, and Wen-Ming Wu
Abstract. This study proposes an integrated procedure based on four components: experiential knowledge, a feature selection method, a rule filter, and rough set theory, for forecasting the quarterly profit growth rate (PGR) in the financial industry. To evaluate the proposed procedure, a dataset (called the PGR dataset) collected from the financial holding sector of Taiwan's stock market is employed. The experimental results indicate that the proposed procedure surpasses the listed methods in terms of both higher accuracy and fewer attributes. Keywords: Profit growth rate (PGR), Rough set theory (RST), Feature selection, Condorcet method.
1 Introduction

You-Shyang Chen, Department of Information Management, Hwa Hsia Institute of Technology · Ming-Yuan Hsieh, Department of International Business, National Taichung University of Education · Ya-Ling Wu, Department of Applied English, National Chin-Yi University of Technology · Wen-Ming Wu, Department of Distribution Management, National Chin-Yi University of Technology

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 449–458. springerlink.com © Springer-Verlag Berlin Heidelberg 2011

In economics, financial markets play a major role in the price determination of financial instruments. The markets enable prices to be set not only for newly issued financial assets but also for the existing stock of financial assets. Financial market climates and gains/losses can change vastly within seconds; thus, the accuracy of information for investment planning is crucial. A rationally informed investor analyzes expected returns to decide whether or not to participate in the stock market, whereas an irrational 'sentiment' investor depends on intuition or other factors unrelated to expected returns. In the stock market, the most effective way to assess the operating performance of a specific company is profitability analysis, the necessary tool for deciding how best to allocate an investor's resources to ensure profits. A profitability analysis is
the most significant use of financial ratios, providing a definitive evaluation of the overall effectiveness of management based on the returns generated on sales and investment quarterly and/or annually. Financial ratios are widely used for modeling purposes by both practitioners and researchers. In practice, the profit growth rate (PGR) is one of the core financial ratios. Hence, this study aims to further explore quarterly PGR on financial statements. Related studies suggest that the performance of classification models may depend strongly on the field of application [1], the study goal [2], or user experience [3][4]. Finding methods better suited to the context of financial markets is therefore a valuable issue. Recently, artificial intelligence (AI) techniques for classification have gained great popularity not only in research but also in commercial use, and they are flexible enough to perform satisfactorily in a variety of application areas, including finance, manufacturing, health care, and the service industry. It is therefore advisable to employ more efficient AI-based classifiers as evaluation tools in the financial industry. With today's rapid changes in information technology (IT), the most common AI tools [5] for classification (i.e., prediction or forecasting), such as rough set theory (RST), decision trees (DT), and artificial neural networks (ANN), have become significant research trends for both academics and practitioners [6][7]. Accordingly, this study proposes an integrated procedure, combining a feature selection method, a rule filter, and the rough set LEM2 algorithm, for classifying quarterly PGR problems faced by interested parties, particularly investors, who strongly desire a trustworthy forecasting model.
2 Related Works
This section reviews related studies of the profit growth rate, feature selection, rough set theory, and the LEM2 rule extraction method with rule filtering.
2.1 Profit Growth Rate
Profitability represents a company's ability to raise earnings over a period (monthly, quarterly, or yearly). Profitability analysis, one of the financial analyses, is an important tool for investors to judge whether an investment portfolio is worthwhile. Financial ratios are the most common means of profitability analysis for academics and practitioners [8]. Generally, financial ratios are categorized into profitability, stability, activity, cash flow, and growth [9]. This study focuses on growth. Growth measures include the profit growth rate (PGR), revenue growth rate (RGR), sales increase ratio (SIR), year-over-year growth rate (YoY), quarter-over-quarter growth rate (QoQ), month-over-month growth rate (MoM), etc. [10]. From the viewpoint of stock investors in particular, the PGR is an effective indicator of how large the potential for future development is, and it measures the future growth of a target company that may be selected for an investment portfolio [11].
Simply put, in this work PGR refers to the quarterly variation in the operating profit of a specific company; there are four quarters in a year, named the first, second, third, and fourth quarter. Quarterly PGR is calculated as follows: Quarterly PGR (%) = ((Current quarter profit – Last quarter profit) * 100) / Last quarter profit. The PGR can be either positive or negative, depending on whether this quarter's profit is larger than last quarter's, and its range is unbounded. A positive PGR represents a good, optimistic outlook for the company's future, whereas a negative PGR is bad and pessimistic. In application, the PGR is a critical determinant of a firm's success [12]: the higher a firm's PGR, the better its future [11]. However, overall PGR performance varies greatly by industry and firm size. A higher PGR represents increasing operating profit, which may be driven by growing product sales and an enlarged market share; it is regarded as an optimistic sign for a firm's future development, and a better outlook will in turn drive stock prices higher.
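The PGR formula above translates directly into code. The following is an illustrative helper (the function name is ours, and the undefined zero-denominator case is handled with an error, a choice the text does not specify):

```python
def quarterly_pgr(current_profit, last_profit):
    """Quarterly profit growth rate (%) as defined in the text:
    ((current quarter profit - last quarter profit) * 100) / last quarter profit.
    The result is unbounded and can be positive or negative."""
    if last_profit == 0:
        # The formula is undefined when last quarter's profit is zero.
        raise ValueError("PGR is undefined for a zero last-quarter profit")
    return (current_profit - last_profit) * 100 / last_profit
```

For example, a rise from 100 to 120 gives a PGR of 20%, and a fall from 100 to 80 gives -20%. Note that when last quarter's profit is negative, the sign of the ratio no longer tracks the direction of change, a caveat the quoted definition leaves open.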
2.2 Feature Selection
To remove redundant information and enable quicker and more accurate training, a number of preprocessing steps were applied to the dataset. Feature selection is one of the critical aspects of any knowledge discovery process. It helps to evaluate the usefulness of attributes, to select relevant attributes, and to reduce the dimensionality of datasets by deleting unsuitable attributes that may degrade classification performance, thereby improving the performance of data mining algorithms [13]. Feature selection techniques can be roughly categorized into three broad types: the filter model [13][14], the wrapper model [15][16], and the hybrid model [16][17]. Following Witten and Frank [18], filter-model feature selection methods such as ReliefF, InfoGain, Chi-squared, Gain Ratio, Consistency, OneR, and Cfs are well known in academic work and in common use. This study mainly uses the following five evaluator methods for selecting features [18]: (1) Cfs (Correlation-based feature selection) [19]: evaluates subsets of attributes, preferring subsets that are highly correlated with the class while having low inter-correlation. (2) Chi-squared [20]: evaluates the worth of an attribute by computing the chi-squared statistic with respect to the class. (3) Consistency [21]: values an attribute subset by the level of consistency in the class values when the training instances are projected onto the attribute subset. (4) Gain Ratio [22]: evaluates the worth of an attribute by measuring the gain ratio with respect to the class. (5) InfoGain [23]: one of the simplest attribute ranking methods, often used in text categorization; it evaluates the worth of an attribute by measuring the information gain with respect to the class.
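As a minimal illustration of one of these evaluators, InfoGain scores an attribute by the reduction in class entropy achieved by splitting on it. A sketch for discrete attributes (function names are ours, not from [18] or [23]) is:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """Information gain of a discrete attribute with respect to the class:
    H(class) minus the expected entropy after splitting on the attribute."""
    n = len(labels)
    # Group the class labels by the attribute's value.
    split = {}
    for v, y in zip(feature_values, labels):
        split.setdefault(v, []).append(y)
    # Expected entropy over the attribute's value groups.
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder
```

An attribute that perfectly separates a balanced two-class sample scores 1 bit; a constant attribute scores 0. Ranking attributes by this score and keeping the top ones is the essence of the InfoGain filter.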
452
Y.-S. Chen et al.
2.3 Rough Set Theory
Rough set theory (RST), first proposed by Pawlak [24], employs mathematical modeling to deal with data classification problems. Given an information system with universe $U$ and attribute set $A$, let $B \subseteq A$ and $X \subseteq U$. The set $X$ is approximated using the information contained in $B$ by constructing its lower and upper approximation sets,
$$\underline{B}X = \{x \mid [x]_B \subseteq X\} \quad\text{and}\quad \overline{B}X = \{x \mid [x]_B \cap X \neq \emptyset\},$$
respectively, where $[x]_B$ denotes the $B$-indiscernibility class of $x$. The elements of $\underline{B}X$ can be classified with certainty as members of $X$ using the knowledge in $B$, whereas the elements of $\overline{B}X$ can only be classified as possible members of $X$. The set
$$BN_B(X) = \overline{B}X - \underline{B}X$$
is called the B-boundary region of $X$; it consists of those objects that cannot be classified with certainty as members of $X$ using the knowledge in $B$ [25]. The set $X$ is called 'rough' (or 'roughly definable') with respect to the knowledge in $B$ if its boundary region is non-empty.
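The lower and upper approximations can be sketched directly from their definitions; the toy information system below (one attribute, five objects, with all names invented for illustration) shows how the boundary region arises:

```python
def approximations(universe, equiv_class_of, X):
    """B-lower and B-upper approximations of a target set X ⊆ U."""
    lower = {x for x in universe if equiv_class_of(x) <= X}   # [x]_B ⊆ X
    upper = {x for x in universe if equiv_class_of(x) & X}    # [x]_B ∩ X ≠ ∅
    return lower, upper

# Toy information system: five objects described by a single attribute B
attribute = {1: "a", 2: "a", 3: "b", 4: "b", 5: "c"}
universe = set(attribute)
blocks = {}
for obj, value in attribute.items():
    blocks.setdefault(value, set()).add(obj)

def equiv_class(x):
    """Indiscernibility class [x]_B: objects sharing x's attribute value."""
    return blocks[attribute[x]]

X = {1, 2, 3}                 # target concept
lower, upper = approximations(universe, equiv_class, X)
print(lower)                  # {1, 2}: certainly in X
print(upper)                  # {1, 2, 3, 4}: possibly in X
print(upper - lower)          # boundary region {3, 4}: X is rough
```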
2.4 The LEM2 Rule Extraction Method and Rule Filter
Rough set rule induction algorithms were first implemented in the LERS (Learning from Examples based on Rough Sets) system [26]. A local covering is induced by exploring the search space of blocks of attribute–value pairs, which are then converted into a rule set. The LEM2 (Learning from Examples Module, version 2) algorithm [27] induces rules by computing a single local covering for each concept. A large number of rules limits the classification capability of the rule set, as some rules are redundant or of 'poor quality'; rule-filtering algorithms [28] can be used to reduce the number of rules.
3 Methodology
Financial markets are inherently risky and uncertain: the greater the uncertainty, the greater the price volatility, and with it the risk of investing in the financial markets. Changing economic conditions compound this uncertainty and risk and make financial forecasting even more difficult for investors. Hence, investors clearly need a more reliable way to forecast quarterly PGR accurately. Accordingly, this study aims to construct an integrated procedure for classifying PGR and for improving the accuracy of a rough set classification system. Figure 1 illustrates the flowchart of the proposed procedure.
Forecasting Quarterly Profit Growth Rate Using an Integrated Classifier
453
[Figure 1 depicts the seven-step flowchart: (1) collect the practical dataset and (2) preprocess it; (3) select the core attributes and (4) build the decision table (feature selection); (5) extract the decision rules and (6) improve rule quality (RST/LEM2 and rule filter); (7) evaluate the experimental results against other methods.]
Fig. 1 The flowchart of the proposed procedure
The proposed procedure is introduced step by step as follows:
Step 1: Collect the attributes of the practical dataset using experiential knowledge.
Step 2: Preprocess the dataset.
Step 3: Select the core attributes using five feature selection methods.
Step 4: Build the decision table using global cuts.
Step 5: Extract the decision rules using the LEM2 algorithm.
Step 6: Improve rule quality using a rule filter.
Step 7: Evaluate the experimental results.
4 Empirical Case Study
To verify the proposed procedure, a collected dataset is used in the experiment. The computing processes applied to the target dataset are described in detail as follows:
Step 1: Collect the attributes of the practical dataset using experiential knowledge.
Based on the author's experiential knowledge of the financial markets (about 14 years of professional experience in the financial industry and over 20 years of individual investment in this field), the experimental dataset was drawn from the quarterly financial reports (2004–2006) of 70 publicly traded financial holding stocks listed on Taiwan's TSEC and OTC. Nine attributes were selected: (i) F_assets, (ii) T_assets, (iii) T_liabilities, (iv) O_expenses, (v) O_income, (vi) T_salary, (vii) A_costs, (viii) C_type, and (ix) PGR (Class); except for 'C_type,' all attributes
are continuous data. They refer, respectively, to total fixed assets, total assets, total liabilities, operating expenses, operating income, total salary, agency costs, company type, and the profit growth rate, as reported in the quarterly financial statements. The first eight items are condition attributes and the last, 'PGR,' is the decision attribute. The target dataset is therefore called the PGR dataset in this study. Table 1 shows the attribute information for the PGR dataset sample.

Table 1 The attributes information in the PGR dataset

No. | Attribute name | Attribute information | Number of values | Note
1 | F_assets      | numeric  | continuous | Min: 15990 ~ Max: 47131292
2 | T_assets      | numeric  | continuous | Min: 8390514 ~ Max: 2380547669
3 | T_liabilities | numeric  | continuous | Min: 5375826 ~ Max: 2297331093
4 | O_expenses    | numeric  | continuous | Min: -1994478 ~ Max: 11852475
5 | O_income      | numeric  | continuous | Min: -34328398 ~ Max: 9318711
6 | T_salary      | numeric  | continuous | Min: -701805 ~ Max: 7535360
7 | A_costs       | numeric  | continuous | Min: -4530602 ~ Max: 188208927
8 | C_type        | symbolic | 4          | A, B, C and D refer to different types of company
9 | PGR (Class)   | numeric  | continuous | Min: -1671.79 ~ Max: 247.28

(Unit: thousands of New Taiwan Dollars.)

Table 2 The partial raw data of the PGR dataset
No. | F-assets | T-assets | T-liabilities | … | A-costs | C-type | Class
1   | 25071 | 1355605 | 1273420 | … | 11149 | A | P
2   | 25119 | 1337366 | 1258024 | … | 4742  | A | F
3   | 25054 | 1296335 | 1214666 | … | 4930  | A | F
4   | 25038 | 1299946 | 1219787 | … | 4368  | A | F
5   | 23386 | 1574022 | 1486897 | … | 7242  | A | F
…   | …     | …       | …       | … | …     | … | …
632 | 36016 | 2380547 | 2297331 | … | 6982  | D | F
633 | 26887 | 1880395 | 1814067 | … | 9612  | D | F
634 | 25612 | 1674573 | 1583874 | … | 11558 | D | F
635 | 25613 | 1674420 | 1586178 | … | 6677  | D | F
636 | 25457 | 1656606 | 1571092 | … | 9017  | D | F
Step 2: Preprocess the dataset.
The target dataset must be preprocessed to make knowledge discovery easier; accordingly, the data are transformed into an EXCEL format that can be processed more easily and effectively in this study. After preprocessing, the dataset contains a total of 636 instances. Based on the author's experiential knowledge, the
'PGR (Class)' (i.e., the decision attribute) is partitioned into three classes: F (Fair, PGR > 100%), G (Good, PGR = 0%–100%), and P (Poor, PGR < 0%). Partial raw data of the PGR dataset are shown in Table 2.
Step 3: Select the core attributes using five feature selection methods.
The key goal of feature selection is to evaluate the usefulness of attributes, select the relevant ones, and remove redundant and/or irrelevant ones. Accordingly, five feature selection methods (Cfs, Chi-squared, Consistency, Gain Ratio, and InfoGain) are applied to the PGR dataset for these purposes. Three sub-steps are used to select the core attributes:
Step 3-1: Use a consistency principle to find the more consistent outcomes for estimating the potential attributes among the five feature selection methods.
Step 3-2: Use the Condorcet method [29] to compute scores for the potential attributes across the five feature selection methods; the cut-off point is then set at the maximal difference between consecutive scores Score_{i+1} and Score_i.
Step 3-3: Integrate the two methods above to determine the core attributes.
Consequently, only four core attributes remain. The selected key attributes are 'F-assets,' 'T-assets,' 'T-liabilities,' and 'A_costs,' which serve as input (condition) attributes for the following steps.
Step 4: Build the decision table using global cuts.
The condition attributes selected in Step 3, together with the class (decision attribute), are used to build a decision table by means of global cuts [30].
Step 5: Extract the decision rules using the LEM2 algorithm.
Based on the derived decision table, decision rules for classifying PGR are induced with the rough set LEM2 algorithm.
Step 6: Improve rule quality using a rule filter.
The more rules are generated, the more complex prediction becomes.
Thus, the rule filter algorithm in the rough set software guides a filtering process in which rules with support below the threshold (< 2), i.e., rules matched by only a single example, are eliminated. The performance of the refined rules is reported in the next step.
Step 7: Evaluate the experimental results.
To verify the proposed procedure, the PGR dataset is randomly split into two sub-datasets: a 67% training set of 426 observations and a 33% testing set of 210 observations. The experiments are repeated ten times with the 67%/33% random split using four methods: Bayes Net [31], Multilayer Perceptron [32], Traditional Rough Sets [33], and the proposed procedure. The average accuracy and its standard deviation are then calculated. Finally, two evaluation criteria, accuracy (with standard deviation) and number of attributes, are adopted as the comparison standard. Table 3 illustrates the experimental results on the PGR dataset.
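The repeated 67%/33% split evaluation of Step 7 can be sketched generically as follows (the harness and the majority-class baseline are illustrative stand-ins; the paper's experiments run Bayes Net, Multilayer Perceptron, and rough set classifiers on the real PGR data):

```python
import random
from statistics import mean, stdev

def evaluate(classifier_factory, data, runs=10, train_frac=0.67, seed=0):
    """Repeated random train/test split evaluation (Step 7 sketch).

    classifier_factory(train) must return a predict(instance) function;
    each data item is a (features, label) pair.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train, test = shuffled[:cut], shuffled[cut:]
        predict = classifier_factory(train)
        hits = sum(predict(x) == y for x, y in test)
        accuracies.append(hits / len(test))
    return mean(accuracies), stdev(accuracies)

# Majority-class baseline on a toy dataset, just to exercise the harness
def majority(train):
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top

data = [((i,), "F" if i % 3 else "P") for i in range(60)]
acc, sd = evaluate(majority, data)
print(round(acc, 2), round(sd, 3))
```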
Experimental results
The proposed procedure achieves the highest accuracy (94.05%) and thus significantly outperforms the other listed methods. Moreover, it uses fewer attributes (5) than the listed methods (9). In particular, accuracy improves markedly over traditional Rough Sets, which underlines the value of feature selection: eliminating redundant or useless attributes effectively improves the accuracy rate through this integrated feature selection on the PGR dataset. As for the standard deviation of the accuracy, the proposed procedure has the second-lowest value; the lower the standard deviation, the more stable the performance. Overall, the proposed procedure appears suitable and viable for classifying PGR on the PGR dataset.
Table 3 The comparison results of four methods for running 10 times in the PGR dataset
Method | Testing accuracy | Attributes
Bayes Net (Murphy, 2002) | 89.95% (0.024) | 9
Multilayer Perceptron (Lippmann, 1987) | 87.95% (0.017) | 9
Traditional Rough Sets (Bazan and Szczuka, 2001) | 82.87% (0.033) | 9
The proposed procedure | 94.05% (0.018) | 5
5 Conclusions
This study has proposed an effective procedure for classifying the PGR of the financial industry, based on an integrated feature selection approach to select four condition attributes from financial data and a rough set LEM2 algorithm to carry out prediction. To verify the proposed procedure, a practical PGR dataset has been employed. The experimental results show that the procedure achieves satisfactory accuracy (94.05%) with fewer attributes (5), and thus outperforms the other listed methods. For future research, we suggest applying the proposed procedure to a more in-depth analysis of market structure in other emerging and developed markets.
References [1] Chen, C.C.: A model for customer-focused objective-based performance evaluation of logistics service providers. Asia Pacific Journal of Marketing and Logistics 20(3), 309–322 (2008) [2] Fan, H., Mark, A.E., Zhu, J., Honig, B.: Comparative study of generalized Born models: Protein dynamics. Chemical Theory and Computation Special Feature 102(19), 6760–6764 (2005)
[3] Barber, S.: Creating effective load models for performance testing with incomplete empirical data. In: Proceedings of the Sixth IEEE International Workshop, pp. 51–59 (2004) [4] Dasgupta, C.G., Dispensa, G.S., Ghose, S.: Comparing the predictive performance of a neural network model with some traditional market response models. International Journal of Forecasting 10, 235–244 (1994) [5] Zopounidis, C., Doumpos, M.: Multicriteria classification and sorting methods: a literature review. European Journal of Operational Research 138, 229–246 (2002) [6] Dunham, M.H.: Data mining: Introductory and advanced topics. Prentice Hall, Upper Saddle River (2003) [7] Ravi Kumar, P., Ravi, V.: Bankruptcy prediction in banks and firms via statistical and intelligent techniques – A review. European Journal of Operational Research 180, 1–28 (2007) [8] Andres, J.D., Landajo, M., Lorca, P.: Forecasting business profitability by using classification techniques: A comparative analysis based on a Spanish case. European Journal of Operational Research 167, 518–542 (2005) [9] Min, S.H., Lee, J., Han, I.: Hybrid genetic algorithms and support vector machines for bankruptcy prediction. Expert Systems with Applications 31, 652–660 (2006) [10] Press, E.: Analyzing Financial Statements. Lebhar-Friedman (1999) [11] Loth, R.B.: Select winning stocks using financial statements. Dearborn, Chicago, IL (1999) [12] Covin, J.G., Green, K.M., Slevin, D.P.: Strategic process effects on the entrepreneurial orientation–sales growth rate relationship. Entrepreneurship Theory and Practice 30(1), 57–82 (2006) [13] Hall, M.A., Holmes, G.: Benchmarking feature selection techniques for discrete class data mining. IEEE Transactions on Knowledge and Data Engineering 15(3), 1–16 (2003) [14] Dash, M., Choi, K., Scheuermann, P., Liu, H.: Feature selection for clustering – a filter solution. In: Proceedings of the Second International Conference on Data Mining, pp. 115–122 (2002) [15] Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artificial Intelligence 97(1–2), 273–324 (1997) [16] Liu, H., Yu, L.: Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering 17(4), 491–502 (2005) [17] Das, S.: Filters, wrappers and a boosting-based hybrid for feature selection. In: Proceedings of the 18th International Conference on Machine Learning, pp. 74–81 (2001) [18] Witten, I.H., Frank, E.: Data mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann Publishers, USA (2005) [19] Hall, M.A.: Correlation-based feature subset selection for machine learning. PhD thesis, University of Waikato (1998) [20] Kononenko, I.: Estimating attributes: Analysis and extensions of RELIEF. In: Proceedings of the Seventh European Conference on Machine Learning, pp. 171–182. Springer, Heidelberg (1994) [21] Liu, H., Setiono, R.: A probabilistic approach to feature selection – A filter solution. In: Proceedings of the 13th International Conference on Machine Learning, pp. 319–327 (1996) [22] Duda, R.O., Hart, P.E., Stork, D.G.: Pattern classification, 2nd edn. Wiley and Sons, Inc., New York (2001)
[23] Dumais, S., Platt, J., Heckerman, D., Sahami, M.: Inductive learning algorithms and representations for text categorization. In: Proceedings of the International Conference on Information and Knowledge Management, pp. 148–155 (1998) [24] Pawlak, Z.: Rough sets. International Journal of Computer and Information Sciences 11(5), 341–356 (1982) [25] Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer, Dordrecht (1991) [26] Grzymala-Busse, J.W.: LERS – a system for learning from examples based on rough sets. In: Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, pp. 3–18 (1992) [27] Grzymala-Busse, J.W.: A new version of the rule induction system LERS. Fundamenta Informaticae 31(1), 27–39 (1997) [28] Nguyen, H.S., Nguyen, S.H.: Analysis of STULONG data by rough set exploration system (RSES). In: Berka, P. (ed.) Proc. ECML/PKDD Workshop, pp. 71–82 (2003) [29] Condorcet method. Wikipedia, http://en.wikipedia.org/wiki/Condorcet_method (retrieved October 31, 2010) [30] Bazan, J., Nguyen, H.S., Nguyen, S.H., Synak, P., Wróblewski, J.: Rough set algorithms in classification problem. In: Polkowski, L., Tsumoto, S., Lin, T. (eds.) Rough Set Methods and Applications, pp. 49–88. Physica-Verlag, Heidelberg (2000) [31] Murphy, K.P.: Bayes Net Toolbox. Technical report, MIT Artificial Intelligence Laboratory (2002), http://www.ai.mit.edu/~murphyk/ [32] Lippmann, R.P.: An introduction to computing with neural nets. IEEE Acoustics, Speech and Signal Processing Magazine 4(2), 4–22 (1987) [33] Bazan, J., Szczuka, M.: RSES and RSESlib – a collection of tools for rough set computations. LNCS, pp. 106–113. Springer, Berlin (2001)
Fuzzy Preference Based Organizational Performance Measurement Roberta O. Parreiras and Petr Ya Ekel*
Abstract. This paper introduces a methodology for constructing a multidimensional indicator designed for organizational performance measurement. The methodology involves the application of fuzzy models and methods of their analysis. Its use requires the construction of fuzzy preference relations by comparing performance measures with a reference standard defined as a predetermined scale consisting of linguistic terms. The fuzzy preference relations are exploited by means of the Orlovsky choice procedure. An application example related to organizational performance evaluation with the use of the proposed methodology is considered in order to demonstrate its applicability.
1 Introduction In response to the challenges raised by a competitive market, there has been an increasing use of organizational performance measurement systems in the business management activities (Nudurupati et al. 2010). Traditionally, organizational performance has been measured with financial indexes (Kaplan and Norton 1996), (Bosilj-Vuksice et al. 2008). Nowadays, non-financial measures also play an important role in business management, being utilized in the strategic planning to reflect the potential of an organization to obtain financial gains in a near future (Kaplan and Norton 1996), (Chapman et al. 2007), and in the manufacturing and distribution management to control the regular operations (Abdel-Maksoud et al. 2005). In parallel, the complexity of processing multiple performance estimates and of generating effective recommendations from their analysis has motivated researchers to develop models and methods for constructing and analyzing multidimensional performance measures. One possible approach for carrying out the performance evaluation is to regard it as a multiple criteria decision-making problem (Bititci et al. 2001; Clivillé et al. 2007; Yu and Hu 2010). Roberta O. Parreiras · Petr Ya Ekel Pontifical Catholic University of Minas Gerais, Av. Dom José Gaspar, 500, 30535-610, Belo Horizonte, MG, Brazil e-mail:
[email protected],
[email protected] J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 459–468. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
460
R.O. Parreiras and P.Y. Ekel
In this paper, we present a methodology for constructing a multidimensional performance indicator, which is based on fuzzy models and methods for multicriteria analysis. Its application requires the construction of fuzzy preference relations by means of the comparison of performance measures with respect to reference standards, which are defined as predetermined scales consisting of linguistic terms (linguistic fuzzy scales (LFSs)). The processing of the fuzzy preference relations is carried out by means of the Orlovsky choice procedure (Orlovsky 1978), which provides the degree of fuzzy nondominance of each performance measure with respect to the predetermined references. The aggregated degree of fuzzy nondominance is taken as a measure of multidimensional performance. An advantageous aspect of the methodology being proposed lies in the fact that it is based on the same theoretical foundation (Orlovsky choice procedure), as several multicriteria decision-making methods (Ekel et al. 2006; Pedrycz et al. 2010) as well as some group decision-making methods (Herrera-Viedma et al. 2002; Pedrycz et al. 2010). Such common foundation serves as a means for the integration of those classes of methods in the development of a computing system of management tools, where each one of those classes of technologies can communicate with each other their respective results without the need of much data processing or data conversions. The need of the development of integrated platforms of management tools has been recognized in the literature (Berrah et al. 2000; Shobrys and White 2002). The paper begins by presenting, in Section 2, some basic issues related to performance measurement and fuzzy models, which are necessary for understanding our proposal. In Section 3, we introduce a version of the fuzzy number ranking index (FNRI) proposed by Orlovsky (1981), which is extended here in order to deal adequately with fuzzy estimates, in a context where they may have no intersections. 
Section 4 briefly describes the Orlovsky choice procedure and shows how it can be applied to construct a multidimensional performance indicator based on the nondominance degrees of assessed quantities compared with reference standards. An application example related to organizational performance evaluation on the basis of the proposed methodology is considered in Section 5. Finally, in Section 6, we draw our conclusions.
2 Fuzzy Performance Measures
In the design of performance measurement systems, indicators of very different nature (for instance, "degree of production efficiency", "level of customer satisfaction", etc.) are utilized (Nudurupati et al. 2010). We call a performance indicator any index of quantitative or qualitative character that reflects the status of the organization with respect to the goals to be achieved (Popova and Sharpankykh 2010). In the methodology being proposed, fuzzy performance indicators are utilized. Each fuzzy performance indicator $I_p$, $p = 1, \ldots, m$, is defined on a universe of discourse $F_p$ and is associated with an LFS in the following way:
$$\mathrm{LFS}(I_p) = \{X_1^p, X_2^p, \ldots, X_{n_p}^p\}, \qquad (1)$$
where each linguistic estimate $X_k^p$, $k = 1, \ldots, n_p$, is characterized by both a word (or sentence) from a linguistic term set and a fuzzy set with membership function $\mu_{X_k^p}(f_p): F_p \to [0,1]$, $k = 1, \ldots, n_p$. The linguistic estimates belonging to LFS($I_p$), $p = 1, \ldots, m$, form a reference standard for evaluating the degree of "goal satisfaction". Such a reference standard can be defined, for instance, in periodic meetings of directors or managers during which they review and update the strategic goals of the enterprise. Finally, it is important to highlight the role of an LFS in the construction of the multidimensional performance indicator: it allows the transformation of physical measures (which can be assessed on different scales) into "goal satisfaction" degrees. Despite its importance, the aspect of defining an LFS is not further addressed in this paper. Among the related works addressing this subject, we can name (Pedrycz 2001; Brouwer 2006; Pedrycz et al. 2010).
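As an illustrative sketch (the scale bounds, term names, and triangular shapes below are invented for this example; the paper does not prescribe any particular membership shape), an LFS can be represented by a set of membership functions:

```python
def triangular(a, b, c):
    """Triangular membership function with support [a, c] and core {b}."""
    def mu(f):
        if a < f < b:
            return (f - a) / (b - a)
        if b <= f < c:
            return (c - f) / (c - b)
        return 1.0 if f == b else 0.0
    return mu

# A three-term LFS on a hypothetical 0-100 "goal satisfaction" scale
lfs = {
    "low":    triangular(0, 0, 50),
    "medium": triangular(0, 50, 100),
    "high":   triangular(50, 100, 100),
}
print(lfs["medium"](75))  # 0.5: a measure of 75 is half "medium"
print(lfs["high"](75))    # 0.5: ... and half "high"
```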
3 Construction of Fuzzy Preference Relations for Performance Measurement
As indicated above, the methodology proposed in the present paper is based on processing fuzzy preference relations. A fuzzy nonstrict preference relation $R_p$ (Orlovsky 1978) is a binary fuzzy relation, i.e., a fuzzy set with two-dimensional membership function $\mu_{R_p}(X_k, X_l): X \times X \to [0,1]$. In essence, $\mu_{R_p}(X_k, X_l)$ indicates, on the unit interval, the degree to which $X_k$ is at least as good as $X_l$. Fuzzy nonstrict preference relations can be constructed for each fuzzy performance indicator by comparing an estimated quantity (which can be represented as a real number or a fuzzy estimate) with a standard reference expressed as an LFS. This comparison can be realized (Ekel et al. 1998) by applying the FNRI proposed by Orlovsky (1981); the rationality of using this FNRI is justified in (Ekel et al. 2006). In general, when a performance indicator can be accommodated (measured) on a numerical scale and the essence of preference behind it is coherent with the natural order ≥ along the axis of measured values, the following expressions can be utilized to compare a pair of fuzzy estimates $X_k$ and $X_l$ (Orlovsky 1981):
$$\mu_{R_p}(X_k, X_l) = \max_{f_p(X_k) \ge f_p(X_l)} \min\bigl(\mu_{X_k^p}(f_p(X_k)),\, \mu_{X_l^p}(f_p(X_l))\bigr), \qquad (2)$$
$$\mu_{R_p}(X_l, X_k) = \max_{f_p(X_k) \le f_p(X_l)} \min\bigl(\mu_{X_k^p}(f_p(X_k)),\, \mu_{X_l^p}(f_p(X_l))\bigr), \qquad (3)$$
if the considered indicator is associated with the need of maximization.
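A brute-force numerical sketch of (2) and (3), assuming triangular estimates and a sampled universe of discourse (the helper names, grid resolution, and example estimates are illustrative, not from the paper):

```python
def tri(a, b, c):
    """Triangular membership function with support [a, c] and core {b}."""
    def mu(f):
        if a < f < b:
            return (f - a) / (b - a)
        if b <= f < c:
            return (c - f) / (c - b)
        return 1.0 if f == b else 0.0
    return mu

def nonstrict_preference(mu_k, mu_l, grid):
    """Discretized max-min evaluation of (2) and (3) over a sampling grid."""
    r_kl = max(min(mu_k(fk), mu_l(fl))          # eq. (2): region fk >= fl
               for fk in grid for fl in grid if fk >= fl)
    r_lk = max(min(mu_k(fk), mu_l(fl))          # eq. (3): region fk <= fl
               for fk in grid for fl in grid if fk <= fl)
    return r_kl, r_lk

grid = [i / 10 for i in range(0, 101)]   # universe [0, 10] sampled at 0.1
xk, xl = tri(4, 6, 8), tri(3, 5, 7)      # two overlapping fuzzy estimates
print(nonstrict_preference(xk, xl, grid))  # (1.0, 0.75)
```

Since the two cores do not intersect but the supports do, the result reproduces the intermediate case discussed above: Xk is fully "at least as good as" Xl (degree 1.0), while the reverse relation holds only to degree 0.75.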
In (2) and (3), the operation min is related (Zimmermann 1990) to constructing the Cartesian product $F_p(X_k) \times F_p(X_l)$, where $F_p(X_k)$ represents the universe of discourse $F_p$ associated with the evaluation of $\mu_{X_k^p}(f_p(X_k))$ when processing the Cartesian product. The operation max is carried out over the region $f_p(X_k) \ge f_p(X_l)$ when using (2) and over the region $f_p(X_k) \le f_p(X_l)$ when using (3). If the performance indicator is associated with the need of minimization, then (2) and (3) are written for $f_p(X_k) \le f_p(X_l)$ and $f_p(X_k) \ge f_p(X_l)$, respectively. Simple examples of applying (2) and (3) are given in (Pedrycz et al. 2010).
In characterizing the considered FNRI, it is necessary to distinguish two extreme situations. The first is associated with cases where we cannot distinguish two fuzzy quantities $X_k$ and $X_l$ whose cores intersect (the core of a fuzzy set $X$ is the set of all elements of the universe with membership grade equal to 1 (Pedrycz et al. 2010)). In such cases, (2) and (3) produce $\mu_{R_p}(X_k, X_l) = 1$ and $\mu_{R_p}(X_l, X_k) = 1$, respectively, meaning that $X_k$ is indifferent to $X_l$. This is a desirable result and can be interpreted as the impossibility of identifying the better (highest or lowest) of the compared alternatives, given the uncertainty of the available information. The second situation is associated with cases where the fuzzy estimates do not intersect, and (2) and (3) produce one of the following results:
• $\mu_{R_p}(X_k, X_l) = 1$ and $\mu_{R_p}(X_l, X_k) = 0$, if $X_k$ is better than $X_l$;
• $\mu_{R_p}(X_k, X_l) = 0$ and $\mu_{R_p}(X_l, X_k) = 1$, if $X_l$ is better than $X_k$.
In such cases, the FNRI does not reveal how much better $X_k$ (or $X_l$) is than $X_l$ (or $X_k$), which is an important aspect in the construction of a performance indicator. One direct way around this situation is to reconstruct the fuzzy estimates of the LFS. However, another way, more natural and acceptable from the practical point of view, adequately handles fuzzy estimates with no intersection. It permits one to distinguish how far a performance measure is from the highest degree of goal satisfaction and consists in including a term $D(X_k, X_l)$ in expressions (2) and (3), as follows:
$$\mu_{R_p}(X_k, X_l) = \frac{1}{2}\left\{ \max_{f_p(X_k) \ge f_p(X_l)} \min\bigl(\mu_{X_k^p}(f_p(X_k)),\, \mu_{X_l^p}(f_p(X_l))\bigr) + \frac{D(X_k, X_l)}{\mathrm{Fmax}_p - \mathrm{Fmin}_p} \right\}, \qquad (4)$$
$$\mu_{R_p}(X_l, X_k) = \frac{1}{2}\left\{ \max_{f_p(X_k) \le f_p(X_l)} \min\bigl(\mu_{X_k^p}(f_p(X_k)),\, \mu_{X_l^p}(f_p(X_l))\bigr) + \frac{D(X_k, X_l)}{\mathrm{Fmax}_p - \mathrm{Fmin}_p} \right\}. \qquad (5)$$
In expressions (4) and (5), $\mathrm{Fmax}_p$ and $\mathrm{Fmin}_p$ represent the maximum and minimum values of the universe of discourse $F_p$. The term $D(X_k, X_l)$ is given by (Lu et al. 2006)
$$D(X_k, X_l) = \begin{cases} \displaystyle\min_{\substack{a_p \in \mathrm{Supp}(\mu_{X_k}(f_p)) \\ b_p \in \mathrm{Supp}(\mu_{X_l}(f_p))}} |a_p - b_p|, & \text{if } \mu_{R_p}(X_k, X_l) = 0 \text{ or } \mu_{R_p}(X_l, X_k) = 0; \\[1ex] 0, & \text{otherwise,} \end{cases} \qquad (6)$$
where $a_p, b_p \in F_p$ and the operation Supp provides the support of a fuzzy set (the set of all elements of the universe of discourse with nonzero membership degrees in that set (Pedrycz et al. 2010)).
4 Multidimensional Performance Measurement Based on Fuzzy Preference Relations
Let us consider a procedure for constructing a multidimensional performance indicator from a set of m fuzzy performance indicators $I_1, \ldots, I_m$, each with its respective LFS. Initially, it is necessary to obtain a set $X^p = \{X_0^p, X_1^p, \ldots, X_{n_p}^p\}$ for each criterion, where $X_0^p$ represents the value being measured for $I_p$ and $X_1^p, \ldots, X_{n_p}^p$ are the fuzzy estimates corresponding to the linguistic terms of LFS($I_p$). Then, by applying (4) and (5) to all pairs belonging to $X^p \times X^p$, a fuzzy nonstrict preference relation $R_p$ is constructed for each performance indicator $I_p$, $p = 1, \ldots, m$. Finally, these matrices can be exploited by applying the Orlovsky choice procedure in order to obtain the fuzzy nondominance degree of $X_0^p$ (which is utilized as a performance measure), as described below. The Orlovsky choice procedure requires the construction of fuzzy strict preference relations and of fuzzy nondominance sets. The strict preference relation $P_p(X_k^p, X_l^p)$ corresponds to the pairs $(X_k^p, X_l^p)$ that satisfy $(X_k^p, X_l^p) \in R_p$ and $(X_l^p, X_k^p) \notin R_p$, and can be constructed as (Orlovsky 1978):
$$\mu_{P_p}(X_k^p, X_l^p) = \max\{\mu_{R_p}(X_k^p, X_l^p) - \mu_{R_p}(X_l^p, X_k^p),\, 0\}. \qquad (7)$$
As $\mu_{P_p}(X_l^p, X_k^p)$ describes the degree to which $X_k^p \in X^p$ is strictly dominated by (or strictly inferior to) $X_l^p \in X^p$, its complement $1 - \mu_{P_p}(X_l^p, X_k^p)$ gives the degree to which $X_k^p$ is not dominated by $X_l^p$. Therefore, in order to find the set of alternatives from $X^p$ that are not dominated by any other alternative, it suffices to take the intersection of these complements over all $X_l^p$. This intersection is the set of nondominated objects belonging to $X^p$, with membership function
$$\mu_{ND_p}(X_k^p) = \min_{X_l^p \in X^p} \bigl(1 - \mu_{P_p}(X_l^p, X_k^p)\bigr) = 1 - \max_{X_l^p \in X^p} \mu_{P_p}(X_l^p, X_k^p). \qquad (8)$$
Having a collection of measures X 0 = { X 01 ,..., X 0m } at hand, one can obtain the degrees of fuzzy nondominance μ ND p ( X 0p ) , for each performance indicator p=1,…,m, and aggregate them in order to estimate the multidimensional performance degree for the enterprise. Different aggregation operators can be applied in this context. The use of an intersection operation is suitable, when it is necessary to verify at which level the organization simultaneously satisfies all the goals associated with the performance indicators I1 and I2 and ... and Im. Among the operators that can be utilized to implement the intersection operation, the min operator allows one to construct a multidimensional performance indicator Gmin ( X 0 ) = min( μ ND1 ( X 01 ),..., μ NDm ( X 0m )) ,
(9)
under a completely noncompensatory approach, in the sense that the high satisfaction of some goals does not relieve the remaining ones from the requirement of being satisfied. Such pessimistic approach gives emphasis to the worst evaluations, which may be particularly advantageous to identify the organization weaknesses. On the other hand, the use of the union operation is also admissible, when it is necessary to verify at which level the organization satisfies at least one goal, which can be associated with I1 or I2 or … or Im. The use of max operator to implement the union operation allows one to construct a multidimensional performance indicator Gmax ( X 0 ) = max( μ ND1 ( X 01 ),..., μ NDm ( X 0m )) ,
(10)
under an extremely compensatory approach, in the sense that the high level of satisfaction of any goal is sufficient. Finally, it can be useful to apply the so-called ordered weighted aggregation (OWA) operator (Yager 1995), which can produce a result that is more compensatory than min or less compensatory than max under a proper adjustment of its weights. An OWA operator of dimension m corresponds to a mapping [0, 1]^m → [0, 1]. Here it is utilized to aggregate a set of m normalized values μ^ND_1(X_01), …, μ^ND_m(X_0m), in such a way that

G_OWA(X_0) = ∑_{i=1}^{m} w_i b_i,   (11)
where b_i is the i-th largest value among μ^ND_1(X_01), …, μ^ND_m(X_0m), and the weights w_1, …, w_m satisfy the conditions w_1 + … + w_m = 1 and 0 ≤ w_i ≤ 1, i = 1, …, m.
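The three aggregations can be sketched as follows; a minimal illustration using the nondominance degrees reported later in Table 2, with quantifier-based OWA weights (Yager's RIM scheme, Q(r) = r^2) that are purely illustrative and not the "majority" weights used in the paper:

```python
def owa(values, weights):
    # Order values descending (b_i is the i-th largest) and form sum of w_i * b_i.
    b = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, b))

def quantifier_weights(m, q=lambda r: r ** 2):
    # RIM-quantifier weights: w_i = Q(i/m) - Q((i-1)/m); they sum to 1.
    return [q(i / m) - q((i - 1) / m) for i in range(1, m + 1)]

mu_nd = [0.8, 0.7, 0.71, 1.0, 0.5]  # nondominance degrees for I1..I5 (Table 2)
g_min = min(mu_nd)                  # noncompensatory aggregation (9): 0.5
g_max = max(mu_nd)                  # fully compensatory aggregation (10): 1.0
w = quantifier_weights(len(mu_nd))
g_owa = owa(mu_nd, w)               # lies between g_min and g_max
```

With these illustrative weights G_OWA falls strictly between the min and max aggregates, reflecting its adjustable degree of compensation.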
Fuzzy Preference Based Organizational Performance Measurement
465
The major attractive aspect of using OWA is that it allows one to specify the weights indirectly by using linguistic quantifiers. Here, OWA is utilized with the linguistic quantifier "majority" to indicate at which level the organization satisfies most of the goals (Yager 1995).
5 Application Example

In order to demonstrate the applicability of the proposed methodology, it is applied to measure the performance of an enterprise. Table 1 shows the performance indicators being considered and their corresponding evaluations. The fuzzy performance indicators I1, I2, and I5 have a maximization character, and I3 and I4 have a minimization character. Figure 1 presents the LFS defined for each performance indicator (the LFSs have different granularities to reflect the degree of uncertainty in the perception of the managers invited to participate in the definition of the reference standards). With the use of (4) and (5), fuzzy preference relations are constructed for each performance indicator. By subsequently applying expressions (7) and (8) to the fuzzy preference relations, the fuzzy nondominance degrees shown in Table 2 are obtained. Table 2 also shows the fuzzy nondominance degrees aggregated with the use of the min, max and OWA operators. As can be seen, the low value of Gmin(X_0) suggests that the enterprise still has to deal with the corresponding weaknesses. The high value of Gmax(X_0) indicates that the organization has already achieved at least one of its goals, which means that more aggressive goals can be established for that performance indicator. The majority of goals have been satisfied at a degree of 0.72, as indicated by GOWA(X_0). It is worth noting that the values of Gmin(X_0), Gmax(X_0), and GOWA(X_0) depend only on the comparison of X_0p with X*_p, where X*_p is the fuzzy estimate from LFS(Ip) associated with the highest level of goal satisfaction. In this way, only the entries R_p(X_0p, X*_p) and R_p(X*_p, X_0p) are required to obtain the aggregated fuzzy nondominance degree of X_0. However, in order to know the position of X_0p in LFS(Ip), it is valuable to obtain the complete matrix R_p, as well as the values of Gmin(X_kp), Gmax(X_kp), and GOWA(X_kp), for all X_kp ∈ X_p.
For instance, consider the fuzzy preference relation associated with I1, which is given by

         | 1     1     1     0.8   |
μ_R1  =  | 0.38  1     1     0.5   |     (12)
         | 0.53  1     1     0.725 |
         | 1     0.81  1     1     |
466
R.O. Parreiras and P.Y. Ekel
Fig. 1 LFS associated with each fuzzy performance indicator.

Table 1 Fuzzy performance indicators.

Performance indicators                                              Assessments
I1: productivity as average monthly ratio (output/input)            0.74
I2: production amount (value per hour in 10^3 USD)                  Gaussian fuzzy number with (μ, σ) = (70, 4)
I3: production costs (per year in 10^3 USD)                         60900
I4: internal and external failure costs (per year in 10^3 USD)      190
I5: customer satisfaction expressed as a fuzzy estimate
    (a trapezoidal fuzzy number) coming from LFS(I5)                Average
Table 2 Fuzzy nondominance degrees for each fuzzy performance indicator and global fuzzy nondominance degrees obtained with min, max and OWA.

                I1     I2     I3     I4     I5     min    max    OWA
μ_ND(X_0)       0.8    0.7    0.71   1      0.5    0.5    1      0.72
and the corresponding fuzzy nondominance set

μ_ND1 = [0.8  0.38  0.53  1].   (13)
By analyzing (13), we can see that the level of fuzzy nondominance of X 01 is between the level of fuzzy nondominance of the fuzzy estimates associated with the linguistic terms “Average” and “High”, being more similar to “High”.
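The nondominance set (13) can be derived from the fuzzy preference relation with Orlovsky's construction, μ_ND(x_i) = 1 − max_j max(μ_R(x_j, x_i) − μ_R(x_i, x_j), 0). A minimal sketch, assuming this strict-preference form and the entries of (12):

```python
def nondominance(R):
    # Orlovsky's fuzzy nondominance degree for each element of the relation:
    # mu_ND(x_i) = 1 - max_j max(R[j][i] - R[i][j], 0)
    n = len(R)
    return [round(1 - max(max(R[j][i] - R[i][j], 0.0) for j in range(n)), 2)
            for i in range(n)]

# Fuzzy preference relation associated with I1, expression (12).
R1 = [
    [1.00, 1.00, 1.00, 0.800],
    [0.38, 1.00, 1.00, 0.500],
    [0.53, 1.00, 1.00, 0.725],
    [1.00, 0.81, 1.00, 1.00],
]
nd1 = nondominance(R1)  # reproduces the set (13)
```

The fourth element attains 1 because no other alternative strictly dominates it, which is why "High" is the nondominant position in LFS(I1).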
6 Conclusions

We presented a methodology for constructing a multidimensional performance indicator. Among its advantageous aspects, we can name the following:
• Once the managers have constructed a LFS for each performance indicator, the FNRI makes it possible to compare the assessed values with the standard references without the participation of managers (until new goals for the enterprise and, as a consequence, a new reference standard become needed).
• The proposed methodology does not involve the use of defuzzifying operations, which usually imply loss of information or unjustified simplification of the problem.
• The results of (Pedrycz et al. 2010), which are associated with multicriteria group decision-making in a fuzzy environment, can be utilized to extend this methodology to include the input of a group of managers and experts.

Acknowledgments. This research is supported by the National Council for Scientific and Technological Development of Brazil (CNPq) - grants PQ:307406/2008-3 and PQ:307474/2008-9.
References
[1] Abdel-Maksoud, A., Dugdale, D., Luther, R.: Non-financial performance measurement in manufacturing companies. The Br. Account Rev. 37, 261–297 (2005), doi:10.1016/j.bar.2005.03.003
[2] Berrah, L., Mauris, G., Foulloy, L., Haurat, A.: Global vision and performance indicators for an industrial improvement approach. Comput. Ind. 43, 211–225 (2000), doi:10.1016/S0166-3615(00)00070-1
[3] Bititci, U.S., Suwignjo, P., Carrie, A.S.: Strategy management through quantitative modelling of performance measurement systems. Int. J. Prod. Econ. 69, 15–22 (2001), doi:10.1016/S0925-5273(99)00113-9
[4] Bosilj-Vuksice, V., Milanovic, L., Skrinjar, R., Indihar-Stemberger, M.: Organizational performance measures for business process management: a performance measurement guideline. In: Proc. Tenth Conf. Comput. Model. Simul. (2008), doi:10.1109/UKSIM.2008.114
[5] Brouwer, R.K.: Fuzzy set covering of a set of ordinal attributes without parameter sharing. Fuzzy Sets Syst. 157, 1775–1786 (2006), doi:10.1016/j.fss.2006.01.004
[6] Chapman, C.S., Hopwood, A.G., Shields, M.D.: Handbook of management accounting research, vol. 1. Elsevier, Amsterdam (2007)
[7] Clivillé, V., Berrah, L., Mauris, G.: Quantitative expression and aggregation of performance measurements based on the MACBETH multi-criteria method. Int. J. Prod. Econ. 105, 171–189 (2007), doi:10.1016/j.ijpe.2006.03.002
[8] Ekel, P., Pedrycz, W., Schinzinger, R.: A general approach to solving a wide class of fuzzy optimization problems. Fuzzy Sets Syst. 97, 49–66 (1998), doi:10.1016/S0165-0114(96)00334-X
[9] Ekel, P.Y., Silva, M.R., Schuffner Neto, F., Palhares, R.M.: Fuzzy preference modeling and its application to multiobjective decision making. Comput. Math Appl. 52, 179–196 (2006), doi:10.1016/j.camwa.2006.08.012
[10] Herrera-Viedma, E., Herrera, F., Chiclana, F.: A consensus model for multiperson decision making with different preference structures. IEEE Trans. Syst. Man Cybern – Part A: Syst. Hum. 32, 394–402 (2002), doi:10.1109/TSMCA.2002.802821
[11] Kaplan, R.S., Norton, D.: The Balanced Scorecard: Translating Strategy into Action. Harvard Business School, Boston (1996)
[12] Lu, C., Lan, J., Wang, Z.: Aggregation of fuzzy opinions under group decision-making based on similarity and distance. J. Syst. Sci. Complex 19, 63–71 (2006), doi:10.1007/s11424-006-0063-y
[13] Nudurupati, S.S., Bititci, U.S., Kumar, V., Chan, F.T.S.: State of the art literature review on performance measurement. Comput. Ind. Eng. (in press), doi:10.1016/j.cie.2010.11.010
[14] Orlovsky, S.A.: Decision making with a fuzzy preference relation. Fuzzy Sets Syst. 1, 155–167 (1978), doi:10.1016/0165-0114(78)90001-5
[15] Orlovsky, S.A.: Problems of Decision Making with Fuzzy Information. Nauka, Moscow (1981) (in Russian)
[16] Pedrycz, W.: Fuzzy equalization in the construction of fuzzy sets. Fuzzy Sets Syst. 119, 329–335 (2001), doi:10.1016/S0165-0114(99)00135-9
[17] Pedrycz, W., Ekel, P., Parreiras, R.: Fuzzy Multicriteria Decision-Making: Models, Methods, and Applications. Wiley, Chichester (2010)
[18] Popova, V., Sharpanskykh, A.: Modeling organizational performance indicators. Inf. Syst. 35, 505–527 (2010), doi:10.1016/j.is.2009.12.001
[19] Shobrys, D.E., White, D.C.: Planning, scheduling and control systems: why cannot they work together. Comput. Chem. Eng. 26, 149–160 (2000), doi:10.1016/S0098-1354(00)00508-1
[20] Yager, R.R.: Multicriteria decision making using fuzzy quantifiers. In: Proc. IEEE Conf. Comput. Intell. Financial Eng. (1995), doi:10.1109/CIFER.1995.495251
[21] Yu, V.F., Hu, K.J.: An integrated fuzzy multi-criteria approach for the performance evaluation of multiple manufacturing plants. Comput. Ind. Eng. 58, 269–277 (2010), doi:10.1016/j.cie.2009.10.005
[22] Zimmermann, H.J.: Fuzzy Set Theory and Its Application. Kluwer Academic Publishers, Boston (1990)
Generating Reference Business Process Model Using Heuristic Approach Based on Activity Proximity Bernardo N. Yahya and Hyerim Bae
*
Abstract. The number of organizations implementing business process innovation through an approach involving a Business Process Management (BPM) system has increased significantly. These organizations design, implement and develop BPM systems to such a level of maturity that, consequently, there are large collections of business process (BP) models in their repositories. The existence of numerous process variations leads to both process redundancy and process underutilization, which negatively impact business performance. Thus, there is a need to create a process reference model that can identify and find a representative model without redundancy. This paper introduces a new heuristic-based approach to generate a valid business process reference model from a process repository. Previous research used a genetic algorithm (GA) to produce a reference process model; however, the GA procedure has a high computational cost for this problem. Near the end of this paper, we show the experimental results of the proposed method, which is conveniently executed using business process structure properties and the proximity of activities. It is believed that this process reference model can help a novice process designer to create a new process conveniently.

Keywords: reference process model, heuristic, business process, proximity.
1 Introduction

BP improvement and effectiveness issues have induced many organizations to implement and apply BPM systems. The implementation of BPM in numerous organizations encourages vendors to develop such systems to a certain level of maturity and, consequently, there are large collections of BP models in repositories. The existence of numerous process variations leads to both process redundancy and process underutilization. Thus, there is a need to create a process reference model to identify and find a representative model without redundancy.

Bernardo N. Yahya · Hyerim Bae
Business & Service Computing Lab., Industrial Engineering, Pusan National University
30-san Jangjeon-dong Geumjong-gu, Busan 609-735, South Korea
e-mail:
[email protected],
[email protected] *
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 469–478. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
The generation of a generic process model, as a new reference model, is considered necessary for future adaptations and for decreasing change costs. In the industrial-process domain, there exist industrial process model collections and reference process models such as ITIL, SCOR and eTOM. However, these high-level reference models correspond to process guidelines that match different aspects of IT without considering the level of implementation. Moreover, existing approaches only attempted to create reference models based on the history of process configurations (Li et al. 2009; Kuster et al. 2006; Holschke et al. 2008). Thus, the present study developed a method for generating the best reference model from a large number of past business process variants without any history information on process configurations.
Fig. 1 Process variants in logistics
Figure 1 illustrates the variation of business processes that have common goals, showing seven process variants as examples. The characteristics of each example process have to be modeled for a specific goal in the logistics process. As the basic process, six activities are initialized: order entry, order review, financial check, stock check, manager review, and purchase order (PO) release. According to the given process context, the concerns of a certain organizational unit and specific customer requirements, different variations of the basic process are needed.
A previous approach created a reference model by solving a combinatorial optimization problem, using a distance measure and a mathematical formulation of the process model to optimize the activity distances (Bae et al. 2010). It includes the safety property, which states that a business process will always satisfy a given property, e.g., it will always run to completion (Aalst 2000). However, there is no guarantee that a process having the safety property also satisfies the soundness property as a validation approach. To overcome these limitations, Yahya et al. (2010) developed a GA-based method of generating a BP reference model from a BP repository (GA-BP). It adopted a measure assessing the proximity distance of activities in order to evaluate both integer programming (IP) and GA measurements. However, the high computational cost in execution time is a problem of the GA approach. Thus, this study focuses on developing a heuristic approach that selects the best solution with a better execution time. In addition, a valid reference process model that satisfies the soundness properties (Aalst 2000) and corresponds to the characteristics of the existing process variants (Fettke et al. 2006) is also discussed. This paper proceeds as follows. In section 2, we briefly review the literature on process modeling, reference processes and graph theory in the BPM field. In section 3, we incorporate the proposed proximity score measurement (PSM) method into the evaluation function of an IP problem (IP-BP). The heuristic method (Heuristic-BP) is proposed as a way to improve the GA-BP results. Section 4 examines the experimental results of IP-BP, GA-BP and Heuristic-BP. Finally, section 5 concludes this study.
2 Related Work

There are several existing business-process-design-related research fields, usually titled business process modeling, workflow verification, process change and workflow patterns (Kim and Kim 2010; Zhou and He 2005; Jung et al. 2009; Kim et al. 2010). Process configuration using a version management approach was discussed previously (Kim et al. 2010). Research on pattern-based workflow design using Petri nets was also proposed (Zhou and He 2005). Kim and Kim (2010) developed a process design tool with a fault-tolerant process debugger. Jung et al. (2009) discussed a method to find similar processes by clustering the processes stored in a repository. All of these works developed an improved process; however, none of them addresses modeling BPs by means of process reference models. Any discussion of reference model issues begins with process variants (Fettke et al. 2006; Kuster et al. 2006; Holschke et al. 2008; Li et al. 2009; Bae et al. 2010). Recently, a comprehensive heuristics-based approach to discovering new reference models by learning from past process configurations was discussed (Li et al. 2009), and a mathematical programming approach was introduced (Bae et al. 2010). The proposed heuristic (Li et al. 2009) updates process configurations and produces new reference models based on a minimum edit distance from initial reference processes. However, most traditional process design tools lack
functions for storing process configurations. When many processes are already stored in the repository, a special method is required to generate a process reference model without any process-configuration information. The IP-based mathematical model (Bae et al. 2010) was proposed to address the issue of creating reference processes without initial reference information or process reconfiguration. Problems remain in the presentational and validation aspects of the process using IP formulations, which were solved using a GA approach (Yahya et al. 2010). The industry papers using refactoring operations (Kuster et al. 2006) emerged as a process configuration tool for transforming AS-IS into TO-BE process models. A scenario-based analysis of the application of reference process models in SOA (Holschke et al. 2008) and survey results and classification (Fettke et al. 2006) cover the necessary concepts regarding process reference models, with fewer quantitative techniques.
3 Proposed Model We measured the process structure distance using PSM. For a simpler distance measure, the process variants in Figure 1 were transformed into graph abstractions, as illustrated in Figure 2.
3.1 Proximity Score Measurement (PSM)

Definition 1 (Process Model). We define a process model p_k, the k-th process in a process repository. It can be represented as a tuple <A_k, L_k>, each element of which is defined below.
• A_k = {a_i | i = 1, …, I} is a set of activities, where a_i is the i-th activity of p_k and I is the total number of activities in p_k.
• A is defined as the set of all activities in the process repository, i.e., the union of all A_k: A = ∪_{k=1}^{K} A_k.
• L_k ⊆ {l_ij = (a_i, a_j) | a_i, a_j ∈ A_k} is a set of links, where l_ij is the link between two activities a_i and a_j in the k-th process. The element (a_i, a_j) represents the fact that a_i immediately precedes a_j.

Definition 2 (Activity Proximity Score). We have to obtain the Activity Proximity Score (APS) for each process. The APS value, denoted by q_ij, is defined as

q_ij^k = h^k(i, j) / d_ij^k,   (1)

where h^k(i, j) = 1 if a_i → a_j in the k-th process, and 0 otherwise, and
d_ij^k is the average path distance between activities a_i and a_j of the k-th process. Each process has a single value of q_ij^k, k = 1, 2, …, K, where q_ij^k is the APS of the k-th process in a process repository, K is the total number of processes, and a_i → a_j denotes that activity a_j is reachable from a_i. Detailed distance calculations can be found in Yahya et al. (2009).
Fig. 2 Graph abstraction from Fig. 1: directed-graph representations of process variants p1–p7 (panels (a)–(g)) over activities a1–a6 with links l_ij.
Definition 3. (InDegree and OutDegree of activity) InDegree defines the number of edges incoming to an activity, and OutDegree defines the number of edges outgoing from an activity. We denote the InDegree and the OutDegree of the i-th activity as inDegree(ai) and outDegree(ai), respectively, and according to these concepts, we can define start/end activities and split/merge semantics. Start activity (aS) is an activity with an empty set of preceding activities, inDegree(ai)=0. End activity (aE) is an activity with an empty set of succeeding activities, outDegree(ai)=0. For a split activity ai such that outDegree(ai)>1, fs(ai) = ‘AND’ if all of the succeeding activities should be executed; otherwise, fs(ai)= ‘OR’. For a merge activity ai such that inDegree(ai)>1, fm(ai) = ‘AND’ if all of the preceding activities should be executed; otherwise, fm(ai)= ‘OR’.
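The degree-based classification of Definition 3 can be sketched directly; a minimal illustration using the links of variant p1 from Fig. 2:

```python
def degrees(activities, links):
    # inDegree / outDegree per Definition 3: count incoming and outgoing edges.
    indeg = {a: 0 for a in activities}
    outdeg = {a: 0 for a in activities}
    for i, j in links:
        outdeg[i] += 1
        indeg[j] += 1
    return indeg, outdeg

acts = [1, 2, 3, 4, 5, 6]
# Links of process variant p1 from Fig. 2 (panel (a)).
links_p1 = [(1, 2), (1, 3), (1, 4), (2, 5), (3, 5), (4, 5), (5, 6)]
indeg, outdeg = degrees(acts, links_p1)

starts = [a for a in acts if indeg[a] == 0]    # start activities (a_S)
ends = [a for a in acts if outdeg[a] == 0]     # end activities (a_E)
splits = [a for a in acts if outdeg[a] > 1]    # split activities
merges = [a for a in acts if indeg[a] > 1]     # merge activities
```

In p1, a1 is both the start and a split activity, a5 is a merge activity, and a6 is the end activity; whether the splits/merges are 'AND' or 'OR' still depends on the execution semantics, which the degree counts alone do not determine.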
3.2 Integer Programming Mathematical Formulation A process of automatic reference model creation finds an optimum reference process by maximizing the sum of proximity scores among the process variants in a process repository. The following notations, extended from [10], are used in the mathematical formulation of our problem. Notice that yi, zj, and xij are decision variables. i,j: activity index (i,j = 1,…, I), where I is the number of activities k: process variant index (k = 1,…, K), where K is the number of process variants
y_i: 1, if the i-th activity is a start activity; 0, otherwise
z_j: 1, if the j-th activity is an end activity; 0, otherwise
x_ij: 1, if the i-th activity immediately precedes the j-th activity; 0, otherwise

Mathematical Formulation

min ∑_{i=1}^{I} ∑_{j=1}^{I} ((K − c_ij) x_ij + c_ij (1 − x_ij))   (2)

s.t.
∑_{i=1}^{I} y_i = 1   (3)
∑_{j=1}^{I} z_j = 1   (4)
y_i + x_ji ≤ 1,   ∀k, q_ji^k = 1   (5)
z_i + x_ij ≤ 1,   ∀k, q_ij^k = 1   (6)
y_i + ∑_{j: q_ji^k = 1} x_ji ≥ 1,   i = 1, …, I   (7)
z_i + ∑_{j: q_ij^k = 1} x_ij ≥ 1,   i = 1, …, I   (8)
x_ij ∈ {0, 1},   ∀k, q_ij^k = 1   (9)
y_i, z_i ∈ {0, 1},   i = 1, …, I   (10)
In this model, we link two activities based on the information from the existing process variants. The summation of the number of adjacent links among all of the process variants is denoted as c_ij; it determines the cost of creating a link between a_i and a_j among all k process variants. When the constraints are satisfied, we minimize the product of the binary link variable (x_ij) and the negative cost (−c_ij) of the link between a_i and a_j. To avoid unexpected links, we multiply (1 − x_ij) by the cost. In other words, in order to maximize the sum of proximity scores among process variants, the objective function has to be minimized. Constraints (3) and (4) impose the condition that there is only one start activity (y_i) and one end activity (z_i) in a process reference model. Constraints (5) and (6) guarantee that there are no immediate predecessors of the start activity and no immediate successors of the end activity, respectively. Constraints (7) and (8) determine that there should be at least one path following the start activity and preceding the end activity, respectively. The adjacency relationship of activities a_i and a_j is denoted as x_ij with a binary value, as shown by constraint (9). Constraint (10) reflects the fact that the start/end indicators are binary.
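Evaluating objective (2) for a candidate link set can be sketched as follows; the c_ij counts and K here are hypothetical, chosen only to show that minimizing the objective favors keeping a link whose adjacency count exceeds K/2 and dropping rarer links:

```python
def objective(x, c, K):
    # Expression (2): sum over candidate links of (K - c_ij)*x_ij + c_ij*(1 - x_ij).
    # x: chosen links (x_ij = 1), c: adjacency counts c_ij, K: number of variants.
    return sum((K - c[ij]) * x.get(ij, 0) + c[ij] * (1 - x.get(ij, 0)) for ij in c)

K = 7                               # hypothetical number of process variants
c = {(1, 2): 5, (2, 3): 3}          # hypothetical adjacency counts c_ij

keep_frequent = objective({(1, 2): 1}, c, K)          # (7-5)*1 + 3*(1-0) = 5
keep_both = objective({(1, 2): 1, (2, 3): 1}, c, K)   # 2 + 4 = 6
keep_none = objective({}, c, K)                       # 5 + 3 = 8
```

Including a link costs K − c_ij while excluding it costs c_ij, so a link is worth keeping exactly when c_ij > K/2; in this toy instance only (1, 2) qualifies.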
Fig. 3 Result from LINGO by IP-BP (Fitness = 31)
By using the graph example in Figure 2 and applying the mathematical approach, we obtain the result from LINGO, shown in Figure 3. The process
provides us with insight into the new reference process. The safety property, as a part of the soundness property, has been considered in the IP constraints. However, the behavior between the start and end activities may still hold some irrelevant properties. For example, the result is considered to be an invalid process, since activity a3 (the financial check activity) has never been experienced as a merge activity. A previous study set about solving this validity problem using a GA approach. Due to its computational cost, the present study presents a heuristic algorithm for obtaining the best result with less computational time.
4 Heuristic Approach

It is important to generate a valid and sound reference process model. Thus, in order to verify the well-formedness of a business process, this study follows and applies the soundness properties of business processes; three corresponding properties are accommodated to ensure soundness (Aalst 2000). The proposed heuristic approach comprises two parts: first, an initialization procedure that creates an initial process based on a certain probability condition; second, a revision algorithm that modifies and enhances a process into a sound reference process model. The heuristic procedure to obtain the best fitness value is as follows.
1. Identify N activities from all process variants in the repository.
2. Select activity properties (initialization algorithm):
   Step 1. Search for a possible start activity. Let a1 be the start activity.
   Step 2. Search for a potential next activity after a1 by finding the greatest activity proximity score. Let n = 2 and denote the next activity as an.
   Step 3. Search for a potential next activity after an by finding the greatest activity proximity score.
   Step 4. If n < N, set n = n + 1 and repeat Step 3; otherwise, continue.
   Step 5. Let aN be the end activity.
3. Check the validity of the generated process with the insertion_deletion algorithm (see Fig. 4).
The total time complexity of the insertion_deletion algorithm is O(N^2), where N = |A|. Fig. 5 shows the heuristic result obtained by considering the validity property. By using the insertion_deletion algorithm (Figure 4) to check the process path, we can produce the result shown in Figure 5, with an objective value greater than that of IP-BP (Figure 3). Although the heuristic yields a greater objective value, the process model fits the soundness and validity properties of the process variants. Experimental results using IP, GA and the heuristic are shown in Table 1. The heuristic results show the same fitness values as GA-BP (see Table 1).
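Steps 2–4 of the initialization can be sketched as a greedy chain builder; a minimal illustration with hypothetical summed APS values (tie-breaking and the probability condition of the full procedure are omitted):

```python
def greedy_chain(activities, start, score):
    # Repeatedly append the remaining activity with the greatest proximity
    # score from the current last activity (Steps 2-4 of the initialization).
    order = [start]
    remaining = set(activities) - {start}
    while remaining:
        nxt = max(remaining, key=lambda a: score.get((order[-1], a), 0.0))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical APS values summed over the repository's variants.
score = {(1, 2): 3.0, (2, 3): 2.0, (2, 4): 1.0, (3, 4): 2.0}
order = greedy_chain([1, 2, 3, 4], 1, score)
```

The chain produced here is a1 → a2 → a3 → a4; the insertion_deletion algorithm then repairs this initial ordering into a sound process model.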
The high execution time of GA-BP is resolved by Heuristic-BP. Figure 6 (right side) presents a comparison graph of the three methods: IP-BP, GA-BP and Heuristic-BP.
Algorithm insertion_deletion(pk)
Input: a nominated reference process pk
Output: valid process pk
Begin
  /* Link Deletion */
  FOR each ai ∈ Ak DO                        // Ak is the set of activities in pk
    FOR each aj ∈ Ak, aj ≠ ai
      IF (qji^k = 1) THEN
        IF ((cji = 0) || (outDegree(aj) > maxk(outDegree(aj)))) THEN
          Lk ← Lk − {lji}; outDegree(aj)--; inDegree(ai)--;   // Lk is the set of links in pk
      IF (qij^k = 1) THEN
        IF ((cij = 0) || (inDegree(aj) > maxk(inDegree(aj)))) THEN
          Lk ← Lk − {lij}; outDegree(ai)--; inDegree(aj)--;
  /* Link Insertion */
  FOR each ai ∈ Ak DO link_insert(ai);
    IF ((inDegree(ai) == 0) && (ai ≠ aS)) THEN
      FOR each aj ∈ Ak, aj ≠ ai ∧ (aj, ai) ∉ Lk
        IF ((outDegree(aj) > 0) && (outDegree(aj) < maxk(outDegree(aj)))) THEN
          IF (cji > max_in(ai)) THEN max_in(ai) = cji;
      END FOR
      FOR each aj ∈ Ak, aj ≠ ai
        IF (cji = max_in(ai)) THEN Lk ← Lk + {lji}; outDegree(aj)++; inDegree(ai)++;
      END FOR
    IF ((outDegree(ai) == 0) && (ai ≠ aE)) THEN
      FOR each aj ∈ Ak, aj ≠ ai ∧ (ai, aj) ∉ Lk
        IF ((inDegree(aj) > 0) && (inDegree(aj) < maxk(inDegree(aj)))) THEN
          IF (cij > max_out(ai)) THEN max_out(ai) = cij;
      END FOR
      FOR each aj ∈ Ak, aj ≠ ai
        IF (cij = max_out(ai)) THEN Lk ← Lk + {lij}; outDegree(ai)++; inDegree(aj)++;
      END FOR
  END FOR
End.

Fig. 4 insertion_deletion algorithm
Fig. 5 Reference Process Model as result of Heuristic-BP (Fitness = 32)
Generating Reference Business Process Model Using Heuristic Approach
477
Table 1 Experiment Result
# of avg. act. 6.1 10.4 14.8 19 23.2 28.6 33.1 38.3 43 47.2
IP-BP Exec. Time 49 0.17 63 0.17 124 0.39 152 0.31 205 0.56 199 0.92 339 1.49 305 2.45 438 3.72 538 5.56
Obj. Value
GA-BP Exec. Best Time 50 1.19 63 1.09 125 1.89 154 1.81 207 2 202 3.75 345 5.45 306 9.77 447 14.75 548 20.69
Heuristic-BP Fitness Exec. Value Time 50 0.19 63 0.27 125 0.33 154 0.44 207 0.62 202 0.89 345 1.53 306 2.56 447 3.44 548 4.89
Fig. 6 Graphic comparison of IP, GA and heuristic BP based on fitness value and execution time
5 Conclusions

This paper presents an enhanced approach to finding a process reference model. Previous works proposed an IP approach, posed as a combinatorial optimization problem, and a GA procedure; a heuristic was required to solve the problem more efficiently. This study presents a heuristic approach based on the insertion_deletion algorithm that obtains the same result as the GA approach with less execution time. The presentation limitations of the mathematical formulation and the possibility of an unguaranteed valid process, previously resolved by the GA procedure, are also solved by this heuristic. The process reference model derived by our approach can be utilized for various purposes. First, it can serve as a process template for certain process variants. Second, it can address the process reuse issue. Hence, our approach can be a robust decision-making tool for convenient process modeling by novice designers.
Acknowledgement This work was supported by the National Research Foundation of Korea(NRF) grant funded by the Korea government(MEST) (No.2010-0027309).
References
1. van der Aalst, W.M.P.: Workflow verification: Finding control-flow errors using Petri-net-based techniques. In: van der Aalst, W.M.P., Desel, J., Oberweis, A. (eds.) Business Process Management. LNCS, vol. 1806, pp. 161–183. Springer, Heidelberg (2000)
2. Bae, J., Lee, T., Bae, H., Lee, K.: Process reference model generation by using graph edit distance. In: Korean Institute Industrial Engineering Conference, D8-5 (2010) (in Korean)
3. Fettke, P., Loos, P., Zwicker, J.: Business process reference models: Survey and classification. In: Bussler, C.J., Haller, A. (eds.) BPM 2005. LNCS, vol. 3812, pp. 469–483. Springer, Heidelberg (2006)
4. Holschke, O., Gelpke, P., Offermann, P., Schropfer, C.: Business process improvement by applying reference process models in SOA – a scenario-based analysis. In: Multikonferenz Wirtschaftsinformatik (2008)
5. Jung, J., Bae, J., Liu, L.: Hierarchical clustering of business process models. International Journal of Innovative Computing, Information and Control 5(12A), 4501–4511 (2009)
6. Kim, D., Lee, N., Kan, S., Cho, M., Kim, M.: Business process version management based on process change patterns. International Journal of Innovative Computing, Information and Control 6(2), 567 (2010)
7. Kim, M., Kim, D.: Fault-tolerant process debugger for business process design. International Journal of Innovative Computing, Information and Control 6(4), 1679 (2010)
8. Küster, J.M., Koehler, J., Ryndina, K.: Improving business process models with reference models in business-driven development. In: Eder, J., Dustdar, S. (eds.) BPM Workshops 2006. LNCS, vol. 4103, pp. 35–44. Springer, Heidelberg (2006)
9. Li, C., Reichert, M., Wombacher, A.: Discovering reference models by mining process variants using a heuristic approach. In: Dayal, U., Eder, J., Koehler, J., Reijers, H.A. (eds.) BPM 2009. LNCS, vol. 5701, pp. 344–362. Springer, Heidelberg (2009)
10. Yahya, B.N., Bae, H., Bae, J.: Process design selection using proximity score measurement. In: Rinderle-Ma, S., Sadiq, S., Leymann, F. (eds.) BPM 2009. Lecture Notes in Business Information Processing, vol. 43, pp. 330–341. Springer, Heidelberg (2010)
11. Yahya, B.N., Bae, H., Bae, J., Kim, D.: Generating business process reference model using genetic algorithm. In: Biomedical Fuzzy Systems Association 2010, Kitakyushu (2010)
12. Zhou, G., He, Y.: Modelling workflow patterns based on P/T nets. International Journal of Innovative Computing, Information and Control 1(4), 673–684 (2005)
How to Curtail the Cost in the Supply Chain? Wen-Ming Wu, Chaang-Yung Kung, You-Shyang Chen, and Chien-Jung Lai
*
Abstract. In an era of minimal manufacturing profits, financial influence has become the key point in supply chain management; it is therefore a core issue for manufacturers and companies to increase the positive financial benefits and decrease the negative financial impacts in their supply chains. In order to create innovative assessment criteria and an evaluation model for supply chain management, this research selects the Analytical Network Process (ANP) model to evaluate key financial assessment criteria, derived through brainstorming, focus groups, the Delphi method and the nominal group technique, to improve the selection of suppliers in supply chain management (SCM). The specific characteristic of the ANP evaluation model is to establish pairwise comparison matrices and, furthermore, to measure the priority vector weights (eigenvector) of each assessed characteristic, criterion and attribute. In addition, the analytical hierarchical relations are expressed in four levels covering each supply chain characteristic, criterion and attribute. According to the empirical analysis, enterprises are able to choose the best potential suppliers through this research in order to minimize negative financial impact. Finally, some suggestions, both empirical and academic, are offered not only for managers but also for researchers, to further the development of operation strategies for supply chain management.

Keywords: Supply Chain Management (SCM), Analytical Network Process (ANP).
1 Introduction Nowadays, with the rapid development of manufacturing, information, and networking technologies, the era of minimal manufacturing profits has arrived. Manufacturers and enterprises have therefore begun not only to create the most effective marketing strategies but also to review the most efficient manufacture Wen-Ming Wu · Chien-Jung Lai Department of Distribution Management, National Chin-Yi University of Technology
Chaang-Yung Kung Department of International Business, National Taichung University of Education You-Shyang Chen Department of Information Management, Hwa Hsia Institute of Technology J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 479–487. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
processes in order to implement aggressive cost-reduction policies. Supply chain management ("SCM") has accordingly become a well-known doctrine, yet several financial elements must be comprehensively considered within it. Two definitions are fundamental to this research. First, "The Supply Chain ("SC") encompasses all activities associated with the flow and transformation of goods from the raw materials stage (extraction), through to the end user, as well as the associated information flows. Material and information flow both up and down the SC." [1] Second, "SCM is the systemic, strategic coordination of the traditional business functions and the tactics across these business functions within a particular company and across businesses within the SC, for the purposes of improving the long-term performance of the individual companies and the SC as a whole." [2] The fundamental idea of SCM is to compress the total cost of manufacturing, inventory, and delivery so as to maximize profit once orders have been received. However, SCM cannot effectively handle two crucial problems for enterprises: cash-flow stress when orders are absent, and accounts-receivable stress when client payment is slow. During the boom period between 1990 and 2008, the global economy grew rapidly, driven by the steady development of the economy of Mainland China, and cash-flow and accounts-receivable stress did not greatly affect enterprises. Since the 2008 global financial crisis, however, enterprises have suffered financial stress from clients who are unwilling to place stable procurement orders and who pay for those orders in six months or longer. In this era of rapid transition and lower profits, enterprises thus confront increasingly serious negative financial and managerial influences.
Furthermore, [3] analyzed the comprehensive SC under continuous and complex information flows and concluded that optimizing product flows cannot be accomplished without implementing a process approach to the business. That research identified the key SC processes as supplier relationship management, demand management, order fulfillment, manufacturing flow management, product development, commercialization and returns management, customer relationship management, and customer service management. In addition, [4] organized a performance evaluation model of SCM comprising four key elements: reliability; elasticity and responsiveness; cost; and Return on Assets ("ROA"). Reliability includes two main criteria: order-handling performance and delivery performance. Elasticity and responsiveness contains two criteria: production elasticity and SCM response time. Cost comprises three criteria: SCM management cost, additional production cost under SCM, and error-correction cost. ROA involves two criteria: inventory days and cash flow. To reduce negative financial and managerial influences, enterprises must not only exploit the cost-reduction benefits of SCM but also take a financially strategic view of sales forecasting, financial review, the inventory system, and SC development in order to achieve the best competitive advantage.
In this era of thin profits and unstable sales, business scholars and enterprise leaders have observed that reducing negative financial and managerial influences in order to increase sales is more important than cost reduction alone, because revenue is the critical lifeline of the enterprise. Asian enterprises, which together form the world's manufacturing factory, have been particularly affected by this tendency. Taiwan's proximity to Mainland China has led Taiwan to depend on export processing and international trade to develop its economy. Over the last 20 years, more and more Taiwanese enterprises have invested in Mainland China, creating a highly dependent relationship between enterprises in Taiwan and Mainland China; this dependence encumbers the development of Taiwan's enterprises. Moreover, owing to difficult political issues, Taiwan has been marginalized from rapidly integrating regional economic unions such as the Association of Southeast Asian Nations. Although the Taiwan government has communicated directly with the Mainland China government since the party in power alternated from the Democratic Progressive Party to the Chinese Kuomintang Party, Taiwan's enterprises still face many negative financial and managerial influences from an uncertain business environment. What is an efficient and effective approach for finding the best suppliers while avoiding negative financial and managerial influence? What are the most important criteria for assessing suppliers? Choosing appropriate suppliers to diminish costs and negative financial influence is thus a crucial analytical factor in SCM.
The purpose of this research is to utilize a hierarchical analytical approach, the analytic network process ("ANP"), to measure the key elements and assessment criteria for reducing negative financial influence under SCM, so that enterprises can minimize negative financial and managerial influences.
2 Methodologies 2.1 Research Design To address the complexity and uncertainty challenges surrounding the ANP model, a compilation of expert opinion was analyzed together with an empirical survey, yielding a retrospective cross-sectional analysis of the supply chain relationship between enterprises and suppliers aimed at diminishing negative financial and managerial influences. This section characterizes the overall research design, its analytical specification, and the research methodology, and compares the assessment of the relationships among characteristics, criteria, attributes, and candidate suppliers across the four phases of the research design: (1) identify the research motive in order to define a clear research purpose and question - select apposite suppliers in order to diminish negative financial and managerial influence; (2) select the research methodology - establish the ANP model to analyze the research question; (3) apply the methodology to the empirical survey data - use the ANP model to evaluate each criterion through transitivity, the weight-comparison principle, the evaluation criteria, the positive reciprocal matrix, and the supermatrix; and (4)
integrate the overall analysis to draw an inductive conclusion - select the best choice based on the assay results, by employing research model development, the measurement framework, methodology selection, the investigation procedure, analysis of the empirically collected data, and assessment of the overall analytical criteria through the Delphi method and comparative empirical analysis.
2.2 Measurement For the ANP model to be representative in terms of transitivity, the weight-comparison principle, the evaluation criteria, the positive reciprocal matrix, and the supermatrix, the research data source must collectively and statistically capture all relevant expert opinion on each criterion. In the ANP assessment, the pairwise comparisons of the evaluation characteristics, criteria, and attributes at each level are rated, with reference to their correlation, interdependence, and importance, on a scale from equally important (1) to extremely important (9), as expressed in Figure 1.

[Figure 1 shows four pairwise rating rows - characteristics, criteria, attributes, and selected candidates of supply chain 1 versus supply chain 2 - each rated on a 0-9 scale from "Equal" to "Extreme Important".]

Fig. 1 The evaluation scale of pairwise assessment
"Once the pairwise comparisons are conducted and completed, the local priority vector w (eigenvector) is computed as the unique solution" [5] of equation (1), where w represents the priority vector (relative weights). Additionally, [6] provided the two-stage algorithm given in the second part of equation (1):

Rw = λ_max w,    w_i = (1/m) Σ_{j=1}^{m} ( R_ij / Σ_{i=1}^{m} R_ij )    (1)
For each pairwise comparison, the consistency of the compared elements must satisfy transitivity in order for the collected expert opinion to be representative. The Consistency Index ("C.I.") is therefore computed for each pairwise comparison matrix, and the Consistency Ratio ("C.R.") is assessed from the C.I. and the Random Index ("R.I."), obtained from the statistical table of random index figures, as presented in equation (2).
C.I. = (λ_max − n) / (n − 1),    C.R. = C.I. / R.I.    (2)
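As a hedged illustration of equations (1) and (2), the sketch below applies the two-stage row-normalization approximation of the priority vector and the consistency check to a small pairwise comparison matrix. The 3×3 matrix and the random index R.I. = 0.58 (the standard value for n = 3) are illustrative assumptions, not the paper's survey data.

```python
import numpy as np

# Hypothetical 3x3 pairwise comparison matrix R on the 1-9 scale; these
# entries are illustrative, not the paper's survey data.
R = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])
m = R.shape[0]

# Two-stage approximation of the priority vector w (eq. (1)):
# normalise each column of R, then average across each row.
w = (R / R.sum(axis=0)).mean(axis=1)

# Estimate lambda_max from R w = lambda_max w, then apply eq. (2).
lam_max = float(np.mean(R @ w / w))
CI = (lam_max - m) / (m - 1)
RI = 0.58                 # standard random index for n = 3
CR = CI / RI

print(w.round(3), round(CR, 4))
```

For this matrix the priority weights sum to one and the consistency ratio is far below the acceptance threshold, so the hypothetical judgments would be accepted.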
Based on the consistency-ratio principle, a pairwise comparison matrix is acceptable when the C.R. is less than or equal to 0.01. The research data in this study were obtained from scholars and experts who understand SCM and the ANP and who serve or are employed in Taiwan and Mainland China. According to the fundamental characteristics of SCM and the AHP, the concepts of [7], and the collected expert opinion, this research organizes six assessment criteria and their corresponding attributes to test and analyze the consistency of each candidate supplier.

[Figure 2 shows the four-phase research design: distinguish the research motive (select the financially relevant elements for diminishing negative financial and managerial influence); select the research methodology (establish the ANP model to analyze the research question); apply the methodology to the empirical survey data (evaluate each criterion through transitivity, the weight-comparison principle, the evaluation criteria, the positive reciprocal matrix, and the supermatrix); and integrate the overall analysis to choose the best alternative based on the assay results.]

Fig. 2 The research design framework [8]
2.3 Research Process To employ an effective model for measuring suppliers, the ANP is applied in this research to handle the interdependent relationships among the SC characteristics, criteria, and attributes in the hierarchy, as presented in Figure 3. [9] noted that the major difference between the AHP and the ANP is that the AHP, by its original assumption, evaluates criteria only through hierarchical relations, whereas the ANP can handle direct interdependence and mutual influence between criteria at the same or different levels by constructing the "supermatrix".
[Figure 3 depicts the evaluation hierarchy: the goal (selecting the supplier with the best potential communication technology in the supply chain while minimizing financial risk); the assessment criteria (enterprise's high-speed operation demand, financial evaluation, value-adding transformation, supply information channel, inventory system, delivery status, customer's service, and suppliers' offer); the attributes of each sub-criterion (RG, GM-ROI, SFA, DOIS, IT, OI, IA, WOCSP, OFCOSP, OTS, PRG, OFR, POTSD, PSDMD, PTDMDNRI, PTDMVP-W, PTDMVP-EDI, PTMIMS); and the candidate suppliers: Supplier 1 without the enterprise's domain (SWTED), Supplier 2 with the enterprise's internal domain and social networking (SWEIDSN), and Supplier 3 with the enterprise's internal and external domain and social networking (SWEIEDSN).]

Fig. 3 The research process [10]
(1) Financial Evaluation. To reflect the overall operational financial evaluation of suppliers, two principal attributes are considered under this criterion: revenue growth ("RG") and gross margin return on investment ("GM-ROI"). (2) Sale Review. To verify the revenue evaluation of suppliers, two attributes, based on expert opinion, are considered under this criterion: sales forecast accuracy ("SFA") and days of inventory sales ("DOIS"). (3) Inventory System. To capture the inventory status of suppliers in brief, three major attributes, according to financial concepts and expert discussion, are considered under this criterion: inventory turns ("IT"), obsolete inventory ("OI"), and inventory accuracy ("IA"). (4) Delivery Status. To represent delivery status, the surveyed experts considered two chief attributes under this criterion: warehouse operations cost as a percentage of sales ("WOCSP") and outbound freight cost as a percentage of sales ("OFCOSP"). (5) Customer Service. To understand how suppliers handle customer feedback, three basic attributes are considered under this criterion: on-time shipment ("OTS"), percentage of returned goods ("PRG"), and order fill rate ("OFR"). (6) Suppliers' Offer. Based on expert discussion, six crucial attributes are contained in this criterion: percentage of on-time supplier delivery ("POTSD"), percentage of supplier-delivered material defects ("PSDMD"), percentage of total direct material that does not require inspection ("PTDMDNRI"), percentage of total material value purchased using a web-based system ("PTDMVP-W"), percentage of total material value purchased using EDI transactions ("PTDMVP-EDI"), and percentage of total material inventory managed by suppliers ("PTMIMS").
3 Empirical Measurement Each potential supplier is matched against each sub-criterion within each assessed criterion through pairwise comparison. To reflect the comparative scores of the three kinds of potential suppliers with minimal financial impact, equation (3) is employed to measure the comprehensive comparative priority weight w (eigenvector) in Table 1. The appropriate partner is then selected by calculating the comparative index D_i [11], defined as:

D_i = Σ_{j=1}^{s} Σ_{k=1}^{k_j} P_j T_{kj} R_{ikj}    (3)

where P_j is the priority weight w (eigenvector) of criterion j, T_{kj} is the priority weight w (eigenvector) of attribute k of criterion j, and R_{ikj} is the score of potential partner i on attribute k of criterion j.
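To make equation (3) concrete, the following sketch computes the comparative index D_i for a single candidate supplier. All the weights P_j, T_kj, and scores R_ikj below are invented for illustration; they are not the paper's empirical survey values.

```python
# Illustrative computation of the comparative index D_i from eq. (3) for a
# single candidate supplier i. P[j] is the priority weight of criterion j,
# T[j][k] the weight of attribute k under criterion j, and R_i[j][k] the
# supplier's score on that attribute (all values made up for this example).
P = [0.5, 0.3, 0.2]
T = [[0.6, 0.4], [1.0], [0.7, 0.3]]
R_i = [[0.2, 0.5], [0.8], [0.4, 0.6]]

# Double sum over criteria j and attributes k, as in eq. (3).
D_i = sum(P[j] * T[j][k] * R_i[j][k]
          for j in range(len(P))
          for k in range(len(T[j])))
print(round(D_i, 3))   # 0.492
```

The supplier with the largest D_i across all candidates would be selected, which is how Table 1 ranks the three suppliers.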
Table 1 The results of the empirical evaluation (Financial Evaluation / Sale Review / Inventory System / Delivery Status / Suppliers' Offer / Customer Service)

SCM comparative index | Supplier 1 (SWTED) | Suppliers 2 (SWEIDSN) | Suppliers 3 (SWEIEDSN)
                      |       0.137        |        0.6414         |        0.2749
Additionally, after processing equations (1), (2), and (3), the final evaluation step combines the overall importance of the related priority weights w (eigenvectors) in Table 1. Among Supplier 1 without the enterprise's domain ("SWTED"), Supplier 2 with the enterprise's internal domain and social networking ("SWEIDSN"), and Supplier 3 with the enterprise's internal and external domain and social networking ("SWEIEDSN"), the highest score is obtained by Supplier 2 (SWEIDSN), not Supplier 3 (SWEIEDSN). This points to a critical insight: the benefit of developing communication technology in the supply chain lies in the enterprise's internal domain and social networking. The reason is that, when selecting cooperative suppliers, the relationship between enterprise and suppliers emphasizes the high-speed establishment of the internal domain, because supplier growth directly assists the enterprise's growth. Consequently, Supplier 2 (SWEIDSN) achieves the highest comparative index of 0.6414.
4 Conclusion A plethora of SCM research surrounds the fundamental idea of cost reduction under the development of communication technology. However, the measurement and diminishment of the negative financial influence of supplier selection in SCM has not been discussed in detail in the research field. Our contention therefore not only retains the original central concept of SCM but also concentrates on diminishing negative financial influence while selecting the best potential supplier through a new financial perspective and a novel approach (the ANP model). The ANP model is used not only to establish clear, comprehensive hierarchical relations between the criteria but also to help the decision-maker select the best potential supplier - Supplier 2 with the enterprise's internal domain and social networking ("SWEIDSN") - with low negative financial influence, through the academic Delphi method and expert survey. Six main criteria are considered: three financial factors (financial evaluation, sale review, and inventory system), two SCM factors (delivery status and suppliers' offer), and one customer-service factor (customer service). A further step beyond this research is to focus on minimizing the additional negative influences created in SCM through further measurement and assessment. As these comprehensive versions are achieved, enterprises will be able to create more competitive business strategies to survive in this complex, highly competitive, low-profit manufacturing epoch.
References [1] He, H., Garcia, E.A.: Learning from Imbalanced Data. IEEE Trans. on Knowledge and Data Engineering 21(9), 1263–1284 (2009) [2] Handfield, R.B., Nichols Jr., E.L.: Introduction to Supply Chain Management, pp. 141–162. Prentice-Hall, Inc., Englewood Cliffs (1999) [3] Mentzer, J., et al.: Defining supply chain management. Journal of Business Logistics 22, 1–25 (2001) [4] Lambert, et al.: Issues in Supply Chain Management. Industrial Marketing Management 29, 65–83 (2000) [5] Yi-Ping, C., Yong-Hong, Y.: Ping Heng Ji Fen Ka Wan Quan Jiao Zhan Shou Ce. Merlin Publishing Co., Ltd., Taiwan (2004) [6] Chen, S.H., et al.: Enterprise Partner Selection for Vocational Education: Analytical Network Process Approach. International Journal of Manpower 25(7), 643–655 (2004) [7] Sarkis, J.: Evaluating environment conscious business practices. Journal of Operational Research 107(1), 159–174 (1998) [8] Cheng, L.X.: Gong Ying Lian Feng Xian De Cai Wu Guan Dian. Accounting Research Monthly 285, 66–67 (2009) [9] Hsieh, M.-Y., et al.: How to Reduce the Negative Financial Influence in the Supply Chain, pp. 397–400. Electronic Trend Publications (2010)
[10] Saaty, T.L.: Decision Making with Dependence and Feedback: The Analytic Network Process. RWS Publications, Pittsburgh (1996) [11] Hsieh, M.-Y., et al.: Decreasing Financial Negative Influence in the Supply Chain Management through Integrated Comparison the ANP and GRA-ANP Models. The Journal of Grey System 139(2), 69–72 (2010) [12] Hsieh, M.-Y., et al.: Decreasing Financial Negative Influence in the Supply Chain Management by applying the ANP model. In: 2009 The 3rd Cross-Strait Technology, Humanity Education and Academy-Industry Cooperation Conference, China (2009)
Intelligent Decision for Dynamic Fuzzy Control Security System in Wireless Networks Xu Huang, Pritam Gajkumar Shah, and Dharmendra Sharma
Abstract. Security in wireless networks has become a major concern, as wireless networks are more vulnerable to security threats than wired networks. While elliptic curve cryptography (ECC) offers great potential benefits for wireless sensor network (WSN) security, much work remains to be done because WSNs operate under severe constraints such as limited energy and computing capability. It is well known that the scalar multiplication operation in ECC accounts for more than 80% of the key calculation time on wireless sensor network motes. In this paper we present an intelligent decision system for an optimized dynamic window, based on our previous research. The overall quality of service (QoS) is improved under this algorithm, and in particular power is consumed more efficiently. The simulation results show that, with the intelligent decision system, the number of fuzzy conditions decreased from 26 to 9, and the average calculation time as a whole decreased by approximately 17.5% in comparison to our previous algorithms in an ECC wireless sensor network.
1 Introduction The high demand for various sensor applications shows that the rapid progress of wireless communications has made them popular in our daily life. Growth in very large scale integration (VLSI) technology, embedded systems, and micro-electro-mechanical systems (MEMS) has enabled the production of inexpensive sensor nodes, which can transmit data over a distance through free media with efficient use of power [1, 22, 23]. In WSN systems, a sensor node detects the information of interest, processes it with the help of an in-built microcontroller, and communicates the results to a sink or base station. Normally the base station is a more powerful node, which can be linked to a central station via satellite or Internet communication to form a network. There are many deployments of wireless sensor networks for various applications, such Xu Huang · Pritam Gajkumar Shah · Dharmendra Sharma Faculty of Information Sciences and Engineering, University of Canberra, ACT 2601, Australia e-mail: {Xu.Huang,Pritam.Shah,Dharmendra.Sharma}@canberra.edu.au
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 489–500. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
as environmental monitoring, e.g., volcano detection [2,3], distributed control systems [4], agricultural and farm management [5], detection of radioactive sources [6], and computing platforms for tomorrow's Internet [7]. Generally speaking, a typical WSN architecture is shown in Figure 1. In contrast to traditional networks, a wireless sensor network normally has many resource constraints [4] due to its limited size. As an example, the MICA2 mote uses an 8-bit ATMega 128L microcontroller running at 7.3 MHz, so WSN nodes have limited computational power. The radio transceiver of MICA motes can normally achieve a maximum data rate of 250 Kbits/s, which restricts the available communication resources, and the flash memory available on the MICA mote is only 512 Kbytes. Apart from these limitations, the onboard battery is 3.3 V with 2 A-hr capacity. Under these restrictions, current state-of-the-art protocols and algorithms are too expensive for sensor networks due to their high communication overhead.
Fig. 1 A Typical WSN architecture
Recall that elliptic curve cryptography was first introduced by Neal Koblitz [9] and Victor Miller [10] independently in the early eighties. The advantage of ECC over other public-key cryptography techniques such as RSA and Diffie-Hellman is that the best known algorithm for solving the elliptic curve discrete logarithm problem (ECDLP), the underlying hard mathematical problem in ECC, takes fully exponential time, whereas the best algorithms for solving RSA and Diffie-Hellman take sub-exponential time [11].
2 Elliptic Curve Diffie-Hellman Scheme Proposed for WSN Before presenting our method, we take a closer look at the popular legacy scheme for WSNs. As per [13], the original Diffie-Hellman algorithm with RSA requires a key of 1024 bits to achieve sufficient security, but Diffie-Hellman based on ECC can achieve the same security level with only a 160-bit key size. Two heavily used operations are involved in ECC: scalar multiplication and modular reduction. Gura et al. [14] showed that 85% of execution time is spent on scalar multiplication. Scalar multiplication is the operation of multiplying a point P
on an elliptic curve E defined over a field GF(p) by a positive integer k, which involves point addition and point doubling. The operational efficiency of kP is affected by the type of coordinate system used for the point P and by the algorithm used for recoding the integer k in the scalar multiplication. This paper proposes an algorithm based on the one's complement representation of the integer k, which accelerates the computation of scalar multiplication in wireless sensor networks. The number of point doubling and point addition operations in a scalar multiplication depends on the recoding of the integer k; expressing k in binary form highlights this dependency. The number of zeros and ones in the binary form, their positions, and the total number of bits all affect the computational cost. The Hamming weight, i.e., the number of non-zero elements, determines the number of point additions, and the bit length of k determines the number of point doublings. One point addition (P ≠ Q) requires one field inversion and three field multiplications [13], with squaring counted as regular multiplication; this cost is denoted 1I + 3M, where I denotes the cost of an inversion and M the cost of a multiplication. One point doubling (P = Q) requires 1I + 4M, neglecting the cost of field additions and of multiplications by the small constants 2 and 3. Binary Method. Scalar multiplication is the computation of the form Q = kP, where P and Q are elliptic curve points and k is a positive integer. It is performed by repeated elliptic curve point addition and doubling. In the binary method the integer k is represented in binary form:
k = Σ_{j=0}^{l−1} K_j 2^j,    K_j ∈ {0, 1}
The binary method scans the bits of k either left-to-right or right-to-left. The cost of the multiplication depends on the number of non-zero elements and on the length of the binary representation of k. If the representation has k_{l−1} ≠ 0, then the binary method requires (l − 1) point doublings and (W − 1) point additions, where l is the length of the binary expansion of k and W is its Hamming weight (the number of non-zero elements). For example, if k = 629 = (1001110101)_2, it requires (W − 1) = 6 − 1 = 5 point additions and (l − 1) = 10 − 1 = 9 point doublings. Signed Digit Representation Method. Subtraction has virtually the same cost as addition in the elliptic curve group: the negative of the point (x, y) is (x, −y) for odd characteristic. This leads to scalar multiplication methods based on addition-subtraction chains, which help to reduce the number of curve operations. The integer k is in binary signed-digit representation when written in the following form:
k = Σ_{j=0}^{l} S_j 2^j,    S_j ∈ {−1, 0, 1}
When a signed-digit representation has no adjacent non-zero digits, i.e., S_j S_{j+1} = 0 for all j ≥ 0, it is called a non-adjacent form (NAF). The NAF usually has fewer non-zero digits than the binary representation: its average Hamming weight is (n − 1)/3.0, so it generally requires (n − 1) point doublings and (n − 1)/3.0 point additions. The binary method can be revised accordingly into an algorithm for NAF; this modified method is called the addition-subtraction method.
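The operation counts of the binary and NAF methods described above can be sketched as follows. This is an illustrative Python fragment (using k = 629 from the text), not code from the paper; it counts operations rather than performing actual curve arithmetic.

```python
def binary_cost(k):
    """Doublings and additions for the left-to-right binary method."""
    bits = bin(k)[2:]
    l, W = len(bits), bits.count('1')  # bit length and Hamming weight
    return l - 1, W - 1                # (l-1) doublings, (W-1) additions

def naf(k):
    """Non-adjacent form of k, least-significant digit first."""
    digits = []
    while k > 0:
        if k % 2:
            d = 2 - (k % 4)            # +1 or -1 so that (k - d) % 4 == 0
            k -= d
        else:
            d = 0
        digits.append(d)
        k //= 2
    return digits

doublings, additions = binary_cost(629)
print(doublings, additions)            # 9 5, matching the text's example

d = naf(629)
# NAF property: no two adjacent non-zero digits
assert all(not (d[i] and d[i + 1]) for i in range(len(d) - 1))
print(sum(1 for x in d if x))          # NAF Hamming weight of 629
```

Here the NAF of 629 has a lower Hamming weight than its plain binary form, which is exactly the saving in point additions that the addition-subtraction method exploits.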
3 Dynamic Window Based on a Fuzzy Controller in ECC We use an algorithm based on subtraction via the one's complement, which is common in binary arithmetic. The 1's complement of any binary number may be found by the following equation [19-22]:
C1 = (2^a − 1) − N    (1)

where C1 is the 1's complement of the binary number, a is the number of bits of N in binary form, and N is the binary number. A closer look at equation (1) reveals that any positive integer can be represented using minimal non-zero bits in its 1's complement form, provided it has a Hamming weight of at least 50%. Minimizing the non-zero bits of a positive integer scalar is important for reducing the number of intermediate multiplication, squaring, and inversion operations in elliptic curve cryptography, as seen in the previous sections. Equation (1) can therefore be rewritten as:
N = 2^a − C1 − 1    (2)

For example, take N = 1788. In binary, N = (11011111100)_2, its 1's complement is C1 = (00100000011)_2, and a = 11. Substituting these values into equation (2) gives 1788 = 2^11 − (00100000011)_2 − 1, which can be expanded as:

1788 = 2048 − 256 − 2 − 1 − 1    (3)

As is evident from equation (3), the Hamming weight of the scalar N has been reduced from 8 to 5, which saves 3 elliptic curve addition operations. One addition operation requires 2 squarings, 2 multiplications, and 1 inversion, so in this case a total of 6 squarings, 6 multiplications, and 3 inversions are saved.
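The recoding in equations (1)-(3) can be sketched in a few lines. This is an illustrative fragment using the N = 1788 example from the text; the function name is our own.

```python
def ones_complement_recode(n):
    """Recode N as 2^a - C1 - 1 (eqs. (1)-(2)): a signed expansion whose
    terms are +2^a, minus each set bit of the one's complement C1, minus 1."""
    a = n.bit_length()                 # number of bits of N in binary form
    c1 = (2 ** a - 1) - n              # eq. (1): one's complement of N
    terms = [2 ** a]
    terms += [-(1 << i) for i in range(a) if (c1 >> i) & 1]
    terms.append(-1)
    return c1, terms

c1, terms = ones_complement_recode(1788)
print(c1)                                # 259 = (00100000011)_2
print(sum(terms))                        # 1788, recovering N
print(bin(1788).count('1'), len(terms))  # Hamming weight drops from 8 to 5
```

The five signed terms correspond exactly to the expansion 2048 − 256 − 2 − 1 − 1 in equation (3).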
The above recoding method based on one's complement subtraction, combined with the sliding window method, provides a more optimized result. Let us compute [763]P (in other words, k = 763) as an example, with a sliding window algorithm, k recoded in binary form, and window sizes ranging from 2 to 10. It is observed that as the window size increases, the number of precomputations increases geometrically, while the numbers of addition and doubling operations decrease. We now present the details for different window sizes, to find the optimal size, using the following example. Window size w = 2: 763 = (1011111011)_2; number of precomputed odd multiples up to 2^w − 1 = 2^2 − 1 = [3]P; windows: 10 11 11 10 11. The intermediate values of Q are P, 2P, 4P, 8P, 11P, 22P, 44P, 47P, 94P, 95P, 190P, 380P, 760P, 763P. Computational cost: 9 doublings, 4 additions, and 1 precomputation. Window size w = 3: precomputed odd multiples up to 2^w − 1 = 2^3 − 1 = [7]P, i.e., all odd values [3]P, [5]P, [7]P; 763 = 101 111 101 1 = [5]P [7]P [5]P [1]P. The intermediate values of Q are 5P, 10P, 20P, 40P, 47P, 94P, 188P, 376P, 381P, 762P, 763P. Computational cost: 7 doublings, 3 additions, and 3 precomputations. We continue to derive the remaining calculations for window sizes w = 6, 7, 8, 9, and 10. The results of all calculations are presented in Table 1.
Algorithm: sliding window scalar multiplication on elliptic curves
1.  Q ← P∞ and i ← l − 1
2.  while i ≥ 0 do
3.      if n_i = 0 then Q ← [2]Q and i ← i − 1
4.      else
5.          s ← max(i − k + 1, 0)
6.          while n_s = 0 do s ← s + 1
7.          for h = 1 to i − s + 1 do Q ← [2]Q
8.          u ← (n_i ... n_s)_2        [n_i = n_s = 1 and i − s + 1 ≤ k]
9.          Q ← Q ⊕ [u]P               [u is odd, so [u]P is precomputed]
10.         i ← s − 1
11. return Q
We continue to derive the remaining calculations for window sizes w = 4 through w = 10 in the same way. The results for all window sizes are presented in Table 1.
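The sliding-window procedure above can be sketched as follows. This is an illustrative model, not the paper's ECC code: elliptic-curve point arithmetic is stood in for by plain integer addition (doubling = x + x, identity = 0) so that the operation counts can be checked against the worked examples.

```python
def sliding_window_mul(k, w, P=1):
    """Compute [k]P with window width w, counting group operations.
    The group is modeled by integer addition so the result is checkable."""
    # Precompute the odd multiples [3]P, [5]P, ..., [2^w - 1]P.
    pre = {1: P}
    twoP = P + P
    precomputations = 0
    for u in range(3, 1 << w, 2):
        pre[u] = pre[u - 2] + twoP
        precomputations += 1

    bits = bin(k)[2:]
    n = [int(b) for b in bits]   # n[0] is the most significant bit
    Q = 0                        # group identity
    doublings = additions = 0
    i = 0
    while i < len(n):
        if n[i] == 0:
            if Q:
                Q += Q
                doublings += 1
            i += 1
        else:
            # Widest window n[i..s] with n[i] = n[s] = 1 and width <= w.
            s = min(i + w - 1, len(n) - 1)
            while n[s] == 0:
                s -= 1
            for _ in range(s - i + 1):
                if Q:
                    Q += Q
                    doublings += 1
            u = int(bits[i:s + 1], 2)   # odd window value
            if Q:
                Q += pre[u]
                additions += 1
            else:
                Q = pre[u]              # first window: plain assignment
            i = s + 1
    return Q, doublings, additions, precomputations

print(sliding_window_mul(763, 2))  # → (763, 9, 4, 1)
print(sliding_window_mul(763, 3))  # → (763, 7, 3, 3)
```

The counts for w = 2 and w = 3 reproduce the worked examples above: 9 doublings, 4 additions, 1 precomputation, and 7 doublings, 3 additions, 3 precomputations, respectively.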
494
X. Huang, P.G. Shah, and D. Sharma
Table 1 Window size vs. number of doublings, additions, and pre-computations

| Window Size | No. of Doublings | No. of Additions | No. of Pre-computations |
|      2      |        9         |        4         |           1             |
|      3      |        7         |        3         |           3             |
|      4      |        6         |        2         |           7             |
|      5      |        5         |        1         |          15             |
|      6      |        4         |        1         |          31             |
|      7      |        3         |        1         |          61             |
|      8      |        3         |        1         |         127             |
|      9      |        1         |        1         |         251             |
|     10      |        0         |        0         |         501             |
4 Intelligent Decision Fuzzy Controller System in ECC
It is clear from the above description that there is a tradeoff between the computational cost and the window size, as shown in Table 1. This tradeoff is underpinned by the balance between the computing cost (the RAM cost) and the pre-computing cost (the ROM cost) of a node in the network. It is also clear from the above description that the variety of wireless network working states makes this control complex, and the calculations could be relatively expensive.
Therefore, we propose a fuzzy dynamic control system to provide dynamic control ensuring that the optimum window size is obtained by trading off precomputation and computation cost. The fuzzy decision problem introduced by Bellman and Zadeh has as its goal the maximization of the minimum value of the membership functions of the objectives to be optimized. Accordingly, the fuzzy optimization model can be represented as a multi-objective programming problem as follows [21]:

Max: min{μ_s(D)} and min{μ_l(U_l)}, ∀s ∈ S and ∀l ∈ L

subject to:
A_l ≤ C_l, ∀l ∈ L
∑_{r∈R_p} x_rs = 1, ∀p ∈ P and ∀s ∈ S
x_rs = 0 or 1, ∀r ∈ R and ∀s ∈ S
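As a toy illustration of the Bellman–Zadeh max–min criterion used above (all membership values below are made up for illustration, not taken from the paper), each candidate window size is scored by the worst of its objective memberships, and the candidate with the best worst case wins.

```python
# Hypothetical membership degrees for two objectives (delay D, deviation U)
# for three candidate window sizes; all numbers are illustrative only.
candidates = {
    4: {"mu_delay": 0.9, "mu_deviation": 0.3},
    5: {"mu_delay": 0.7, "mu_deviation": 0.6},
    6: {"mu_delay": 0.4, "mu_deviation": 0.8},
}

# Max-min decision: maximize the minimum membership over all objectives.
best = max(candidates, key=lambda w: min(candidates[w].values()))
print(best)  # → 5
```

Window size 5 wins here because its worst objective membership (0.6) beats the worst cases of the alternatives (0.3 and 0.4).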
In the above model, the objective is to maximize the minimum membership function of all delays, denoted by D, and of the difference between the recommended value and the measured value, denoted by U. The fuzzy control system is extended from our previous design and is shown in Figure 2. For accurate control, we designed a three-input fuzzy controller. The first input is StorageRoom, which has three states, namely (a) low, (b) average, and (c) high. The second input is the pre-computing working load (PreComputing), also in one of three states, namely (a) low, (b) average, and (c) high. The third input is Doubling, expressing the working load of the "doubling" calculation, again with three states, namely (a) low, (b) average, and (c) high. The single output, called WindowSize, expresses the direction in which the next window size should be moved, with three states, namely (a) down, (b) stay, and (c) up.
Fig. 2 Three inputs fuzzy window control system
Only 9 fuzzy rules are needed, listed as follows (weights are unity), because StorageRoom in Figure 6 can be ignored based on the results of Figure 5, although it is a factor that needs to appear in this control system.
1. If (PreComputing is low) and (Doubling is low) then (WindowSize is up)
2. If (PreComputing is low) and (Doubling is average) then (WindowSize is up)
3. If (PreComputing is low) and (Doubling is high) then (WindowSize is stay)
4. If (PreComputing is average) and (Doubling is low) then (WindowSize is up)
5. If (PreComputing is average) and (Doubling is average) then (WindowSize is up)
6. If (PreComputing is average) and (Doubling is high) then (WindowSize is stay)
7. If (PreComputing is high) and (Doubling is low) then (WindowSize is up)
8. If (PreComputing is high) and (Doubling is average) then (WindowSize is stay)
9. If (PreComputing is high) and (Doubling is high) then (WindowSize is stay)
The number in brackets at each fuzzy condition is the weight, currently unity. Later we shall change it to different values according to the running situations, as described next. The three inputs are StorageRoom, PreComputing, and Doubling; the output is WindowSize. Note that if we had not taken advantage of Figure 5, at least 26 fuzzy rules would need to be considered, as shown in our previous paper [23], because StorageRoom has low, average, and high states in combination with the other two parameters. In order to make the controller run more
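The nine rules can be evaluated with a minimal max–min (Mamdani-style) inference. The triangular membership shapes and the [0, 1] input universe below are our own illustrative assumptions, not the controller design used in the paper; the output direction is encoded as up = +1, stay = 0.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Input terms on [0, 1] (illustrative shapes)
low     = lambda x: tri(x, -0.5, 0.0, 0.5)
average = lambda x: tri(x,  0.0, 0.5, 1.0)
high    = lambda x: tri(x,  0.5, 1.0, 1.5)

# (PreComputing term, Doubling term, output) for each of the nine rules;
# output: up = +1, stay = 0
RULES = [
    (low, low, 1), (low, average, 1), (low, high, 0),
    (average, low, 1), (average, average, 1), (average, high, 0),
    (high, low, 1), (high, average, 0), (high, high, 0),
]

def window_direction(precomputing, doubling):
    """Weighted-average defuzzification of the rule firing strengths."""
    num = den = 0.0
    for mu_p, mu_d, out in RULES:
        strength = min(mu_p(precomputing), mu_d(doubling))  # max-min inference
        num += strength * out
        den += strength
    return num / den if den else 0.0

print(window_direction(0.0, 0.0))  # → 1.0 (both loads low: move window up)
print(window_direction(1.0, 1.0))  # → 0.0 (both loads high: stay)
```

With both loads low, only rule 1 fires and the window size is pushed up; with both loads high, only rule 9 fires and the window size stays.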
efficiently, an intelligent decision system is established via multi-agents, where each individual agent is able to make decisions so that the controller works coherently. To make the system highly efficient, the intelligent decision system is designed as below. We define the agents as follows:
Co-Ordination Agent: the coordination agent of the whole system, which communicates with all other agents within the enterprise. It oversees all the agents to make them co-operate coherently.
Window Size Checking Agent: this agent checks the current "window size" to answer whether the window size needs to change for the system at the current running status.
Fuzzy Controller Agent: this agent carries out the implementation of the fuzzy controller to obtain the optimal window size at the current status.
System State Agent: this agent takes care of the whole system (rather than just the window size issue) and makes sure the holistic system always sits at the optimal status.
In this solution, no single agent is fully aware of the whole communication process. Instead, all agents work together to make the whole communication happen and to keep the holistic system in the designed state. With this kind of approach, which is quite suitable to be represented as an agent society, modifications can be made effectively, as shown in Figure 3, where solid and dashed lines distinguish the two communication channels, namely messages going out of an agent and messages coming into an agent.
Fig. 3 The Enterprise and the Agent Society
The "System State Agent" looks after the holistic ECC system and ensures that the coding, encoding, cryptography, energy, communications among the nodes, etc. are in the designed states. It frequently talks to the "Co-ordination Agent", which mainly controls the optimized dynamic window to meet the requirements from the "System State Agent". The "Co-ordination Agent" is in charge of the whole optimized dynamic window with the fuzzy controller, to operate the ECC system effectively. The communications are mainly among the "Window Size Checking Agent", "Fuzzy Controller Agent", and "System State Agent", as shown in Figure 3. The "Window Size Checking Agent" looks after, within the ECC system, the relations among the "ROM Control Agent", "RAM Control Agent", and "Calculation Control Agent", to keep the current ECC system always in an excellent condition where the RAM is used to its full potential while the ROM storage ensures that the required pre-calculation results are fairly maintained. Also, if any calculation is needed by the ECC system, the "Calculation Control Agent" should offer the services as required. The "Fuzzy Controller Agent" in the agent society is the implementation agent for the fuzzy controller, which runs the framework shown in Figure 2. It has three sub-agents: (1) the "Fuzzy Input Agent", which looks after the three inputs, namely "StorageRoom", "PreComputing", and "Doubling", as shown in Figure 2; (2) the "Fuzzy Rule Agent", which manages the fuzzy rules in Figure 2 to make sure the fuzzy controller correctly completes the designed functions; and (3) the "Fuzzy Output Agent", which takes care of the output of the fuzzy controller and ensures the output sent is correct, as shown in the right-hand side of Figure 2. As our software developments are based on a C++ platform, we have concentrated on developing the MAS application using the same C++ language. We have found that most MAS applications only support Java-based development.
Therefore we have decided to write our own application for MAS using C++. This works well for MAS since, C++ being an object-oriented language, the agents can easily be represented by C++ classes. At present we are in the process of developing the agent society along with the Co-ordination Agent. Appleby and Steward of BT Labs have taken a similar approach to prototype a mobile-agent-based system for controlling telecommunication networks. The final simulation result is shown in Figure 4. There are two outcomes, namely "Doubling" and "PreComputing".
Fig. 4 The output surface for StorageRoom = constant (0.4) (upper) and StorageRoom = constant (0.8) (lower), PreComputing vs. Doubling
It is obvious from the above figures that the "addition" factor is less "important" than the others in the fuzzy control system. In our simulations, the proposed method together with the fuzzy window size controller makes the ECC calculation in the current algorithm about 17% more efficient than the methods in [23] at the same QoS level.
5 Conclusion
In this paper we have extended our previous research results to an intelligent decision system via the agent society, to increase the capacity and capability of the fuzzy system and to make the original system more efficient and effective. The final simulation in a wireless sensor network shows that about 17.5% more efficiency than our previous method [23] can be obtained with an ECC sensor network.
References [1] Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: Wireless sensor networks: a survey. Computer Networks 38, 393–422 (2002) [2] Chung-Kuo, C., Overhage, J.M., Huang, J.: An application of sensor networks for syndromic surveillance, pp. 191–196 (2005)
[3] Werner-Allen, G., Lorincz, K., Ruiz, M., Marcillo, O., Johnson, J., Lees, J., Welsh, M.: Deploying a wireless sensor network on an active volcano. IEEE Internet Computing 10, 18–25 (2006)
[4] Sinopoli, B., Sharp, C., Schenato, L., Schaffert, S., Sastry, S.S.: Distributed control applications within sensor networks. Proceedings of the IEEE 91, 1235–1246 (2003)
[5] Sikka, P., Corke, P., Valencia, P., Crossman, C., Swain, D., Bishop-Hurley, G.: Wireless ad hoc sensor and actuator networks on the farm, pp. 492–499 (2006)
[6] Stephens Jr., D.L., Peurrung, A.J.: Detection of moving radioactive sources using sensor networks. IEEE Transactions on Nuclear Science 51, 2273–2278 (2004)
[7] Feng, Z.: Wireless sensor networks: a new computing platform for tomorrow's Internet, vol. 1, pp. I–27 (2004)
[8] Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: A Survey on Sensor Networks. IEEE Communications Magazine 40, 102–116 (2002)
[9] Koblitz, N.: Elliptic Curve Cryptosystems. Mathematics of Computation 48, 203–209 (1987)
[10] Miller, V.S.: Use of Elliptic Curves in Cryptography. In: Williams, H.C. (ed.) CRYPTO 1985. LNCS, vol. 218, pp. 417–426. Springer, Heidelberg (1986)
[11] Lopez, J., Dahab, R.: An overview of elliptic curve cryptography. Technical report, Institute of Computing, State University of Campinas, Sao Paulo, Brazil (May 2000)
[12] Lauter, K.: The advantages of elliptic curve cryptography for wireless security. IEEE Wireless Communications 11, 62–67 (2004)
[13] Wang, H., Sheng, B., Li, Q.: Elliptic curve cryptography-based access control in sensor networks. Int. J. Security and Networks 1, 127–137 (2006)
[14] Gura, N., Patel, A., Wander, A., Eberle, H., Shantz, S.C.: Comparing elliptic curve cryptography and RSA on 8-bit CPUs. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 119–132. Springer, Heidelberg (2004)
[15] http://csrc.nist.gov/CryptoToolkit/dss/ecdsa/NISTReCur.pdf
[16] Malan, D.J., Welsh, M., Smith, M.D.: A public-key infrastructure for key distribution in TinyOS based on elliptic curve cryptography. In: 2nd IEEE International Conference on Sensor and Ad Hoc Communications and Networks (SECON 2004), pp. 71–80 (2004)
[17] Blake, I., Seroussi, G., Smart, N.: Elliptic Curves in Cryptography, vol. 265 (1999)
[18] Hankerson, D., Hernandez, J.L., Menezes, A.: Software implementation of elliptic curve cryptography over binary fields. In: Paar, C., Koç, Ç.K. (eds.) CHES 2000. LNCS, vol. 1965, p. 1. Springer, Heidelberg (2000)
[19] Gillie, A.C.: Binary Arithmetic and Boolean Algebra, p. 53. McGraw-Hill Book Company, New York (1965)
[20] Bellman, R.E., Zadeh, L.A.: Decision-making in a fuzzy environment. Management Science 17, 141–164 (1970)
[21] Huang, X., Wijesekera, S., Sharma, D.: Fuzzy Dynamic Switching in Quantum Key Distribution for Wi-Fi Networks. In: Proceedings of the 6th International Conference on Fuzzy Systems and Knowledge Discovery, Tianjin, China, August 14–16, pp. 302–306 (2009)
[22] Huang, X., Shah, P.G., Sharma, D.: Multi-Agent System Protecting from Attacking with Elliptic Curve Cryptography. In: The 2nd International Symposium on Intelligent Decision Technologies, Baltimore, USA, July 28–30 (2010) (accepted for publication)
[23] Huang, X., Sharma, D.: Fuzzy Controller for a Dynamic Window in Elliptic Curve Cryptography Wireless Networks for Scalar Multiplication. In: The 16th Asia-Pacific Conference on Communications, APCC 2010, Langham Hotel, Auckland, New Zealand, October 31–November 3, pp. 509–514 (2010) ISBN: 978-1-4244-8127-9
Investigating the Continuance Commitment of Volitional Systems from the Perspective of Psychological Attachment Huan-Ming Chuang, Chyuan-Yuh Lin, and Chien-Ku Lin
Abstract. This study integrates the IS success model and social influence theory, taking into account both social influences and personal norms, to investigate in depth the critical factors affecting continuance intention toward an information system used by elementary schools in Taiwan. A questionnaire survey was conducted to collect data for analysis by PLS, with 206 teachers sampled from Yunlin county elementary schools as research subjects. The principal research findings are: (1) perceived net benefits positively affect attitude, and attitude has the same effect on continuance intention; (2) among the three psychological attachment processes, compliance shows a negative effect on perceived net benefits and continuance intention, while identification and internalization manifest positive effects; (3) system quality, information quality, and service quality promote perceived net benefits, attitude, and continuance intention in general. The conclusions offer practical and valuable guidance regarding ways to enhance SFS users' continuance commitment and behavior.
Keywords: IS success model, psychological attachment, IS continuance.
1 Introduction
Due to the rapid innovation and popularization of information technology, educational agencies are leveraging it to enhance administrative efficiency and effectiveness. Against this background, Yunlin county has been actively promoting School Free Software (SFS) for education administration. The SFS system provides teachers as well as students with a better teaching, learning, communication, and evaluation platform and
Huan-Ming Chuang, Associate Professor, Department of Information Management, National Yunlin University of Science and Technology
Chyuan-Yuh Lin · Chien-Ku Lin Graduate Student, Department of Information Management, National Yunlin University of Science and Technology J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 501–510. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
502
H.-M. Chuang, C.-Y. Lin, and C.-K. Lin
gains increasing acceptance. However, in order to improve users' continuance intention effectively, prior IS acceptance and usage research, which emphasizes social normative compliance purely, has shown great limitations. As a result, more and more researchers propose that personal norms should also be considered to truly understand the important factors determining users' continuance intention. Nevertheless, no matter how good a system is and how actively it is promoted, if it cannot be accepted and used, the system cannot succeed at all. Consequently, the essential factors affecting information system acceptance and continuance are important research issues.
2 Background and Literature Review
2.1 DeLone and McLean's IS Success Model
Since IS success is a multi-dimensional concept that can be assessed from different perspectives according to different strategic needs, measuring IS success is not an easy and objective job. Nevertheless, DeLone and McLean made a major breakthrough in 1992 [1]. After conducting a comprehensive review of the related literature, they proposed an IS success model, as shown in Figure 1.
Fig. 1 DeLone and McLean's IS success model [1]
This model suggests that IS success can be represented by the system quality, the output information quality, use of the output, user satisfaction, the effect of the IS on the individual (individual impact), and the effect of the IS on the organization (organizational impact). The model identified six important dimensions of IS success and suggested the temporal and causal interdependencies between them. After the original model went through numerous validations, DeLone and McLean proposed an updated model in 2003, as shown in Figure 2.
Investigating the Continuance Commitment of Volitional Systems
503
Fig. 2 DeLone and McLean’s updated IS success model [2]
The primary differences between the original and updated models can be listed as follows: (1) the addition of service quality to reflect the importance of service and support in successful e-commerce systems, (2) the addition of intention to use to measure user attitude, and (3) the combining of individual impact and organizational impact into a net benefits construct.
2.2 Psychological Attachment The concept of psychological attachment is based on Kelman’s (1958) theory of social influence, aiming to understand the basis for individuals’ attitude and belief change [3]. He emphasized the importance of knowing about the nature and depth of changes, which help to predict about the manifestations and consequences of the new attitudes. Kelman noted that individuals can be affected in three different ways: compliance, identification, and internalization, which are termed the three processes of attitude change [3]. Distinguishing the three processes of attitude change is significant because one could ascertain how individuals are influenced, and then could make meaningful predictions about the consequences of the individuals’ change. Compliance occurs when an individual adopts the induced behavior to gain specific rewards or approval, and to avoid specific punishments or disapproval by conforming. In this case, the individual accepts influence not because he believes in the content of the induced behavior but because he hopes to achieve a favorable reaction from another person or group. Identification occurs when an individual accepts the induced behavior because he wants to obtain a satisfying, self-defining relationship to another person or group. In this case, an individual is simply motivated by the desired relationship, not by the content of the induced behavior. Internalization occurs when influence is accepted because the content of the induced behaviors is congruent with his own value system. In this case, internalization is due to the content of the new behavior. Behavior adopted in this fashion is more likely to be integrated into the individual’s existing values.
The processes of compliance, identification, and internalization represent three qualitatively different ways of accepting influence. Kelman further explained that behaviors adopted through different processes can be distinguished based on the conditions under which the behavior is performed [3]. He indicated that each of the three processes mediates between a distinct set of antecedents and a distinct set of consequents. Given the set of antecedents or consequents, influence will then take the form of compliance, identification, or internalization, respectively. Each of these corresponds to a characteristic pattern of internal responses in which the individual engages while accepting the influence. Kelman also noted that responses adopted through different processes will be performed under different conditions and will have different properties [3]. For example, behavior adopted through compliance is performed under surveillance by the influencing agent. Behavior adopted through identification is performed under conditions of salience of the individual's relationship to the influencing agent. And behavior adopted through internalization is performed under conditions of relevance of the issue, regardless of surveillance or salience. The induced behavior is integrated with the individual's existing value system and becomes a part of his personal norms. These differences between the three processes of social influence may represent separate dimensions of commitment to the group or to IT usage [4]. Some management research has used the term commitment to refer to antecedents and consequences of behavior, as well as to the process of becoming attached and the state of attachment itself to specific behaviors [5][6]. More specifically, it is psychological attachment that is the construct of common interest (O'Reilly and Chatman) in Kelman's social influence theory [4].
Kelman's theory underscores personal norms instead of simple social norms for understanding behavioral commitment to system usage [3]. Personal norms are embedded in one's own value system and beliefs, and therefore allow one to understand the individual's inherent reason why the induced behavior is adopted or rejected. In IT research, the use of an IT is viewed as a continuum that ranges from nonuse, through compliant use, to committed use. The continuum is a function of the perceived fit of the system use in terms of the users' values. Accordingly, this study defines user commitment as the user's psychological attachment to the chosen technology context. Several prior IS studies point out that social influence (referring to subjective norm) is important for predicting users' IS usage and acceptance behavior. However, a conceptualization of social influence based only on social normative compliance has theoretical and psychometric problems, because it is difficult to distinguish whether usage behavior is caused by the influence of certain referents on one's intent or by one's own beliefs [7]. Additionally, social normative compliance usually occurs under the power of the influencing agent. Therefore, predicting IT usage is likely to require more than simple compliance. Because O'Reilly and Chatman (1986) have shown that psychological attachment can be predicated on compliance, identification, and internalization [4], this study views social influence as the specific behavior adoption process by which an individual fulfills his own instrumental goals in terms of the above three processes.
3 Research Model and Hypotheses 3.1 Research Model Based on the literature review, we propose a research model shown in Figure 3, examining the effects of user commitment and IS success factors on volitional systems usage behavior.
Fig. 3 Research model
3.2 Research Hypotheses
Users' commitment level can be categorized as continuance commitment and affective commitment [8]. Continuance commitment is based on the costs that the system user associates with not adopting the induced behavior, while affective commitment refers to the commitment of the system user based upon congruence of personal values and identification of satisfying self-defining relationships, and is represented by identification and internalization in this study. Since compliance can be said to occur when an individual accepts the induced behavior for the sake of gaining rewards or approval, or to minimize costs such as punishments, we can expect its negative influence on continuance-related variables and propose the following hypotheses.
3.2.1 Hypotheses Related to Perceived Net Benefits
H1a: Compliance will have a negative influence on perceived net benefits.
H1b: Identification will have a positive influence on perceived net benefits.
H1c: Internalization will have a positive influence on perceived net benefits.
H1d: System quality will have a positive influence on perceived net benefits.
H1e: Information quality will have a positive influence on perceived net benefits.
H1f: Service quality will have a positive influence on perceived net benefits.
3.2.2 Hypotheses Related to Attitude H2a: Compliance will have a negative influence on attitude. H2b: Identification will have a positive influence on attitude. H2c: Internalization will have a positive influence on attitude. H2d: System quality will have a positive influence on attitude. H2e: Information quality will have a positive influence on attitude. H2f: Service quality will have a positive influence on attitude. 3.2.3 Hypotheses Related to Continuance Intention H3a: Perceived net benefits will have a positive influence on attitude. H3b: Attitude will have a positive influence on continuance intention.
4 Research Method
4.1 Study Setting
School Free Software (SFS) is a school affairs system developed through the collaboration of teachers with computer expertise on the platform "Interoperable Free Opensource," where system modules developed by skilled teachers, such as a bulletin board, student credit processing, and so on, are integrated for free download, with the goals of avoiding duplicate development and enhancing resource sharing. The major advantages of SFS can be described as follows. First, with its cross-platform feature, all its functional modules can be accessed and processed through an internet browser, meeting the spirit of free software. Second, compared with current commercial software packages, it can greatly lessen the burden on limited budgets. Last, since all the functional modules are developed by incumbent teachers, it fits practical administrative procedures quite well, so synergic system performance and compatibility can easily be attained. The default functions offered by SFS can be categorized as follows: (1) school affairs, (2) academic affairs, (3) student affairs, (4) teaching and administrative staff, (5) system administration, and (6) extra modules. These functions can easily be modified and customized for any special purpose. Though SFS is promoted aggressively, it is volitional in nature, since users can decide their involvement with the system willingly.
4.2 Operationalization of Constructs All constructs and measures were based on items in existing instruments, related literature, and input from domain experts. Items in the questionnaire were measured using a seven-point Likert scale ranging from (1) strongly disagree to (7) strongly agree.
4.3 Data Collection
Data for this study were collected using a questionnaire survey administered in Yunlin county of Taiwan. The respondents were sampled from elementary school instructors who have experience with SFS. We sent out 250 questionnaires and received 206 useful responses.
5 Data Analysis and Results
5.1 Scale Validation
We used SmartPLS 2.0 software to conduct confirmatory factor analysis (CFA) to assess measurement scale validity. The variance-based PLS approach was preferred over covariance-based structural equation modeling approaches such as LISREL because PLS does not impose sample size restrictions and is distribution-free [9]. 100 records of raw data were used as input to the PLS program, and path significances were estimated using the bootstrapping resampling technique with 200 subsamples. The steps of scale validation are summarized in Table 1.

Table 1 Scale validation

| Type of validity | Definition and criteria | Reference |
| Convergent validity | Measures of constructs that theoretically should be related to each other are, in fact, observed to be related to each other. All item factor loadings should be significant and exceed 0.70; composite reliability (CR) for each construct should exceed 0.80; average variance extracted (AVE) for each construct should exceed 0.50, or the square root of AVE should exceed 0.71. | [10] |
| Discriminant validity | Measures of constructs that theoretically should not be related to each other are, in fact, observed to not be related to each other. The square root of AVE for each construct should exceed the correlations between it and all other constructs. | [10] |
As seen in Table 2, the standardized CFA loadings for all scale items in the CFA model were significant at p < 0.01, and all of them meet the minimum loading criterion of 0.70. From Table 2, we can also see that the CRs of all factors exceed the required 0.80. Further, from the principal diagonal elements in Table 3, we can see that all square roots of AVE were greater than the desired minimum of 0.71. Hence, all three conditions for convergent validity were met. From Table 3, we can see that the square root of AVE for each construct exceeded the correlations between it and all other constructs. Therefore, the discriminant validity criterion was also met for our data sample.
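The convergent validity figures in Table 2 can be reproduced from the standardized loadings. This is a minimal sketch using the standard CR and AVE formulas, not the SmartPLS output itself:

```python
import math

def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR = (sum(lam))^2 / ((sum(lam))^2 + sum(1 - lam^2))."""
    s = sum(loadings) ** 2
    e = sum(1 - l * l for l in loadings)
    return s / (s + e)

# Internalization (INT) loadings from Table 2
int_loadings = [0.92, 0.90, 0.92]
print(round(ave(int_loadings), 2))                    # → 0.83 (AVE in Table 2)
print(round(composite_reliability(int_loadings), 2))  # → 0.94 (CR in Table 2)
print(round(math.sqrt(ave(int_loadings)), 2))         # → 0.91 (diagonal of Table 3)
```

The computed AVE, CR, and √AVE for the Internalization construct match the reported values, including the diagonal element used in the Fornell–Larcker discriminant validity check.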
Table 2 AVE, CR and factor loadings of constructs

| Construct | AVE | CR | Factor loadings |
| Compliance (COM) | 0.72 | 0.84 | 0.73/0.96 |
| Identification (IDT) | 0.71 | 0.83 | 0.75/0.93 |
| Internalization (INT) | 0.83 | 0.94 | 0.92/0.90/0.92 |
| System quality (SQ) | 0.67 | 0.92 | 0.84/0.83/0.79/0.82/0.82/0.80 |
| Information quality (IQ) | 0.76 | 0.95 | 0.88/0.85/0.88/0.88/0.89/0.86 |
| Service quality (SEQ) | 0.87 | 0.96 | 0.93/0.95/0.93/0.93 |
| Perceived net benefits (PNB) | 0.73 | 0.73 | 0.80/0.87/0.85/0.87/0.88/0.87/0.85 |
| Attitude (ATT) | 0.90 | 0.90 | 0.82/0.92/0.90 |
| Continuance intention (CI) | 0.78 | 0.78 | 0.96/0.95/0.94/0.94 |
Table 3 Inter-Construct Correlations

|     | COM   | IDT  | INT  | SQ   | IQ   | SEQ  | PNB  | ATT  | CI   |
| COM | 0.85  |      |      |      |      |      |      |      |      |
| IDT | -0.02 | 0.84 |      |      |      |      |      |      |      |
| INT | -0.08 | 0.70 | 0.91 |      |      |      |      |      |      |
| SQ  | -0.38 | 0.47 | 0.43 | 0.82 |      |      |      |      |      |
| IQ  | -0.29 | 0.47 | 0.41 | 0.81 | 0.87 |      |      |      |      |
| SEQ | -0.39 | 0.24 | 0.18 | 0.46 | 0.47 | 0.93 |      |      |      |
| PNB | -0.26 | 0.65 | 0.62 | 0.63 | 0.61 | 0.40 | 0.85 |      |      |
| ATT | -0.39 | 0.46 | 0.44 | 0.57 | 0.57 | 0.37 | 0.68 | 0.95 |      |
| CI  | -0.29 | 0.60 | 0.54 | 0.61 | 0.54 | 0.29 | 0.72 | 0.69 | 0.88 |

Note: Diagonal elements (in bold) represent the square root of AVE for the construct.
5.2 Hypotheses Testing
The results of the hypotheses testing are shown in Figure 4 below.

Fig. 4 PLS analysis of the research model (R² values: Perceived Net Benefits = 0.62, Attitude = 0.54, Continuance Intention = 0.48)
Note: Path significance * p<0.05, **p<0.01, ***p<0.001; parentheses indicate R² values
6 Discussion and Conclusions
6.1 Discussion of Key Findings
The major findings of this study can be described as follows. First, for the psychological attachment variables, compliance shows negative effects on perceived net benefits, attitude, and continuance intention, whereas identification and internalization are beneficial for these continuance-intention-related variables. Second, for users to have strong continuance intention toward volitional systems, their positive perceived net benefits and attitude are essential. Third, system quality, information quality, and service quality all have positive influences on continuance-intention-related factors. Namely, the IS success model is supported in this study.
6.2 Implications for Practice

In order to improve the acceptance of and continuance intention toward SFS, some helpful guidelines can be suggested: (1) focus on improving the objective side of SFS, namely system quality, information quality, and service quality; (2) enhance SFS users' level of psychological attachment, which is the only way to increase users' involvement with SFS and make the most of the system.
References

[1] DeLone, W.H., McLean, E.R.: Information systems success: the quest for the dependent variable. Information Systems Research 3(1), 60–95 (1992)
[2] DeLone, W.H., McLean, E.R.: The DeLone and McLean model of information systems success: a ten-year update. Journal of Management Information Systems 19(4), 9–30 (2003)
[3] Kelman, H.C.: Compliance, identification, and internalization: three processes of attitude change. Journal of Conflict Resolution 2(1), 51–60 (1958)
[4] O'Reilly, C.A. III, Chatman, J.A.: Organizational commitment and psychological attachment: the effects of compliance, identification, and internalization on pro-social behavior. Journal of Applied Psychology 71(3), 492–499 (1986)
[5] Buchanan, B.: Building organizational commitment: the socialization of managers in work organizations. Administrative Science Quarterly 19(4), 533–546 (1974)
[6] Mowday, R.T., Porter, L.W., Steers, R.M.: The measurement of organizational commitment. Journal of Vocational Behavior 14(2), 224–247 (1979)
[7] Davis, F.D., Bagozzi, R.P., Warshaw, P.R.: User acceptance of computer technology: a comparison of two theoretical models. Management Science 35(8), 982–1003 (1989)
H.-M. Chuang, C.-Y. Lin, and C.-K. Lin
[8] Malhotra, Y., Galletta, D.: A multidimensional commitment model of volitional systems adoption and usage behavior. Journal of Management Information Systems 22(1), 117–151 (2005)
[9] Chin, W.W., Marcolin, B., Newsted, P.: A partial least squares latent variable modeling approach for measuring interaction effects: results from a Monte Carlo simulation study and an electronic-mail emotion/adoption study. Information Systems Research 14(2), 189–217 (2003)
[10] Fornell, C., Larcker, D.F.: Structural equation models with unobservable variables and measurement error. Journal of Marketing Research 18(2), 39–50 (1981)
Market Structure as a Network with Positively and Negatively Weighted Links Takeo Yoshikawa, Takashi Iino, and Hiroshi Iyetomi∗
Abstract. The correlation structure of the Tokyo Stock Exchange market is studied from a network viewpoint. The correlation matrix of stock price changes, purified by random matrix theory, is regarded as an adjacency matrix for a network. The stock network thus constructed has weighted links, and those weights can even be negative. By minimizing the frustration among nodes, it is found that the stocks decompose into four comoving groups forming communities, three of which are strongly anticorrelated with each other. Such a frustrated triangle relationship among the groups may give rise to complicated market behavior.
1 Introduction

Physical ideas and methods have been successfully applied to disclosing the correlation structures hidden behind stock markets since the seminal works [1, 2]. Those studies adopted principal component analysis of the correlation matrix, a standard tool for multivariate data. In particular, physicists took advantage of the random matrix theory (RMT) in evaluating how many principal components should be retained as statistically significant; the RMT works as a null hypothesis. Information contained in the eigenvectors of the correlation matrix elucidated the existence of collective motion of business sectors or groups in well-developed markets [3, 4, 5]. Such clustering of stocks was visualized graphically and also modeled in various ways. The objective of the present paper is to shed new light on correlations in markets from an alternative point of view. We first regard the correlation matrix of

Takeo Yoshikawa
Graduate School of Science and Technology, Niigata University, Ikarashi, Niigata 950-2181, Japan

Takashi Iino · Hiroshi Iyetomi
Faculty of Science, Niigata University, Ikarashi, Niigata 950-2181, Japan
e-mail: [email protected]

∗ Corresponding author.
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 511–518. © Springer-Verlag Berlin Heidelberg 2011, springerlink.com
stock price changes, purified by the RMT, as an adjacency matrix to construct a network, and then detect groups in which nodes are strongly correlated as communities in the stock network. Community detection is a powerful tool for studying the complex structure of real networks, many of which are far from uniform. However, it should be noted that the stock network has links whose weights are of indefinite sign, because pairs of stocks can be, and in fact are, anticorrelated. Recent works [6, 7] provide us with a theoretical framework for detecting communities in networks with negative as well as positive links.

2 Community Detection

Let us consider a directed weighted graph G composed of n nodes and m links; the weights associated with the links can take both signs. The total numbers of positive and negative links in G are denoted as m^+ and m^−, respectively; hence m = m^+ + m^−. We define an adjacency matrix A of the network as follows: if there is a direct connection from node i to node j, then A_ij = w_ij, and A_ij = 0 otherwise, where w_ij is the weight of the directed link from node i to node j. We then separate the positive and negative links by setting A_ij^+ = A_ij if A_ij > 0 and A_ij^+ = 0 otherwise, and A_ij^− = −A_ij if A_ij < 0 and A_ij^− = 0 otherwise, so that A = A^+ − A^−. The positive and negative, incoming and outgoing degrees of node i are expressed as

    k_i^{±,out} = Σ_{j=1}^{n} A_ij^±,    k_i^{±,in} = Σ_{j=1}^{n} A_ji^±.    (1)

Also we suppose that the network is partitioned into communities; the community to which node i belongs is designated by σ_i. A configuration of the community assignment is thus represented by the set {σ} of the σ_i.
2.1 Frustration

At a given assignment {σ}, correlation and anticorrelation coexist within communities. Such a situation is referred to as frustration within communities, and it is measured [7] by

    F({σ}) = − Σ_{ij} A_ij δ(σ_i, σ_j),    (2)

where δ(i, j) is the Kronecker delta. Minimizing the frustration F({σ}) thus means maximizing the number of positive links, or the sum of positive weights, within communities. The frustration is peculiar to networks with both positive and negative links, so it has no counterpart for networks with only positive links.
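For concreteness, Eq. (2) is a one-liner over a signed adjacency matrix; the following numpy sketch (the toy matrix and labels are ours, purely for illustration) shows that separating the anticorrelated pair lowers F.

```python
import numpy as np

def frustration(A, sigma):
    """F({sigma}) = -sum_ij A_ij * delta(sigma_i, sigma_j), Eq. (2)."""
    same = np.equal.outer(sigma, sigma)   # delta(sigma_i, sigma_j) as a mask
    return -A[same].sum()

# Toy signed network: two positive pairs joined by one negative link
A = np.array([[ 0.0, 0.9, -0.5, 0.0],
              [ 0.9, 0.0,  0.0, 0.0],
              [-0.5, 0.0,  0.0, 0.8],
              [ 0.0, 0.0,  0.8, 0.0]])
split  = np.array([0, 0, 1, 1])   # negative link kept between communities
lumped = np.array([0, 0, 0, 0])   # everything in one community
```

Keeping the anticorrelated pair in different communities gives a lower F than lumping all nodes together, which is exactly what the minimization exploits.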
2.2 Modularity

To detect communities in a network, maximization of the modularity is often used. The modularity for directed graphs is defined [8] as

    Q({σ}) = (1/m) Σ_{ij} (A_ij − p_ij) δ(σ_i, σ_j),    (3)

where m = Σ_i k_i^out = Σ_i k_i^in is the total number of links, and p_ij = k_i^out k_j^in / m is the expected number of links from node i to node j under the uniform random model. Community detection through modularity maximization, however, works only for networks with positive links.
2.3 Hamiltonian

Traag and Bruggeman [7] introduced an extension of the modularity to accommodate more general networks. Following them, it is called the Hamiltonian and defined as

    H({σ}) = − Σ_{ij} [A_ij − (γ^+ p_ij^+ − γ^− p_ij^−)] δ(σ_i, σ_j),    (4)

where γ^± are thresholds for clustering nodes together versus keeping them apart. The Hamiltonian reduces to the frustration (2) in the limit γ^± = 0 and recovers the modularity (3) with γ^± = 1 for networks with only positive links.
3 Stock Correlation Network

3.1 Correlation Matrix

We analyzed the daily prices of N = 557 stocks belonging to the Tokyo Stock Exchange (TSE) for the 10-year period 1996–2006 (T = 2706 daily returns). The elements of the correlation matrix C are calculated as

    C_ij = (1/T) Σ_t [(G_{i,t} − ⟨G_i⟩)/σ_i] [(G_{j,t} − ⟨G_j⟩)/σ_j],    (5)

where S_{i,t} is the price of stock i at time t, G_{i,t} ≡ ln S_{i,t+1} − ln S_{i,t} is its logarithmic return, σ_i is the standard deviation of the price fluctuations G_{i,t}, and ⟨· · ·⟩ denotes the time average over the period studied.
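Eq. (5) is the sample correlation of standardized log returns. A minimal numpy sketch (with synthetic prices, as the TSE data are not reproduced here) might look like:

```python
import numpy as np

def correlation_matrix(S):
    """C_ij of Eq. (5) from an (N, T+1) array of prices S_{i,t}."""
    G = np.diff(np.log(S), axis=1)          # G_{i,t} = ln S_{i,t+1} - ln S_{i,t}
    G = G - G.mean(axis=1, keepdims=True)   # subtract <G_i>
    G = G / G.std(axis=1, keepdims=True)    # divide by sigma_i
    return G @ G.T / G.shape[1]             # time average of products

rng = np.random.default_rng(0)
S = np.exp(np.cumsum(rng.normal(size=(5, 301)), axis=1))  # 5 synthetic stocks
C = correlation_matrix(S)
```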
3.2 Random Matrix Theory

To establish a null hypothesis for filtering the correlation matrix, we consider a random correlation matrix given as

    C = (1/T) H H^T,    (6)

where H is an N × T matrix composed of N time series of random variables of length T with zero mean and unit variance; its elements are totally independent. In the limit N, T → ∞ with Q ≡ T/N fixed, the probability density function ρ(λ) of the eigenvalues λ of the random correlation matrix C is given [9] by

    ρ(λ) = (Q / 2πλ) √((λ_+ − λ)(λ − λ_−)),    (7)

where the eigenvalues are bounded by λ_± = (1 ± 1/√Q)².
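The RMT band depends only on Q = T/N, so the values quoted in Sect. 3.3 for the TSE data (Q ≈ 4.86, λ_+ ≈ 2.11) can be checked in a few lines (the function name is ours):

```python
import numpy as np

def rmt_bounds(N, T):
    """lambda_pm = (1 ± 1/sqrt(Q))^2 with Q = T/N, the support of Eq. (7)."""
    Q = T / N
    return (1 - 1 / np.sqrt(Q)) ** 2, (1 + 1 / np.sqrt(Q)) ** 2

# TSE figures from the text: N = 557 stocks, T = 2706 daily returns
lam_minus, lam_plus = rmt_bounds(N=557, T=2706)
```

Eigenvalues above lam_plus are the ones retained as statistically significant.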
3.3 Genuine Correlations

The correlation matrix may be decomposed [2, 3, 10] into

    C = C_market + C_group + C_random
      = λ_1 u_1 u_1^T + Σ_{i=2}^{13} λ_i u_i u_i^T + Σ_{i=14}^{557} λ_i u_i u_i^T,    (8)

where the λ_i are the eigenvalues of C sorted in descending order and the u_i are the corresponding eigenvectors. If there were no correlations between stock prices, the eigenvalues would follow the distribution of Eq. (7). Hence, one cannot distinguish fluctuations ascribed to u_i with λ_i ≤ λ_+ from noise. In the TSE data, Q ≈ 4.86, and we obtained λ_+ ≈ 2.11. Because the 13th and 14th eigenvalues satisfy λ_13 > λ_+ > λ_14, the contribution Σ_{i=14}^{557} λ_i u_i u_i^T is regarded as random (see Fig. 1). Furthermore, λ_1 is eight times or more as large as λ_2, and the components of u_1 are all positive (see Fig. 2), so λ_1 u_1 u_1^T represents the market mode. We now adopt C_group as the adjacency matrix to construct a stock correlation network; we exclude C_market because it just describes a collective motion of the whole market. To see how much each λ_i u_i u_i^T (i = 2, 3, ..., 13) contributes to C_group, we define C_group^(l) as

    C_group^(l) = Σ_{i=2}^{l} λ_i u_i u_i^T,    l = 2, 3, ..., 13.    (9)

As observed in Fig. 2, the collective behavior of sectors is manifested in u_2, u_3 and u_4. This confirms the result in [4].
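The partial reconstruction C_group^(l) of Eq. (9) follows directly from an eigendecomposition; a small numpy sketch (the function name is ours):

```python
import numpy as np

def group_component(C, l):
    """C_group^(l) = sum_{i=2}^{l} lam_i u_i u_i^T of Eq. (9),
    with eigenvalues sorted in descending order (1-based indices,
    so i = 1 is the market mode and is dropped)."""
    lam, U = np.linalg.eigh(C)            # eigh returns ascending order
    lam, U = lam[::-1], U[:, ::-1]        # sort descending
    return (U[:, 1:l] * lam[1:l]) @ U[:, 1:l].T

# Sanity check on random data: summing *all* modes recovers C exactly
rng = np.random.default_rng(1)
C = np.corrcoef(rng.normal(size=(6, 200)))
lam, U = np.linalg.eigh(C)
assert np.allclose((U * lam) @ U.T, C)
Cg = group_component(C, 3)                # modes i = 2, 3 only
```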
Fig. 1 The probability density of the eigenvalues of the correlation matrix C and the theoretical distribution ρ(λ) predicted by Eq. (7). The inset shows the largest eigenvalue.
Fig. 2 The components of the eigenvectors u_1, u_2, u_3 and u_4 with the corresponding eigenvalues λ_1 ≅ 132.95, λ_2 ≅ 15.48, λ_3 ≅ 11.98 and λ_4 ≅ 9.74, plotted against the stock index i. Sectors standing out include Foods, Iron & Steel, Construction, Precision Instruments, Electric Appliances, Banks, and Electric Power & Gas.
4 Correlation Structure in the Stock Network

For simplicity, we carried out community detection in the stock correlation network within the framework of frustration. We used a simulated annealing method to minimize the frustration F({σ}) for the network determined by C_group^(l).
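The paper does not spell out its annealing schedule, so the following is only a generic sketch: single-node relabeling moves accepted by the Metropolis rule under geometric cooling. F is recomputed from scratch for clarity; an incremental update would be preferable for the actual 557-stock network.

```python
import numpy as np

def frustration(A, sigma):
    return -A[np.equal.outer(sigma, sigma)].sum()

def anneal(A, n_comm, steps=20000, T0=1.0, cooling=0.9995, seed=0):
    """Minimize F({sigma}) by simulated annealing (generic sketch)."""
    rng = np.random.default_rng(seed)
    sigma = rng.integers(n_comm, size=len(A))
    f, T = frustration(A, sigma), T0
    for _ in range(steps):
        i = rng.integers(len(A))          # pick a node, propose a new label
        old = sigma[i]
        sigma[i] = rng.integers(n_comm)
        f_new = frustration(A, sigma)
        if f_new <= f or rng.random() < np.exp(-(f_new - f) / T):
            f = f_new                     # accept the move
        else:
            sigma[i] = old                # reject and restore
        T *= cooling
    return sigma, f

# Two internally positive blocks, anticorrelated with each other
A = np.ones((6, 6)) - np.eye(6)
A[:3, 3:] = A[3:, :3] = -1.0
sigma, f = anneal(A, n_comm=2)
```

On this toy network the annealing recovers the two anticorrelated blocks as separate communities.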
4.1 Evolution of the Community Decomposition

Table 1 shows that the complexity of the community structure is enhanced with increasing l. The eigenvector u_2 perfectly separates the stocks into two groups. Stocks
within their own communities move collectively, and stocks belonging to different communities move oppositely. Inclusion of u_3 then decomposes the two groups into three, which strongly compete against each other, as will be shown later. Inclusion of u_4 further decomposes the three groups into four. However, the evolution of the community decomposition ceases at the level of l = 4. It has been confirmed that details of the four communities remain almost the same beyond l = 4.
Table 1 Number of communities and their size at each l.

l         2    3    4    5    6    7    8    9   10   11   12   13
Comm. 1  279  199  175  175  177  179  179  178  178  173  174  175
Comm. 2  278  192  149  149  149  148  150  146  145  146  146  148
Comm. 3   —   166  119  119  117  117  117  118  119  123  122  118
Comm. 4   —    —   114  114  114  113  111  115  115  115  115  116
Fig. 3 Flow diagram of nodes during the community decomposition evolved with increasing l. Nodes of community 1 at the level of l = 2 flow along the solid line, and nodes of community 2 at l = 2 flow along the dotted line. The diameter of each circle representing a community is proportional to the community size, and the line width of each flow is proportional to the number of nodes in it.
4.2 "Polarization Ratio"

To elucidate the correlation structure obtained by the frustration optimization, we calculated the polarization ratio defined by

    P^± = Σ A_ij / Σ |A_ij|,    (10)

where the sums run over the weights of links within a community for P^+ and over the weights of links between communities for P^−. By definition, −1 ≤ P^− ≤ 0 ≤ P^+ ≤ 1. Figure 4 shows that all of the communities are full of positive weights and almost perfectly exclusive of negative weights. In turn, the communities are interconnected mainly by negative links. The new group (Comm. 3) formed at the level of l = 4 is composed of stocks spilled over from the original three groups and takes a rather marginal position against the remaining groups, which are strongly frustrated with each other. We thus find that strong frustration exists in the stock market.
Fig. 4 Polarization ratios of the configurations at l = 3 (left) and l = 4 (right). Circles represent the communities. The numbers in the circles give the values of P^+; the numbers on the links between circles give the values of P^−.
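Eq. (10) is likewise direct to compute from the adjacency matrix and a community assignment; in the toy network below (ours, not the TSE data) the ratios take their extreme values P^+ = 1 and P^− = −1.

```python
import numpy as np

def polarization(A, sigma):
    """P^+ over links within communities, P^- over links between, Eq. (10)."""
    same = np.equal.outer(sigma, sigma)
    np.fill_diagonal(same, False)          # self-pairs are not links
    within, between = A[same], A[~same]
    return within.sum() / np.abs(within).sum(), \
           between.sum() / np.abs(between).sum()

# Two all-positive communities joined by a single negative link
A = np.zeros((4, 4))
A[0, 1] = A[1, 0] = 0.9
A[2, 3] = A[3, 2] = 0.8
A[0, 2] = A[2, 0] = -0.5
P_plus, P_minus = polarization(A, np.array([0, 0, 1, 1]))
```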
5 Summary

We revisited the correlation structure of the Tokyo Stock Exchange market from a network viewpoint. The correlation matrix of stock price changes, purified by the RMT, was used to construct a network. The stock network thus constructed has weighted links, and those weights can even be negative. Through minimization of the frustration, the present study has uncovered that the market has four communities consisting of strongly correlated stocks. The stock prices within the groups comove almost perfectly. In contrast, three of the four groups are strongly anticorrelated with one another. Such a frustrated triangle relationship among the groups may give rise to complicated market behavior. Detailed analysis of the communities detected here is in progress.
This work was partially supported by the Program for Promoting Methodological Innovation in Humanities and Social Sciences by Cross-Disciplinary Fusing of the Japan Society for the Promotion of Science and by the Ministry of Education, Science, Sports, and Culture, Grants-in-Aid for Scientific Research (B), Grant No. 22300080 (2010-12).
References

1. Laloux, L., Cizeau, P., Bouchaud, J.-P., Potters, M.: Phys. Rev. Lett. 83, 1467 (1999)
2. Plerou, V., Gopikrishnan, P., Rosenow, B., Amaral, L.A.N., Stanley, H.E.: Phys. Rev. Lett. 83, 1471 (1999)
3. Plerou, V., Gopikrishnan, P., Rosenow, B., Amaral, L.A.N., Stanley, H.E.: Phys. Rev. E 65, 066126 (2002)
4. Utsugi, A., Ino, K., Oshikawa, M.: Phys. Rev. E 70, 026110 (2004)
5. Pan, R.K., Sinha, S.: Phys. Rev. E 76, 046116 (2007)
6. Gómez, S., Jensen, P., Arenas, A.: Phys. Rev. E 80, 016114 (2009)
7. Traag, V.A., Bruggeman, J.: Phys. Rev. E 80, 036115 (2009)
8. Leicht, E.A., Newman, M.E.J.: Phys. Rev. Lett. 100, 118703 (2008)
9. Sengupta, A.M., Mitra, P.P.: Phys. Rev. E 60(3), 3389 (1999)
10. Kim, D.H., Jeong, H.: Phys. Rev. E 72, 046133 (2005)
Method of Benchmarking Route Choice Based on the Input Similarity Using DEA Jaehun Park, Hyerim Bae, and Sungmook Lim
Abstract. Benchmarking requires an effective methodology for finding the best performer, which entails an evaluation of the relative efficiencies of competitors in terms of multiple input and output factors. To identify the best performer, Data Envelopment Analysis (DEA) has been widely used. However, conventional DEA has some deficiencies with respect to its use for benchmarking. First, the reference set of an inefficient DMU often contains multiple efficient DMUs. Second, it might be quite impossible for an inefficient DMU to achieve its target's efficiency in a single step, especially when the target is far removed from the DMU. To overcome these deficiencies of conventional DEA, we propose a new stepwise benchmarking method using DEA, which enables inefficient DMUs to select a more appropriate benchmarking DMU based on similarity.

Keywords: Data envelopment analysis, K-means clustering, benchmarking.
1 Introduction

In general, a benchmarking process for an inefficient organization to improve its efficiency consists of three steps: the first is identifying a company that is acknowledged as the best performer, the second is setting benchmarking goals, and the final step is implementing the best practices (Donthu et al. 2005). Selecting the best performers for inefficient organizations is the first step of the benchmarking procedure, and it can be considered the most important activity in the benchmarking process. To identify a best performer, Data Envelopment Analysis (DEA), a methodology for measuring the relative efficiencies among homogeneous Decision-Making Units (DMUs),

Jaehun Park · Hyerim Bae
Business & Service Computing Lab., Industrial Engineering, Pusan National University, 30-san Jangjeon-dong Geumjong-gu, Busan 609-735, South Korea
e-mail: [email protected], [email protected]
Sungmook Lim Division of Business Administration, College of Business and Economics, Korea University Jochiwon, Yeongigun, Chungnam 339-700, South Korea e-mail: [email protected] J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 519–528. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
has been used (Ross and Droge 2002). DEA accomplishes this task by means of multiple inputs and outputs, yielding a reference target for an inefficient DMU along with the corresponding efficiency gap. Several practical problems need to be addressed in benchmark target selection using DEA for inefficient DMUs. One of the problems discussed in this research is that it might not be feasible for an inefficient DMU to achieve its target's efficiency in a single step, especially when that DMU is far from the target DMU on the frontier. To resolve this problem, various methods of stepwise benchmarking have been proposed in the literature. Work on stepwise improvement provides a stepwise path for improving the efficiency of each inefficient DMU. Joe (2003) proposed a stratification method that iteratively generates the efficient frontiers. Alirezaee and Afsharian (2007) proposed a layered efficiency evaluation model that provides a strategy by which an inefficient DMU can move toward a better layer; however, this model lacks information on how to choose the reference DMU on each layer. Lim et al. (2011) proposed a stratified benchmarking path method that selects the next benchmark DMU by applying context-dependent DEA and the stratification method. The existing works addressed here can be considered more realistic and more effective than conventional DEA approaches, because they overcome the limitations of conventional DEA with respect to benchmarking and propose a stepwise benchmarking DMU for each inefficient DMU based on efficiency. However, the existing stepwise benchmarking methods are still limited in that the number of stepwise benchmarks depends heavily on the number of stratified layers. This means that excessive benchmarking activity can occur when the number of stratified layers is large. Moreover, the existing stepwise benchmarking methods have to improve efficiency by traversing the full sequence of layers.
This may lead to a situation where the efficiency gap between an inefficient DMU and its next benchmark target DMU is very small. In this paper, we propose a new stepwise benchmarking method using DEA in which an inefficient DMU can select an optimal path to benchmark the most efficient DMU based on the similarity among the input and output patterns, thereby overcoming the above-mentioned deficiencies of stepwise benchmarking methods. As an application of the proposed method, benchmarking of East Asian container terminals has been conducted. The structure of this paper is organized as follows. Section 2 provides an overview of conventional DEA. Section 3 discusses the proposed method, and Section 4 details our empirical study. Finally, Section 5 summarizes our work.
2 Related Work 2.1 Data Envelopment Analysis (DEA) Data Envelopment Analysis (DEA) is a linear-programming methodology that evaluates the relative efficiencies of DMUs using a set of inputs to produce a set of outputs (Joe, 2003). The mathematical model of DEA is represented by Equation (1). This (Equation (1)), a CCR model, is a basic DEA model initially developed by Charnes, Cooper & Rhodes (1978). Here, ur is the weight given to the r-th output, vi is the weight given to the i-th input, n is the number of DMUs, s is the
number of outputs, m is the number of inputs, k is the DMU being measured, y_rj is the amount of the r-th output produced by DMU j, and x_ij is the amount of the i-th input consumed by DMU j. The DEA model can be divided into input-oriented and output-oriented versions, according to the rationale for conducting DEA. The input-oriented model minimizes inputs with the given outputs, whereas the output-oriented model maximizes outputs with the given inputs. The fractional model shown as (1) can be converted to a linear model; for more details on model development, refer to Charnes et al. (1978).

    max  Σ_{r=1}^{s} u_r y_rk / Σ_{i=1}^{m} v_i x_ik
    s.t. Σ_{r=1}^{s} u_r y_rj / Σ_{i=1}^{m} v_i x_ij ≤ 1,  j = 1, ..., n,
         u_r, v_i ≥ 0,  r = 1, ..., s,  i = 1, ..., m.    (1)
DEA can be a beneficial tool for improving performance through efficiency evaluation and benchmarking, specifically by suggesting a reference set, which is a set of corresponding efficient units that can be utilized as a benchmark for improvement. The reference set can be obtained from the dual model, shown as (2).

    min  θ − ε (Σ_{i=1}^{m} s_i^− + Σ_{r=1}^{s} s_r^+)
    s.t. Σ_{j=1}^{n} λ_j x_ij − θ x_ik + s_i^− = 0,  (i = 1, 2, ..., m),
         Σ_{j=1}^{n} λ_j y_rj − y_rk − s_r^+ = 0,  (r = 1, 2, ..., s),
         λ_j, s_i^−, s_r^+ ≥ 0,  (j = 1, 2, ..., n).    (2)

In model (2), θ is the efficiency score, the λ_j are the dual variables, and ε is a non-Archimedean infinitesimal. By solving model (2), we can identify a composite DMU (a linear combination of DMUs) that utilizes less input than the test DMU while maintaining at least the same output levels. The optimal values of the dual variables λ_j are the coefficients of this linear combination of units. The set of units involved in the construction of the composite DMU can be utilized as a benchmark for improving the inefficient test DMU. If a DMU is given an efficiency score of 1, it is considered efficient; an efficiency score less than 1 indicates inefficiency.
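Model (2) is an ordinary linear program. The sketch below (ours) solves only its radial phase, omitting the ε slack term, which is a common simplification; scipy is assumed to be available, and the data are the two-input, one-output supermarket example introduced in Section 3.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, k):
    """Input-oriented CCR score theta of DMU k (radial part of model (2),
    epsilon slack term omitted). X: (m, n) inputs, Y: (s, n) outputs."""
    m, n = X.shape
    s = Y.shape[0]
    c = np.r_[1.0, np.zeros(n)]                      # variables: theta, lam_1..lam_n
    A_ub = np.vstack([
        np.hstack([-X[:, [k]], X]),                  # sum_j lam_j x_ij <= theta x_ik
        np.hstack([np.zeros((s, 1)), -Y]),           # sum_j lam_j y_rj >= y_rk
    ])
    b_ub = np.r_[np.zeros(m), -Y[:, k]]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.x[0]

# Two inputs (employees, floor area), one output (sales), twelve DMUs A..L
X = np.array([[2, 4, 8, 3, 4, 5, 5, 6, 7, 6, 6, 7],
              [4, 2, 1, 6, 3, 2, 6, 3, 3, 9, 4, 7]], dtype=float)
Y = np.ones((1, 12))
eff = [ccr_efficiency(X, Y, k) for k in range(12)]
```

On these data, DMUs A, B and C come out efficient (theta = 1) and all others score below 1; DMU L, for instance, should come out at about 0.43.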
2.2 K-Means Clustering Algorithm

K-means, proposed by MacQueen (1967), is one of the simplest unsupervised learning algorithms for the well-known clustering problem. The procedure follows a simple way of classifying a given data set into a number of clusters k fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed carefully, because different locations lead to different results; the better choice is to place them as far away from each other as possible. The next step is to take each point belonging to the data set and associate it with the nearest centroid. When no point is pending, the first step is completed and an initial grouping is done. At this point, the k new centroids are recalculated as the barycenters of the clusters resulting from the previous step. After these k new centroids are obtained, a new binding is made between the data set points and the nearest new centroid. A loop is thus generated, as a result of which the k centroids change their locations step by step until no more changes occur.
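A compact version of the algorithm (plain Lloyd iterations in numpy; the implementation is ours) applied to the input patterns of the supermarket example of Section 3:

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then
    recompute each centroid as the barycenter of its cluster."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0)
                        if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):   # converged: assignments stable
            break
        centroids = new
    return labels, centroids

# Input patterns (x1, x2) of the twelve supermarket DMUs of Table 1
pts = np.array([[2, 4], [4, 2], [8, 1], [3, 6], [4, 3], [5, 2],
                [5, 6], [6, 3], [7, 3], [6, 9], [6, 4], [7, 7]], float)
labels, centroids = kmeans(pts, k=4)
```

The paper clusters the DMUs with k = 4 before measuring inter-cluster distances.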
3 Proposed Method

In this section, we propose a new stepwise benchmarking method, which enables an inefficient DMU to select its benchmarking DMUs based on unit similarity. Let us consider the following supermarket example, originally introduced in Cooper et al. (2006), with more DMUs added for easy explanation of the procedure outlined in this paper. Table 1 consists of twelve DMUs, each of which consumes two inputs and yields one output. The conceptual procedure of the proposed method is as follows. First, we calculate the relative efficiency score of each DMU by solving the conventional DEA model, and select an evaluated DMU, which expects to improve its efficiency, from among the relatively inefficient DMUs. Second, we create the layers based on the efficiency of the DMUs using the stratification DEA method for the stepwise benchmarking. Third, similarity groups based on the input and output patterns are created using the K-means clustering algorithm; in this step, DMUs are clustered according to the similarity of their input and output patterns. Fourth, the group most similar to the group containing the evaluated DMU is selected using the distance value of each group. Finally, we determine the next benchmarking target by choosing the DMU with the maximum efficiency value in the group selected in the fourth step.

Table 1 Supermarket Example

Store             A  B  C  D  E  F  G  H  I  J  K  L
Employee (x1)     2  4  8  3  4  5  5  6  7  6  6  7
Floor area (x2)   4  2  1  6  3  2  6  3  3  9  4  7
Sales (y)         1  1  1  1  1  1  1  1  1  1  1  1
First, we use the stratification DEA method proposed by Seiford and Zhu (2003) for the stepwise benchmarking. Yoon et al. (2005) and Lim et al. (2011)
already used this stratification DEA method to find stepwise benchmark targets for inefficient DMUs. When we apply the stratification DEA method to the supermarket example, we obtain the five layers illustrated in Figure 1. If DMU L is a unit that wants to improve, it can improve its efficiency by crossing the sequence of layers. In this paper, a DMU that wants to improve its efficiency, such as DMU L, is called the evaluated DMU.

Fig. 1 Benchmarking target of the stratification method (five efficient layers over the twelve supermarket DMUs in the (x1, x2) plane)
Next, we consider unit similarity for selecting the benchmarking path. Unit similarity serves to select DMUs similar to the evaluated DMU. Before selecting a similar DMU, we classify all DMUs into similarity groups using the K-means clustering algorithm based on their input and output patterns. A closer distance between groups is defined as a higher similarity. In order to calculate the similarity of each cluster, we use the final distance values of each cluster from the result of the K-means clustering algorithm. If we apply this algorithm with four clusters to the supermarket example, we obtain the result shown in Figure 2.

Fig. 2 Clustering result with final distance values of each cluster centroid (four clusters over the twelve supermarket DMUs in the (x1, x2) plane)
Then, the benchmarking possibility set is determined. The benchmarking possibility set means the reference set on the efficient frontier of each layer as seen from the evaluated DMU. We suggest an extended DEA model to obtain the reference set of each layer, which can be calculated using Equation (4).

    min  δ
    s.t. Σ_{j∈E_l} λ_j x_ij ≤ δ x_ik^e,  (i = 1, ..., m),
         Σ_{j∈E_l} λ_j y_rj ≥ y_rk^e,  (r = 1, ..., s),
         λ_j ≥ 0.    (4)

In Equation (4), l is the layer number, E_l is the set of DMUs on the l-th layer, x_ik^e is the i-th input factor of the k-th evaluated DMU, and y_rk^e is the r-th output factor of the k-th evaluated DMU. If DMU L is the evaluated DMU, the benchmarking possibility set (DMU K in layer 4, DMUs G and H in layer 3, DMUs D and E in layer 2, and DMUs A and B in layer 1) can be obtained. Figure 3 shows the benchmarking possibility set of each DMU. Based on the aforementioned method, we determine the benchmarking DMU as the next benchmarking target of the evaluated DMU. In order to select the benchmarking DMU considering both unit similarity and efficiency score, we define a target selection method consisting of two steps. First, select the cluster with the highest similarity value from the cluster containing the evaluated DMU. Second, select the benchmarking DMU as the next benchmarking target of the evaluated DMU by choosing the DMU with the higher efficiency value among the benchmarking possibility DMUs within the cluster selected in the first step.
Fig. 3 Benchmarking possibility set from each DMU
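Equation (4) is again a small linear program restricted to one layer's DMUs. A scipy sketch (ours; DMUs indexed A = 0 through L = 11, with layer 1 = {A, B, C} as in Figure 1):

```python
import numpy as np
from scipy.optimize import linprog

def layer_projection(X, Y, k, layer):
    """delta of Eq. (4): radial input contraction of evaluated DMU k
    against the reference set E_l given by the index list `layer`."""
    Xl, Yl = X[:, layer], Y[:, layer]
    m, nl = Xl.shape
    s = Yl.shape[0]
    c = np.r_[1.0, np.zeros(nl)]                         # minimize delta
    A_ub = np.vstack([np.hstack([-X[:, [k]], Xl]),       # <= delta * x_ik
                      np.hstack([np.zeros((s, 1)), -Yl])])  # outputs >= y_rk
    b_ub = np.r_[np.zeros(m), -Y[:, k]]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (nl + 1), method="highs")
    return res.x[0], res.x[1:]

X = np.array([[2, 4, 8, 3, 4, 5, 5, 6, 7, 6, 6, 7],     # Table 1 inputs
              [4, 2, 1, 6, 3, 2, 6, 3, 3, 9, 4, 7]], dtype=float)
Y = np.ones((1, 12))
delta, lam = layer_projection(X, Y, k=11, layer=[0, 1, 2])  # DMU L vs layer 1
```

A delta below 1 quantifies how far DMU L sits from that layer; repeating this per layer yields the benchmarking possibility set.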
The procedure of the proposed method is described in detail in Figure 4.

Fig. 4 Detailed procedure of the proposed method: measure the relative efficiency scores; select an evaluated DMU; stratify the efficient layers and create the similarity groups; calculate the similarity value of each cluster; calculate the benchmarking possibility set from the evaluated DMU; select the next benchmarking DMU using the target selection method; substitute the new benchmarking target for the evaluated DMU; if the efficiency score of the evaluated DMU equals 1, terminate the procedure, otherwise repeat.
When we apply the proposed method to the supermarket example with DMU L as the evaluated DMU, DMU L is contained in Cluster 4 together with DMUs G and J. After calculating the similarity values of the clusters, the similarity between Cluster 4 and Cluster 2 is found to be the highest. The benchmarking possibility set of DMU L in Cluster 2 consists of DMUs K and H, and the efficiency score of DMU H is higher than that of DMU K. Therefore, DMU H is selected as the next benchmarking DMU of DMU L, and DMU H is substituted as the evaluated DMU. Now the similarity between Cluster 1 and Cluster 2 is the highest. The benchmarking possibility set in Cluster 1 consists of DMUs A and B, and DMU B is selected as the next benchmarking target of DMU H. Since the efficiency score of DMU B is equal to 1, the procedure terminates. The final benchmarking path of DMU L is shown in Figure 5.
Fig. 5 Benchmarking path of DMU L (L → H → B across the clusters and layers)
4 Case Study

For the case study, data were collected for 22 East Asian container terminals, accessing relevant data sources from the Containerization International Year Book 2005. We applied our method to this data set. First, the efficiencies of the container terminals were evaluated by simple DEA, using the number of berths, the length of berths (m), the total area of the port (km2) and the number of cranes as inputs, while the total container traffic (TEU) was used as the output. Four terminals (Hong Kong, Shanghai, Shenzhen, Xiamen) were determined to be on the efficient frontier, and the remaining 18 to be inefficient. Next, to explain the selection of the benchmarking path, we chose Kwangyang (efficiency score 0.18), one of the very inefficient DMUs, as the evaluated DMU. Six layers were stratified using the stratification DEA method. Hong Kong, Shanghai, Shenzhen and Xiamen are contained in layer 1, while Kwangyang and Kobe are contained in layer 6, since they are very inefficient DMUs. The benchmarking path of Kwangyang is shown in Figure 6. Kwangyang chooses Tokyo as the first benchmark target, then Ningbo as the second, and finally selects Shenzhen as the final benchmarking target. Under the stratification DEA method for stepwise benchmarking, Kwangyang would have to benchmark five times; with our proposed method, Kwangyang benchmarks just two times. Our approach differs from previous stepwise benchmarking methods in that it considers similarity, reducing the number of benchmarks while rendering inefficient DMUs efficient.
Method of Benchmarking Route Choice Based on the Input Similarity Using DEA

[Figure: the 22 container terminals stratified into Layers 1–6 (Layer 1: Hong Kong, Shanghai, Shenzhen, Xiamen; Layer 6: Kobe, Kwangyang), with the path Kwangyang → Tokyo → Ningbo → Shenzhen marked.]
Fig. 6 Benchmarking path of Kwangyang
5 Conclusions

In this paper, we proposed a DEA-based method of selecting a stepwise benchmarking path that takes unit similarity into account. The method is formulated to remedy a drawback of existing stepwise benchmarking approaches, namely that they consider only efficiency when an inefficient DMU selects its benchmarking target using DEA. Our approach differs from previous ones in that it also considers similarity, thereby reducing the number of benchmarking steps. For the unit similarity, we defined a target selection method that finds DMUs similar to the evaluated DMU, based on input and output patterns, using the k-means clustering algorithm. As an application of the proposed method, benchmarking of East Asian container terminals was tested in the present study. The results show that suitable stepwise benchmarking targets for an inefficient DMU can be found.
Acknowledgement This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No.2010-0027309).
References

1. Alirezaee, M.R., Afsharian, M.: Model improvement for computational difficulties of DEA technique in the presence of special DMUs. Applied Mathematics and Computation 186, 1600–1611 (2007)
2. Charnes, A., Cooper, W.W., Rhodes, E.: Measuring the efficiency of decision making units. European Journal of Operational Research 2, 429–444 (1978)
3. Cooper, W.W., Seiford, L.M., Tone, K.: Introduction to Data Envelopment Analysis and Its Uses: With DEA-Solver Software and References. Springer, New York (2006)
4. Donthu, N., Hershberger, E.K., Osmonbekov, T.: Benchmarking marketing productivity using data envelopment analysis. Journal of Business Research 58, 1474–1482 (2005)
5. Gonzales, E., Alvarez, A.: From efficiency measurement to efficiency improvement: The choice of a relevant benchmark. European Journal of Operational Research 133, 512–520 (2001)
6. Zhu, J.: Quantitative Models for Performance Evaluation and Benchmarking: Data Envelopment Analysis with Spreadsheets and DEA Excel Solver. Kluwer Academic Publishers, Dordrecht (2003)
7. Kohonen, T.: An introduction to neural computing. Neural Networks 1, 3–16 (1988)
8. Lim, S., Bae, H., Lee, L.H.: A study on the selection of benchmarking paths in DEA. Expert Systems with Applications 38, 7665–7673 (2011)
9. MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, Berkeley (1967)
10. Ross, A., Droge, C.: An integrated benchmarking approach to distribution center performance using DEA modeling. Journal of Operations Management 20, 19–32 (2002)
11. Shaneth, A.E., Hee, S., Young, A., Su, H., Shin, C.: A method of stepwise benchmarking for inefficient DMUs based on the proximity-based target selection. Expert Systems with Applications 36, 11595–11604 (2009)
Modelling Egocentric Communication and Learning for Human-Intelligent Agents Interaction R. Gobbin, Masoud Mohammadian, and Bala M. Balachandran*
Abstract. This paper explores new models of software agent architectures that provide agents with subjectivity and egocentric communication patterns, aiming to increase knowledge of the internal communication processes of subjective intelligent systems and thereby improve the design of human-intelligent agent interface tools. The new model may be applied to intelligent agent and knowledge systems adopting subjective and egocentric communication in areas such as security intelligence analysis, legal argument analysis and intelligent web search engines.

Keywords: software agents, Eclipse, JADE, communication, interaction, subjectivity, mediated activity.
1 Introduction

Recent advances in research on intelligent multiple-agent cooperation and communication require a deeper understanding of the internalisation-externalisation dynamics between thought and speech on the human user's side while interfacing with the inferential knowledge and agent communication language used for coordination in multiple-agent system environments. Goal-driven cooperative activities mediated by the use of tools are deeply ingrained in human behaviour and have existed since the emergence of hunting and gathering societies. The complexity of modern global enterprise working environments requires communicative interaction with intelligent systems able to interact with humans in a meaningful way [1]. To address the issue of subjective communication exchanges between human users and intelligent software agents, we require the analysis and design of new Intelligent Software Agent (ISA) architecture models. These models should be
R. Gobbin · Masoud Mohammadian · Bala M. Balachandran Faculty of Information Sciences and Engineering The University of Canberra, ACT, Australia e-mail: {renzo.gobbin,masoud.mohammadian, bala.balachandran}@canberra.edu.au
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 529–536. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
able to provide objective internalisation, with learning and retention of conceptual knowledge, and subjective externalisation, through communication exchanges, of the internal knowledge the agent possesses. Multiple agents need to use a range of communicative tools in order to transfer messages in the context of intelligent agent cooperative activities. Cooperation is hereby defined as the activity of multiple agents working together, requiring the agents' communicative capacity in order to exchange information about their common goals, their own identity, and their current status [2], [3], [4]. Software agent communication technology has been utilised to form clusters of cooperative agency activities enabling such agents to produce and receive ACL speech-act messages. Although deprived of the richness of human verbal communication, an embryonic form of speech-act communication is achieved. The same internalisation and externalisation processes of human speech and language could then be modelled in communicative agents' internal processes, which could also be externalised in sets of communicative behaviour using specific agent languages as a mediating tool [5]. By using a meta-structured intelligent agent network we will be able to capture internal empirical data between multiple areas of the intelligent model under research, so that phenomenological internal aspects of the intelligent system's conceptual knowledge can be properly analysed. Such an architectural model could then provide answers to the following questions:

1. What role could subjectivity play in artificial intelligent systems' communication and learning?
2. Can an intelligent system's internal model architecture achieve subjective internalisation and egocentrism?

The above questions are important to the investigation and construction of an intelligent software agent meta-structure model capable of providing the necessary subjective functions and methodology.
2 Subjective Agents and Tool-Mediated Activity

In order to model intelligent agent subjective properties, agent architectures should include:

1. Language-mediated communicative activities.
2. Subjective and objective properties required by intelligent agents to perform subjective communication activities.
3. The ability to internalise representations of communicative knowledge patterns and subsequently to externalise internally stored knowledge representations to other agents or humans.
The subjective software agent characteristics described above can form a platform for modelling intelligent agent interactive cooperation activities by applying
tool-mediated Activity Theory, proposed by Vygotsky as an explanation of cognitive development and learning, thus providing an appropriate theoretical framework for investigating models of intelligent systems' communication activities [6]. Vygotsky's tool mediation theories have been applied, using the model described in Fig. 1, in research on the design of Human-Computer Interaction software applications [7].
Fig. 1 Activity Theory model.
Cognitive scientists and philosophers often imply, in a metaphorical way, that speech and language are in fact mediating tools used in a communication activity. For example, Wittgenstein in his "Philosophical Investigations" relates tools in a toolbox to word-generation functionality, while Kempson contrasts Austin's concept of the speech-act with Grice's co-operative principle of communication [8], [9], [10], [11]. The use of language tool mediation in the area of cooperative multiple-agent communication is novel but appropriate, as an agent's identity represents the dialectical difference between external and internal communication activities. Subjective and objective properties acquired during communication can therefore be merged in a single agent entity, as described in Fig. 2. From a computer science perspective, agents are considered autonomous, asynchronous, distributed processes with their distinct objective-related traits. From an intelligent systems perspective, software agents are considered communicative, intelligent and rational, with the possibility of intentional communication, so that they could qualify for subjective traits. Both objective and subjective perspectives require different architecture and modelling approaches. While the first perspective has the objective characteristics of software tools, the second implies intelligent communication and therefore requires a subjective paradigm. By using a mediated-activity agent model with subjective as well as objective characteristics, the integration of computer science perspectives and intelligent systems perspectives can be achieved.
Fig. 2 Multi-agent tool mediated activity.
The new model for software agents under investigation can describe subjective software agent mediated activities while at the same time taking into account the agent's objectivity when communicating with other agents.
3 Subjective Multi-agent Cluster Model

The data collected during the modelling tests indicate promising roles that subjectivity can add to the area of intelligent artificial systems' communication and learning. The aim was to find specific areas that help determine how subjectivity can enhance communication among artificial intelligent systems and also between users and intelligent software application tools. I focused the analysis on two areas:

1. Subjectivity in external communication between subjective systems.
2. Egocentrism in internal system communication, monitored by sniffer tools.

The first point is related to the area of subjective externalisation, which could be summarised as the way intelligent entities convey their internalised concepts. It is an important task for a communicative subjective intelligent system to determine whether an interlocutor system is really subjective. Automatic externalisations by software agents and applications that do not perform subjective communication are easily picked up by a subjective system after a few exchanges. The second point is related to research on intelligent systems' egocentric communication, as the capability of monitoring egocentric communication is important to further refine and optimise future subjective models and also to aid
research in cognitive disciplines. The subjective modelling experiment was conducted using two ontologies:

1. A domain-based ontology adapted to work with the JADE agent development environment.
2. An internal ontology specifically created for the egocentric communication, reflecting Vygotsky's cognitive theories.
I have used the term meta-agent to describe the grouping of three agents that compose the model, forming part of, and performing the functions related to, a whole virtual subjective agent, as seen in Fig. 3.
Fig. 3 Multi-agent tool mediated activity.
Three JADE agents, Objective, Alma and Subjective, were created to construct the ALMA meta-agent platform in order to model subjective communication exchanges and egocentric communication that could express subjective cognition and learning. Two models of complete communication exchanges are shown, one providing a confirmation of the initial request and the second providing a disconfirmation. Egocentric communication is performed by the agent ALMA, which communicates internally using the internal ontology, while the agents Subject and Object use the domain ontology.
Possessing internal egocentric linguistic tools is vital, in Vygotsky's theory of subjectivity, to an intelligent subject's cognitive development, as the verbal thinking faculty is based on conscious awareness of the verbal preparation of externalised concepts. When two subjective intelligent systems externalise concepts, internal verbal thinking must be in place, hidden from each other; only the externalised statement is visible. The possibility of internal concepts being externalised in a subjective mode arises in an experimental model where both multi-agent clusters have a definitively subjective architecture, as described in Fig. 4.
Fig. 4 ALMA Meta-Agent architecture.
When a number of artificial intelligent systems communicate, we have communication exchanges involving two or many clusters of software agents with subjective externalisations. I have prepared two multi-agent subjective cluster architectures whose communicative exchanges can form a model for the analysis of message exchanges involving subjectivity. The experiment has a communication architecture comprising two subjective meta-agent clusters built on a model reflecting Vygotsky's activity theory.
4 Conclusions and Further Research

The constructed artificial subjective model demonstrated that subjectivity can play an "ontological shift" role in the conceptual learning process and a "subjective switch" role in determining the adequate externalisation of internalised concepts. The "ontological shift" role is achieved by the processes of internalisation, egocentric communication and the use of an internal ontology that can store and change conceptual objects and their properties. The "subjective switch" role is achieved by using egocentric communication and the requirement to externalise the content of acquired internal ontology concepts to an external objective domain. We can conclude that subjectivity enables ontological shifts conducive to conceptual learning in communicating artificial intelligent systems, provided the communicative exchange is performed between artificial intelligent systems possessing basic egocentric communication capabilities together with subjective properties involving the externalisation and internalisation of concepts. Having established from the modelling tests that subjective qualities derive from egocentric communication capabilities, we conclude that egocentric exchanges are important in determining the externalised subjective properties that intelligent systems can use in subjective communication. Artificial intelligent systems' egocentric communication is an area that deserves to be followed up in future research efforts. Further research on subjective agent cluster models will focus on defining a model able to increase understanding of the possible role of subjectivity and egocentric communication in artificial intelligent systems.
The research field of artificial intelligence could make use of artificial subjective models similar to ALMA meta-agent clusters to increase knowledge in advanced decision-making areas such as human-intelligent agent interaction for security intelligence analysis, legal systems and business decision-making systems.
References [1] Lock, A.J., Peters, C.R.: Social Relations, Communication and Cognition. In: Lock, A.J., Peters, C.R. (eds.) Handbook of Human Symbolic Evolution. Clarendon Press, Oxford (1996)
[2] Franklin, S., Graesser, A.: Is it an Agent, or just a Program?: A Taxonomy for Autonomous Agents. In: Jennings, N.R., Wooldridge, M.J., Müller, J.P. (eds.) ECAI-WS 1996 and ATAL 1996. LNCS (LNAI), vol. 1193. Springer, Heidelberg (1997)
[3] Odell, J. (ed.): Agent Technology, OMG Document 00-09-01, OMG Agents Interest Group (September 2000)
[4] Cohen, P.R., Levesque, H.J.: Communicative Actions for Artificial Agents. In: Proceedings of the First International Conference on Multi-Agent Systems. AAAI Press, San Francisco (1995)
[5] Finin, T., McKay, D., Fritzson, R., McEntire, R.: KQML: An Information and Knowledge Exchange Protocol. In: Fuchi, K., Yokoi, T. (eds.) Knowledge Building and Knowledge Sharing. Ohmsha & IOS Press (1994)
[6] Vygotsky, L.S.: Thought and Language. MIT Press, Cambridge (1986)
[7] Kaptelinin, V.: Computer-Mediated Activity. In: Nardi, B. (ed.) Context and Consciousness. MIT Press, Cambridge (1996)
[8] Wittgenstein, L.: The Blue and Brown Books. Harper & Row, NY (1958)
[9] Kempson, R.M.: Semantic Theory. Cambridge University Press, UK (1977)
[10] Austin, J.L.: How To Do Things With Words. Oxford University Press, NY (1965)
[11] Grice, P.: Studies in the Way of Words. Harvard University Press, Cambridge (1989)
Multiscale Community Analysis of a Production Network of Firms in Japan Takashi Iino and Hiroshi Iyetomi
Abstract. We investigate a production network constructed by about 800 thousand firms in Japan through four million transaction relations, with a focus on its community structure. Communities detected by maximizing modularity often contain nodes with common properties, such as a shared region or industry sector. However, the modularity optimization approach suffers from the resolution-limit problem: small but important communities tend to be combined into a single large group. To unfold such hidden structure, the community detection was reiterated within each of the major communities. It was then found that the communities were composed of well-defined subcommunities, some of which separated into smaller groups with more specific regions or industry sectors. Furthermore, a new tool is proposed for measuring the strength of relations between those subcommunities. It is shown to be useful for elucidating the multiscale structure of the network.
1 Introduction

Community detection is one of the most powerful tools for studying complex networks. In fact, real networks are notable for their nonuniform structure, in which nodes are divided into densely connected groups, called communities, joined by sparsely connected parts. Communities often contain nodes that have common behaviors or features in networks. For example, the World Wide Web is not a random network; websites tend to make links to sites in the same category. To detect community structure, Newman proposed a quality function called modularity to evaluate the density of connections in groups for a given partition [1, 2, 3]. The idea of modularity is based on quantifying statistically unforeseen arrangements of edges. Finding the division with the highest modularity value determines an optimum community structure in a network.

Takashi Iino · Hiroshi Iyetomi
Faculty of Science, Niigata University, Ikarashi, Niigata 950-2181, Japan
e-mail: [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 537–545. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
In the present paper we investigate the community structure of a production network in Japan consisting of about 800 thousand firms and four million transaction relations. The nodes and links in the production network correspond to firms and their mutual transaction relations. It is no exaggeration to say that the network is so large that it virtually covers the whole of production activity in the nation. Analysis of the network may thus give new insight into the Japanese economy. Visualization and community detection were first applied to the partial network of manufacturers based on the same data in [4]. A similar and more extensive study was then carried out for the whole production network in [5, 6]. The modularity optimization works well to demonstrate that the production network is quite nonuniform. However, it is well known that the modularity optimization approach has a resolution limit [7]: it often fails to identify small but important subcommunities buried in a large community, so the production network may have more detailed nonuniform structure. Here we therefore throw light on possible hidden structure in the network by repeating the community detection for each of the major communities that have already been extracted. Social and/or industrial organizations, which have hierarchical structures, may give rise to such a multiscale property of the network. For example, countries such as Japan have several regions, and the regions are composed of several prefectures. We also devise a metric, based on a reduction of the information involved in the modularity, to measure the "distance" between subcommunities. It allows us to address how strongly those subcommunities are related to each other.
2 Method

2.1 Modularity

Let us suppose that a network V is divided into L nonnull subsets {V_1, V_2, ..., V_L} which do not overlap each other. The modularity Q is then defined by

    Q = \sum_{i=1}^{L} Q_i = \sum_{i=1}^{L} ( e_{ii} - a_i^2 ),    (1)

where e_{ii} denotes the link density within subset V_i, and a_i represents the fraction of links that connect into subset V_i. Therefore, a_i a_j corresponds to the expectation value of the link density between subsets V_i and V_j, and a_i^2 gives the expectation value of e_{ii} for the uniform random null model. The term e_{ii} is canceled perfectly by a_i^2 when the network is encircled by a single set or separated randomly into subsets. The symbol Q_i = e_{ii} - a_i^2 is the partial modularity of subset V_i, which is the contribution of subset V_i to the modularity Q. When the link density of a division surpasses its expectation value under the null model, the modularity takes a large value. For the detection of communities, in general, approximate heuristic methods are used, including greedy agglomeration [2, 8, 9], simulated annealing [10, 11, 12]
Fig. 1 The outline of community and subcommunity detection. We first maximize Q, the modularity of the whole network. To extract subcommunities within communities, we then maximize Q_sub, the modularity of the subnetworks of the communities. Although Q decreases as Q_sub increases, we obtain more detailed structures within communities.
and spectral methods [3]. Here we employ the bisection method worked out by ourselves in [5], a top-down method that takes advantage of simulated annealing. Although the bisection method needs more computational time than the greedy method, it can detect communities in networks as large as the submillion-node production network in realistic time. Moreover, the modularity value maximized by the bisection method surpasses that of the greedy method, especially for large-scale networks.
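Eq. (1) can be evaluated directly from an edge list and a community assignment. The sketch below accumulates e_ii and a_i per community for an undirected graph; the two-triangle toy graph is illustrative, not the production-network data.

```python
from collections import defaultdict

def modularity(edges, community):
    """Q = sum_i (e_ii - a_i^2) over communities, Eq. (1), undirected graph."""
    m = len(edges)
    e = defaultdict(float)   # e[i]: fraction of links inside community i
    a = defaultdict(float)   # a[i]: fraction of link ends attached to community i
    for u, v in edges:
        if community[u] == community[v]:
            e[community[u]] += 1.0 / m
        a[community[u]] += 0.5 / m
        a[community[v]] += 0.5 / m
    return sum(e[i] - a[i] ** 2 for i in a)

# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
whole = {n: "A" for n in range(6)}

q_split = modularity(edges, split)   # the natural two-community division
q_whole = modularity(edges, whole)   # ~0.0: e_ii cancelled by a_i^2
```

The single-set case returning (numerically) zero matches the cancellation of e_ii by a_i^2 noted in the text.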
2.2 Detection of Subcommunities

We make subnetworks that consist of the nodes in a same community by cutting off the links between different communities. We then perform the community analysis on these subnetworks by the bisection method again. The subnetworks differ from the parts that appear in the process of the bisection method: the bisection method recursively separates the network into parts but does not erase the links between parts. If the bisection method cut off links between parts, the modularity calculation would no longer correspond to the whole network. In general, the partial modularity Q_i is not equal to the modularity Q_sub^{(i)} of the subnetwork. The subnetworks made by cutting off inter-community links are separated further, by maximizing the modularity Q_sub of the subnetworks, than in the community structure of the whole network. The outline of community and subcommunity detection is shown in Fig. 1. The division obtained by maximizing Q_sub reduces the modularity Q of the whole network but can unveil more detailed structure within communities.
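The subnetwork construction of this subsection, cutting every inter-community link and keeping intra-community edges, is a one-pass filter; a sketch with an illustrative graph and community labels:

```python
def subnetworks(edges, community):
    """Group intra-community edges by community; inter-community links are cut."""
    subs = {}
    for u, v in edges:
        if community[u] == community[v]:
            subs.setdefault(community[u], []).append((u, v))
    return subs

# Two triangles joined by a bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
subs = subnetworks(edges, community)
# The bridge (2, 3) is dropped; each piece keeps its own three edges.
```

Each subs[i] would then be fed back to the detector (the bisection method in the paper) to maximize Q_sub of that piece.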
2.3 Reduced Modularity Matrix

The relational strength between the subcommunities is important for understanding the nonuniform community structure. The modularity matrix is suitable for illustrating the
strength of the connections from the viewpoint of modularity. The element B_{lm} of the modularity matrix is defined as

    B_{lm} = A_{lm} - \frac{k_l k_m}{2M},    (2)

where A_{lm} denotes an element of the adjacency matrix, and k_l denotes the number of links connected to node l, called the degree. We reduce the elements by communities as

    q_{ij} = \frac{1}{2M} \sum_{l \in V_i} \sum_{m \in V_j} B_{lm} = e_{ij} - a_i a_j,    (3)
where e_{ij} and a_i are defined as in Eq. (1). The fraction of the number of links between subsets i and j is represented by e_{ij}. If the subsets i and j were formed randomly, the expected value of e_{ij} would be a_i a_j. The trace of the modularity matrix is the modularity Q; the modularity is likewise the trace of the reduced modularity matrix:

    Q = \sum_i q_{ii} = \sum_i ( e_{ii} - a_i^2 ).    (4)
The off-diagonal element q_{ij} of the reduced modularity matrix corresponds to the increment \Delta Q of the modularity when the communities i and j are merged:

    \Delta Q = q_{ij} + q_{ji}.    (5)
If the network is undirected, the reduced modularity matrix is symmetric, thus \Delta Q = 2 q_{ij}. A large q_{ij} implies that the communities i and j are strongly related. The reduced modularity matrix of the communities obtained by maximizing the modularity has positive diagonal and negative off-diagonal elements; if an off-diagonal element q_{ij} were positive, we could make Q larger by merging subsets V_i and V_j. We define a distance between subcommunities i and j as

    d_{ij} = -q_{ij} - \min(-q_{ij}),    (6)

because the distance should be opposite in sign to the similarity and should take positive values. In this article, the reduced modularity matrix is used to evaluate the strength of the connections between subcommunities. The reduced modularity matrix used here corresponds to the modularity Q of the whole network, not to the Q_sub of the subnetworks. Although we cut off the links between different communities to make the subnetworks, the reduced modularity matrix is calculated from all of the links in the whole network, including the inter-community links. Therefore, the reduced modularity matrix of the subcommunities has some positive off-diagonal elements, because the subcommunities are more finely separated than the communities in the state of maximized modularity Q. We thus obtain the strength of the connections between subnetworks within the whole network.
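Eqs. (3)-(5) can be checked on the same kind of toy data. The sketch below builds the reduced matrix q_ij from an edge list (the graph is illustrative), so that the diagonal sums to Q as in Eq. (4) and a merge changes Q by q_ij + q_ji as in Eq. (5).

```python
def reduced_modularity_matrix(edges, community):
    """q_ij = e_ij - a_i a_j, Eq. (3); the trace reproduces Q, Eq. (4)."""
    m = len(edges)
    comms = sorted(set(community.values()))
    idx = {c: k for k, c in enumerate(comms)}
    n = len(comms)
    e = [[0.0] * n for _ in range(n)]   # e[i][j]: link fraction between i and j
    a = [0.0] * n                       # a[i]: fraction of link ends in community i
    for u, v in edges:
        i, j = idx[community[u]], idx[community[v]]
        e[i][j] += 0.5 / m
        e[j][i] += 0.5 / m
        a[i] += 0.5 / m
        a[j] += 0.5 / m
    return [[e[i][j] - a[i] * a[j] for j in range(n)] for i in range(n)]

# Two triangles joined by a bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = reduced_modularity_matrix(edges, community)

trace = q[0][0] + q[1][1]    # equals Q of this division, Eq. (4)
delta_q = q[0][1] + q[1][0]  # Eq. (5): change in Q if A and B are merged
# Distances for the dendrogram then follow Eq. (6): d_ij = -q_ij - min(-q_ij).
```

Merging the two communities here drives Q to zero, so trace + delta_q vanishes, consistent with the single-set cancellation in Sect. 2.1.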
Fig. 2 Visualization of the production network using a spring-electrical model [6] in a threedimensional space. Dots in the images represent firms, which have color coding to distinguish communities. The images (a) and (b) are drawn from different angles. The image (c) is a cross section of the slice surrounded by the vertical lines in (b).
3 The Production Network

The data used in the present study was compiled by TOKYO SHOKO RESEARCH, LTD., which has been gathering information on firms through investigation of financial statements, corporate documents and hearing-based surveys. To simplify our analysis, we treated the transaction relations formed by firms as an undirected network. We also restricted attention to the largest connected component of the network. It contains 773,670 nodes and 3,192,582 links, which covers more than 99% of the whole data. The degree distribution P(k) follows a power law, i.e., P(k) ∝ k^{-γ}. The power-law index was determined as γ ≈ 2.32 by least-squares fitting. Thus the production network possesses the scale-free property. The production network was visualized using a spring-electrical model in the previous paper [6]. The result is shown in Fig. 2, where only the nodes are displayed. The communities detected by the modularity optimization are distinguished by color coding in the figure. We see that the network is highly nonuniform and separated into communities. Its community structure is well reflected in the visualized image, as has been claimed in [5]. We should stress here that the self-organized network has multiscale structure; the communities themselves again consist of well-defined components.
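The least-squares estimate of the power-law index mentioned above can be sketched as follows. The degree sequence here is synthetic (an exact k^{-2} law), not the real network's, and in practice maximum-likelihood estimators are usually preferred over log-log least squares.

```python
import math
from collections import Counter

def powerlaw_index(degrees):
    """Least-squares slope of log P(k) vs log k; gamma is the negated slope."""
    counts = Counter(degrees)
    n = len(degrees)
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / n) for c in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

# Synthetic degree sequence obeying P(k) proportional to k^-2 exactly:
# 400 nodes of degree 1, 100 of degree 2, 25 of degree 4.
degrees = [1] * 400 + [2] * 100 + [4] * 25
gamma = powerlaw_index(degrees)   # → 2.0 (up to floating point)
```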
4 Community Structure in the Production Network

We extracted 118 communities in the production network, with a maximized modularity value of 0.654. The distribution of community sizes has a gap: fewer than 20 communities have over 10,000 nodes, while about 90 communities consist of fewer than 20 nodes. Attributes of the firms in the top five largest communities in the production network are shown in Table 1. We check the prefectures and industry sectors of the firms in each
Table 1 Attributes of firms in major communities in the production network. Top three major prefectures and industry sectors in each community are described. Decimals shown in parentheses represent the ratio of firms of the corresponding attribute to community size.

rank | size   | prefecture (fraction)                                | industry (fraction)
1    | 88,840 | Tokyo (0.189), Aichi (0.120), Osaka (0.110)          | M-GM (0.144), W-ME (0.124), M-FM (0.105)
2    | 84,280 | Niigata (0.117), Tokyo (0.094), Aichi (0.086)        | C-GE (0.400), C-SP (0.228), C-EI (0.075)
3    | 78,529 | Tokyo (0.110), Hokkaido (0.088), Aichi (0.055)       | W-FB (0.262), M-FO (0.172), R-FB (0.137)
4    | 48,903 | Fukuoka (0.298), Kagoshima (0.137), Kumamoto (0.134) | C-GE (0.359), C-SP (0.146), C-EI (0.124)
5    | 47,085 | Aichi (0.135), Kanagawa (0.121), Tokyo (0.113)       | C-GE (0.362), C-SP (0.205), W-BM (0.094)
Table 2 Abbreviations of industry sectors, conforming to the Japan Standard Industrial Classification defined by the Statistics Bureau of the Ministry of Internal Affairs and Communications.

abbreviation | Major groups                                                                   | Divisions
C-EI         | equipment installation work                                                    | construction
C-GE         | construction work, general, including public and private construction work    | construction
C-SP         | construction work by specialist contractor, except equipment installation work | construction
M-FM         | manufacture of fabricated metal products                                       | manufacturing
M-FO         | manufacture of food                                                            | manufacturing
M-GM         | manufacture of general machinery                                               | manufacturing
R-FB         | retail trade (food and beverages)                                              | wholesale/retail trade
W-BM         | wholesale trade (building materials, minerals and metals, etc.)                | wholesale/retail trade
W-FB         | wholesale trade (food and beverages)                                           | wholesale/retail trade
W-ME         | wholesale trade (machinery and equipment)                                      | wholesale/retail trade
community. The industry sectors conform to the Japan Standard Industrial Classification and are shown in Table 2, which lists only the terms used in this article. To simplify notation, the names of industry sectors are abbreviated by capital letters. The communities in the production network are well characterized by regions and industry sectors. The largest community consists of firms in the manufacture and wholesale trade of machinery. The second, fourth and fifth largest communities contain many construction firms. Furthermore, the fourth largest community is strongly characterized as a geographical region: the ratio of Kyushu firms to all firms in the fourth largest community is more than 90%. The third largest community consists of firms in the manufacture, wholesale trade and retail trade of food.
5 Subcommunity Structures in the Communities

Here, we analyze subcommunity structures to elucidate nonuniform structures in these communities. The results of the subcommunity analysis applied to the top five communities are shown in Table 3. The maximized modularity values of the subnetworks are comparable to or surpass 0.654, the modularity of the whole network. This implies that the communities have recognizable module structures, similarly to the whole network. In particular, the second, fourth and fifth largest communities (the construction communities) have clear subcommunities from the standpoint of modularity. To elucidate the character of the subcommunities, we examine the attributes of firms in subcommunities of some typical communities. As an example, we show the attributes of firms in the major subcommunities of the second largest community (Table 4). This community is separated into fine geographical regions. We then show the strength of the connections between the subcommunities by the reduced modularity matrix defined in Eq. (3). The reduced modularity matrix of the subcommunities in the second community is shown in Fig. 3. For ease of understanding, we also represent this matrix by a dendrogram, as shown in Fig. 4; it is the result of the cluster analysis for the distance defined in Eq. (6). Each branch in the dendrogram corresponds to a subcommunity and is labeled by its major prefecture. Figure 4 shows that there are strong relations between
Table 3 Results of the subcommunity analysis for the top five communities shown in Table 1.

rank | number of nodes | number of links | number of communities | modularity
1 | 88,840 | 340,994 | 48 | 0.525
2 | 84,280 | 262,534 | 57 | 0.728
3 | 78,529 | 286,112 | 88 | 0.598
4 | 48,903 | 147,175 | 67 | 0.729
5 | 47,085 | 95,759 | 76 | 0.722
544
T. Iino and H. Iyetomi
Table 4 Attributes of firms in major subcommunities in the second largest community.

rank | size | prefecture (fraction) | industry (fraction)
1 | 14,135 | Tokyo (0.321) Kanagawa (0.187) Osaka (0.107) | C-SP (0.349) C-GE (0.192) W-BM (0.067)
2 | 9,941 | Niigata (0.966) Tokyo (0.010) Shizuoka (0.002) | C-GE (0.402) C-SP (0.207) C-EI (0.119)
3 | 7,978 | Aichi (0.765) Gifu (0.160) Mie (0.012) | C-GE (0.384) C-SP (0.290) W-BM (0.056)
4 | 7,215 | Osaka (0.311) Hyogo (0.284) Kyoto (0.160) | C-GE (0.525) C-SP (0.135) W-BM (0.067)
5 | 6,654 | Tiba (0.236) Tokyo (0.204) Kanagawa (0.090) | C-GE (0.490) C-SP (0.162) W-BM (0.065)
6 | 6,633 | Shizuoka (0.946) Kanagawa (0.011) Aichi (0.011) | C-GE (0.364) C-SP (0.288) C-EI (0.129)
7 | 5,522 | Tochigi (0.615) Gunma (0.162) Ibaraki (0.133) | C-GE (0.439) C-SP (0.195) C-EI (0.062)
8 | 4,485 | Ishikawa (0.947) Toyama (0.009) Osaka (0.007) | C-GE (0.420) C-SP (0.200) C-EI (0.122)
9 | 4,436 | Fukui (0.955) Ishikawa (0.008) Shiga (0.007) | C-GE (0.456) C-SP (0.150) C-EI (0.131)
10 | 3,976 | Kanagawa (0.779) Tokyo (0.126) Tiba (0.020) | C-GE (0.479) C-SP (0.181) W-BM (0.071)

Fig. 3 The reduced modularity matrix of the subcommunities in the second largest community (color scale from 0.000000 to 0.000500).

Fig. 4 The dendrogram of the reduced modularity matrix shown in Fig. 3. Branch labels: 1: Capital region & Osaka; 2: Niigata; 3: Aichi & Gifu; 4: Kinki; 5: Southern Kanto; 6: Shizuoka; 7: Northern Kanto; 8: Ishikawa; 9: Fukui; 10: Kanagawa.
subcommunities characterized by geographically close prefectures. We examined dendrograms for the other communities and obtained similar features. The geographical distance between firms, or their industrial closeness, influences the strength of the connections between subcommunities.
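The clustering step behind Fig. 4 — turning a distance matrix between subcommunities into a dendrogram — can be sketched with a plain single-linkage agglomerative procedure. This is a minimal sketch only; the distance values below are invented for illustration and are not taken from the reduced modularity matrix of the actual network.

```python
# Single-linkage agglomerative clustering: repeatedly merge the two
# clusters whose closest members are nearest, recording merge distances.
# The merge history is exactly what a dendrogram plots.

def single_linkage(labels, dist):
    """labels: item names; dist[i][j]: distance between items i and j.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [frozenset([i]) for i in range(len(labels))]
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history

# Toy example: "subcommunities" A and B are close (small distance), C is far.
dist = [[0.0, 0.1, 0.4],
        [0.1, 0.0, 0.5],
        [0.4, 0.5, 0.0]]
history = single_linkage(["A", "B", "C"], dist)
```

With these illustrative distances, A and B merge first at distance 0.1, and C joins the pair at distance 0.4, which is the branching pattern a dendrogram would display.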
6 Conclusion

The production network of nearly a million Japanese firms has many communities. Some of these communities are characterized by regions or industry sectors. It is plausible that the rough industrial structure of Japan is reflected in the communities of the production network. To elucidate the nonuniform structures in the communities, we analyzed their subcommunities by the recursive community analysis. We suppose that small communities are hidden in large communities by the resolution limit of modularity optimization. We extracted subcommunities from the communities with modularity values similar to or larger than that of the whole network, which implies that the communities have recognizable module structures. These subcommunities are characterized by fine geographical regions. We evaluated the nonuniform strength of the connections between the subcommunities by the reduced modularity matrix, which provides information that the mere decomposition into subcommunities cannot. We confirmed that subcommunities characterized by geographically close regions tend to have strong connections. The reduced modularity matrix is thus useful for obtaining an overview of the relations between the subcommunities.
Acknowledgements. We thank Hideaki Aoyama, Yoshi Fujiwara, Yuichi Ikeda and Wataru Souma for useful discussions on the subjects treated in this article. We also thank the Research Institute of Economy, Trade and Industry (RIETI) for providing us with the data on transaction relationships between firms in Japan. The present study was supported in part by the Ministry of Education, Culture, Sports, Science and Technology, Grants-in-Aid for Scientific Research (B), Grant No. 20330060 (2008-10) and No. 22300080 (2010-12).
References

1. Newman, M.E.J., Girvan, M.: Phys. Rev. E 69, 026113 (2004)
2. Newman, M.E.J.: Phys. Rev. E 69, 066133 (2004)
3. Newman, M.E.J.: PNAS 103, 8577 (2006)
4. Fujiwara, Y., Aoyama, H.: Eur. Phys. J. B 77, 565 (2010)
5. Iino, T., Kamehama, K., Iyetomi, H., Ikeda, Y., Ohnishi, T., Takayasu, H., Takayasu, M.: J. Phys.: Conf. Ser. 221, 012012 (2010)
6. Kamehama, K., Iino, T., Iyetomi, H., Ikeda, Y., Ohnishi, T., Takayasu, H., Takayasu, M.: J. Phys.: Conf. Ser. 221, 012013 (2010)
7. Fortunato, S., Barthélemy, M.: PNAS 104, 36 (2007)
8. Clauset, A., Newman, M.E.J., Moore, C.: Phys. Rev. E 70, 066111 (2004)
9. Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: J. Stat. Mech., P10008 (2008)
10. Guimerà, R., Amaral, L.A.N.: Nature 433, 895 (2005)
11. Medus, A., Acuña, G., Dorso, C.O.: Physica A 358, 593 (2005)
12. Reichardt, J., Bornholdt, S.: Phys. Rev. E 74, 016110 (2006)
Notation-Support Method in Music Composition Based on Interval-Pitch Conversion Masanori Kanamaru, Koichi Hanaue, and Toyohide Watanabe
Abstract. Computer-assisted composition systems are making composition work easier. However, it is still hard for beginners to externalize melodies efficiently because they lack the ability to identify pitch. In order to externalize a melody, it is necessary to identify the pitch of each note. Although many beginners cannot identify the pitch of a note, they can recognize the interval between two notes. Hence, by supporting the conversion of interval information into pitch information, beginners can externalize melodies more easily. In this paper, we propose a notation-support method based on interval-pitch conversion. Intervals between notes are input as vertical distances between notes through a graphic tablet. The pitch of each note is calculated from its distance to fiducial notes, whose pitches are specified directly by the beginner. A method for supporting manual modification of pitch is also proposed; it allows beginners to modify the pitch of a note while maintaining the intervals between notes. Experimental results show that our method is appropriate for beginners who can recognize intervals.
1 Introduction

With the development of information technologies, composition work is becoming easier owing to numerous computer-assisted composition systems. Additionally, content-sharing websites such as YouTube get people interested in composition.

Masanori Kanamaru
Graduate School of Information Science, Nagoya University, Japan
e-mail: [email protected]

Koichi Hanaue
Graduate School of Information Science, Nagoya University, Japan
e-mail: [email protected]

Toyohide Watanabe
Graduate School of Information Science, Nagoya University, Japan
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 547–556.
© Springer-Verlag Berlin Heidelberg 2011. springerlink.com
Since existing systems are assumed to be used by composers who have a certain level of musical skill, it is not always easy for beginners to compose music with them. Many computer-assisted composition systems for beginners have been proposed. For example, systems based on interactive genetic algorithms can compose pieces of music reflecting users' feelings [1]. However, those approaches cannot reflect a user's originality effectively. Another approach to composition support is to provide beginners with an interface that allows them to express their ideas intuitively. Hyperscore makes the repetition of motifs and the constitution of harmony easy [2], but it cannot support the constitution of motifs. Furthermore, a voice-to-MIDI system enables a user to input a melody by singing [3]. However, some beginners cannot sing well because of a poor sense of pitch. In this paper, we propose a method for supporting the externalization of the melody imagined in a composer's mind. To externalize a melody, it is necessary to define the pitch and the length of each tone. Since this is difficult for beginners, we take the approach of receiving intervals from a user and converting them into pitches. By providing users with an interface that allows them to input intervals intuitively, it becomes easy for users to externalize their melodies.
2 Approach

2.1 Framework

In traditional composition systems, users externalize their melodies through the process shown in Fig. 1. In this process, a user first converts the intervals into pitches in his/her mind, and then externalizes the melody in the form of a sequence of pitches. It is difficult for beginners to input the pitches into the systems, because it is difficult for them to identify the exact pitch from intervals because of the lack of
Fig. 1 Externalization in existing systems
musical knowledge and experience. This is the main difficulty in melody externalization with existing systems. In order to make it easy for beginners to externalize melodies, we focus on the conversion from intervals to pitches. It is easier for many people to express the intervals of a melody than to express the exact pitches of the notes. Therefore, we take the approach of receiving interval information from a user and then converting it into pitch information semi-automatically, as illustrated in Fig. 2. By providing the user with an interface that allows him/her to input interval information intuitively, it becomes easy to externalize a melody.
2.2 Interval-Pitch Conversion Support

An interval represents the difference in pitch between two notes and is denoted by a one-dimensional value; in natural language, an interval is expressed as high or low. Therefore, in our approach, we prepare an interface through which intervals are input as a sequence of vertical positions in a two-dimensional space. Since it is assumed that the expressed distance of an interval varies among individuals, pitch is calculated based on relative distances. The pitch of each note is calculated from its distance to the fiducial notes, which are defined as the notes whose pitches are specified by the user. On the premise that the relative distance between two notes reflects the exact interval, the pitches of all notes can be estimated from the pitches of two notes specified in advance. Compared with methods that require the user to input the pitches of all notes, our approach reduces the effort of interval-pitch conversion because it only requires inputting the pitches of two notes. For intuitive input, we use a graphic tablet as the input device; the rhythm of a melody is expressed by tapping it on the graphic tablet. In our approach, there is a possibility that the pitches of some notes are estimated incorrectly. In this case, the user needs to modify these pitches manually. We also propose a method for supporting such manual modification of pitch.
Fig. 2 Externalization in a proposed system
It is assumed that interval errors are mostly due to incorrectly expressed distances between two nodes. For efficient modification, the relative relations among notes should be maintained. Our method allows users to modify the pitch of a note while maintaining the interval relations between notes.
3 Methods

3.1 Generation of Melodic Factor

Fig. 3 shows the mechanism of acquiring melodic factors with a graphic tablet. A melodic factor is a sequence of nodes. A node corresponds to a note in a score and has four properties: vertical position, start time, length, and dynamics. The vertical position is used to calculate the pitch of a note based on the relative distance between nodes. Start time and length are the time point and time span of playing the corresponding note. Dynamics is the intensity of playing a note, based on MIDI velocity, which takes a value from 0 to 127. The melodic factor generation module converts data input through the graphic tablet into a sequence of nodes. Fig. 4 shows how a melodic factor is generated from input data. The vertical coordinate (y0) and pressure (prs) of the pen are input into the generation module constantly; the value of prs is normalized to the range from 0 to 1. The generation module has two states, ON and OFF, indicating whether the pen is touching the tablet, and its behavior varies according to its state and the value of prs. When the state changes from OFF to ON, the generation module stores the vertical coordinate (y0), pressure (prs) and the current time point taken from the metronome module as y, Pmax and Ts, respectively. Pmax is the maximum value of prs while the state is ON. When the pen leaves the tablet, the generation module gets the current time point (Te) from the metronome module and calculates the duration ΔT and the dynamics vel by the following definitions:

ΔT = Te − Ts.  (1)
vel = Pmax × 127.  (2)

Then, the generation module generates a new node and changes its state from ON to OFF.
Fig. 3 Generation of melodic factor
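The ON/OFF behavior charted in Fig. 4 can be sketched as a small state machine over a stream of pen samples. This is a minimal sketch, not the authors' implementation; the sample format (t, y, prs), with prs > 0 meaning the pen is touching the tablet, is an assumption.

```python
# State machine for melodic-factor generation (Eqs. (1) and (2)):
# OFF -> ON stores y, T_s and starts tracking P_max;
# ON -> OFF emits a node with length T_e - T_s and velocity P_max * 127.

class Node:
    def __init__(self, y, start, length, velocity):
        self.y = y                # vertical position (used later for pitch)
        self.start = start        # T_s: time the pen touched the tablet
        self.length = length      # ΔT = T_e - T_s
        self.velocity = velocity  # vel = P_max * 127 (MIDI velocity)

def generate_nodes(samples):
    """samples: iterable of (t, y, prs) with prs normalized to [0, 1]."""
    nodes, state = [], "OFF"
    y0 = t_s = p_max = 0.0
    for t, y, prs in samples:
        if state == "OFF" and prs > 0:     # OFF -> ON: a note starts
            state, y0, t_s, p_max = "ON", y, t, prs
        elif state == "ON" and prs > 0:    # still ON: track maximum pressure
            p_max = max(p_max, prs)
        elif state == "ON" and prs == 0:   # ON -> OFF: emit a node
            nodes.append(Node(y0, t_s, t - t_s, round(p_max * 127)))
            state = "OFF"
    return nodes

# Two taps: one at y=52 with peak pressure 0.8, one at y=30 with peak 0.6.
samples = [(0.0, 50, 0.0), (0.1, 52, 0.4), (0.2, 52, 0.8),
           (0.3, 52, 0.0), (0.4, 30, 0.6), (0.5, 30, 0.0)]
nodes = generate_nodes(samples)
```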
Fig. 4 Flowchart of getting melodic factor
3.2 Interval-Pitch Conversion and Modification Support

In order to get two fiducial notes, the system presents the melodic factor to the user visually as in Fig. 5 and prompts the user to specify the pitches of two notes. If the magnitude relation between the vertical coordinates of the two nodes contradicts the high-low relation between their pitches, the inputs are judged invalid and the user is prompted to specify the pitches again. When the two fiducial notes are selected, the pitches of the other nodes are estimated from their vertical coordinates and the assigned pitches of the fiducial notes. First, Δd, the vertical distance corresponding to one half tone, is calculated according to the following definition:

Δd = |y_f1 − y_f2| / |p1 − p2|,  (3)

where y_f1 and y_f2 are the vertical coordinates of the fiducial nodes, and p1 and p2 are the pitches of the fiducial notes assigned by the user. Pitch is represented as a MIDI
Fig. 5 Visualization of melodic factor
note number. Then, the pitches of the other nodes are estimated by the following definition:

p = ⌊p1 + (y − y_f1)/Δd + 1/2⌋  (|y − y_f1| < |y − y_f2|),
    ⌊p2 + (y − y_f2)/Δd + 1/2⌋  (otherwise).  (4)

Here, p is the estimated pitch and y is the vertical coordinate of the node. This definition shows that the pitch is estimated from the vertical distance between a node and the nearer of the fiducial nodes, scaled by the vertical distance between the fiducial nodes. When the pitches of all the notes are assigned, the user can play the melody and modify the pitch of a node by moving it vertically. To support this phase, the pitches of the notes are recalculated according to the user's manipulation of nodes. When the user changes the vertical coordinate of a node manually, the system changes the vertical coordinates of the other nodes to maintain the intervals between nodes. In this phase, the user is allowed to modify the notes other than the fiducial notes. The recalculation procedure varies according to whether a fiducial node exists after the modified node.

• If there is no fiducial node between the modified node and the last node in the melodic factor, the pitches of all nodes after the modified node are recalculated. The variation of the vertical coordinate of each such node is equal to that of the modified node.
• Otherwise, the pitches of the nodes between the modified node and the fiducial node are recalculated. When the vertical coordinate of a node is yn, its variation Δyn is calculated from the previous coordinate y0 and variation Δy0 of the modified node and the vertical coordinate yt of the fiducial node by the following definition:

Δyn = Δy0 / (yt − y0) × (yt − yn).  (5)
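The conversion and recalculation of this section can be sketched as follows. This is a minimal sketch under two assumptions not fixed by the text: larger vertical coordinates correspond to higher pitches, and the estimate of Eq. (4) is rounded to the nearest MIDI note via ⌊· + 1/2⌋. The numeric coordinates are invented for illustration.

```python
from math import floor

def estimate_pitches(ys, f1, f2, p1, p2):
    """ys: vertical coordinates of all nodes; f1, f2: indices of the two
    fiducial nodes; p1, p2: their MIDI note numbers assigned by the user."""
    y_f1, y_f2 = ys[f1], ys[f2]
    delta_d = abs(y_f1 - y_f2) / abs(p1 - p2)  # Eq. (3): units per half tone
    pitches = []
    for y in ys:
        if abs(y - y_f1) < abs(y - y_f2):      # use the nearer fiducial node
            p = p1 + (y - y_f1) / delta_d
        else:
            p = p2 + (y - y_f2) / delta_d
        pitches.append(floor(p + 0.5))         # Eq. (4): round to a MIDI note
    return pitches

def shift_after_modification(y0_prev, dy0, yt, yn):
    """Eq. (5): vertical shift of an intermediate node when the node at
    y0_prev is moved by dy0 and the next fiducial node sits at yt."""
    return dy0 / (yt - y0_prev) * (yt - yn)

# Fiducial nodes at y=0 (MIDI 60) and y=120 (MIDI 72): 10 units per half tone.
pitches = estimate_pitches([0, 10, 120, 65], f1=0, f2=2, p1=60, p2=72)
```

With these coordinates the four nodes map to MIDI notes 60, 61, 72 and 67, and a node halfway between a modified node and the fiducial node receives half of the modification shift, so the intervals deform smoothly rather than jumping.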
4 Evaluation

4.1 Prototype System

We developed a prototype system for supporting melody externalization, using C# on Windows. We assume that a user manipulates the system using a graphic tablet interface. The melody externalized by the user is played on a MIDI device. The maximum length of a melody corresponds to 16 beats. Fig. 6 illustrates the interface of the system. First, the user specifies the tempo of the melody. When the user presses the input start button, the system accepts inputs from the user after 4 beats. The user inputs the melodic factor by tapping in the input area on the left side of the window. During this step, the user is able to listen to a clicking sound produced by the system. The system immediately displays each note as a node in the display area of the window. This allows
Fig. 6 Prototype interface
the user to monitor his/her inputs appropriately. For each node, the color represents its dynamics: green nodes represent weak sounds and red nodes represent strong sounds. This step is finished when the input start button is pressed again or when the time for the maximum length of a melody (16 beats) passes. After inputting the melodic factor, the user selects two nodes as fiducial nodes and specifies their pitches. When the user picks a node in the display area, the window for specifying a pitch appears. This window allows the user to refer to and specify the pitch of the node. A node whose pitch is specified by the user becomes a fiducial node; the border of a fiducial node is drawn in blue. When the user presses the convert button, the system generates a melody by calculating the pitches of the other nodes. When the user presses the play button, the generated melody is played. The user can edit the melody by manipulating the nodes in the display area. In this step, the user re-specifies the pitch of a fiducial node or moves other nodes vertically in the display area. When the pitch of a fiducial node is re-specified, the pitches of all nodes are recalculated. When another node is moved, the pitches of the other nodes are recalculated according to the procedure described in Section 3.2.
4.2 Experiment Description

The purpose of the experiment is to verify the validity of the proposed methods. To evaluate the validity, it is necessary to confirm that the melody imagined by a subject is externalized accurately. This confirmation requires a comparison between the melody in the subject's mind and the one externalized with our system. Therefore, we conducted the experiment in a situation where the melody called to a subject's mind can be observed. In the experiment, we asked subjects to listen to a melody until they memorized it and then to reproduce it with our prototype system. The target melodies (the melodies that the subjects listened to) were prepared in SMF format and played
with the same tone as the one used in our prototype system. One trial begins with listening to a target melody and ends with the externalization of the melody. We prepared three target melodies, and five subjects were asked to reproduce each target melody. We evaluated the adequacy of the method by comparing the target melodies with the externalized melodies. There is a case in which the externalized melody does not match the target melody but matches the transposed target melody. In this case, the subject does not recognize the pitch correctly but recognizes the intervals correctly. Therefore, we assume that the melody is externalized accurately when either the target melody or a transposed version of it matches the externalized one. Moreover, we evaluate how accurately a subject recognizes intervals. In what follows, only trials in which the number of notes in the externalized melody is the same as that in the target one are considered for evaluation. In order to compare the melodies, the target melody T and the externalized melody E are represented as lists of the pitches of the notes: T = (t1, ..., tn) and E = (e1, ..., en). To compare two melodies, we define the following three criteria.

• Absolute agreement rate A: represents how many notes in the externalized melody have the same pitch as the corresponding notes in the target melody, and is calculated by the following definition:

A(T, E) = (1/n) Σ_{i=1}^{n} δ(t_i, e_i),  δ(x, y) = 1 (x = y), 0 (otherwise).  (6)

The value of A is between 0 and 1. When A is 1, the two melodies are completely identical.

• Relative agreement rate R: used to compare the externalized melody with the transposed target melody, and calculated by the following definition:

R(T, E) = A(N(T), N(E)),  where N(X) = (x_1 − x_m, x_2 − x_m, ..., x_n − x_m).  (7)

Here, x_m is the median of the elements in X. R takes a value between 0 and 1. When R is 1, E matches a transposed melody of T.

• Interval sign agreement rate S: the agreement rate between T and E in terms of the direction of pitch transition, calculated by the following definition:

S(T, E) = (1/(n − 1)) Σ_{i=1}^{n−1} f(t_i, t_{i+1}, e_i, e_{i+1})  (n > 1),  (8)

where f(x, y, a, b) = 1 if (x = y and a = b), (x > y and a > b), or (x < y and a < b), and f(x, y, a, b) = 0 otherwise.
S takes a value between 0 and 1. When S is 1, all the directions from each note to its following note in E are the same as those in T. We define S calculated just after the interval-pitch conversion as S0. We can confirm the validity of the proposed methods if A or R is close to 1 with system support whenever S0 is close to 1, namely, whenever a subject recognizes the intervals of a target melody correctly.
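The three criteria A, R and S of Eqs. (6)-(8) are straightforward to implement; a minimal sketch follows, with invented example melodies (a target and an externalization shifted up by two semitones).

```python
from statistics import median

def absolute_agreement(t, e):
    """Eq. (6): fraction of notes with exactly matching pitch."""
    n = len(t)
    return sum(1 for ti, ei in zip(t, e) if ti == ei) / n

def relative_agreement(t, e):
    """Eq. (7): agreement after shifting each melody by its median pitch."""
    def normalize(xs):
        m = median(xs)
        return [x - m for x in xs]
    return absolute_agreement(normalize(t), normalize(e))

def interval_sign_agreement(t, e):
    """Eq. (8): fraction of successive note pairs whose pitch moves in the
    same direction (up, down, or flat) in both melodies."""
    n = len(t)
    ok = sum(1 for i in range(n - 1)
             if (t[i + 1] > t[i]) == (e[i + 1] > e[i])
             and (t[i + 1] < t[i]) == (e[i + 1] < e[i]))
    return ok / (n - 1)

# A melody externalized two semitones higher than the target: every pitch
# is wrong (A = 0) but the transposed shape and directions match (R = S = 1).
target = [60, 62, 64, 62]
ext = [62, 64, 66, 64]
```

This is exactly the situation the relative agreement rate is designed to capture: a subject with relative pitch reproduces the contour perfectly while missing the absolute key.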
4.3 Experimental Results and Discussion

We recorded operational logs and the externalized melody in each trial. There was one trial in which the number of notes in the melody externalized by subject D was not the same as that of melody b, partly because subject D could not recognize melody b correctly. In what follows, we discuss the 14 trials other than this one. Table 1 shows S0, Ae and Re, where Ae and Re are the pitch agreement rates at the end of a trial. When a subject reset the input or converted the input again in a trial, the result has multiple values of S0. We can see that Ae or Re is close to 1 when S0 is close to 1. This result shows that externalization is performed accurately with our system when a subject recognizes the target correctly. In the trials of subject A, the value of Ae is low. This shows there is a difference between the pitch recognized by the subject and that of the actual melody. However, the values of S0 and Re are high. This means that subject A recognizes melodies not by the pitches of notes, but by the intervals between notes. When the value of Re is low, the value of S0 also tends to be low. After the experiment, we interviewed subjects B and D to verify whether or not they recognized intervals correctly. Neither of them could guess correctly whether a note is higher or
Table 1 Result of each trial

                 Melody a   Melody b       Melody c
Subject A   S0   0.94       1              0.8 → 0.93
            Re   1          1              0.81
            Ae   1          0              0
Subject B   S0   0.69       0.06 → 0.87    0.67 → 0.67
            Re   0.23       0.75           0.13
            Ae   0          0.06           0.06
Subject C   S0   1          1              1
            Re   1          1              1
            Ae   1          1              1
Subject D   S0   0.5        —              0.47 → 0.4 → 0.67
            Re   0.26       —              0.13
            Ae   0.06       —              0.13
Subject E   S0   1          0.93           1
            Re   1          1              0.86
            Ae   1          1              0.86
lower than the previous note. This agrees with the fact that the S0 values of their trials were low, and shows that supporting externalization by our proposed method is difficult when a subject cannot recognize intervals correctly.
5 Conclusion

In this paper, we proposed a notation-support method based on interval-pitch conversion. First, we mentioned the need to support externalization for beginners and pointed out that the difficulty of externalizing melodies for beginners comes from the difficulty of converting intervals to pitches. Therefore, we proposed a method that supports the user in interval-pitch conversion. We implemented a prototype system and validated our method in a situation where a subject externalized a melody that he/she had memorized by listening to it. The experimental results show that our approach is appropriate as a notation-support method for subjects who have relative pitch. Although the supported users are limited, our method is valid for beginners who have a strong motivation for composition. For future work, we have to consider a method for externalizing the rhythm factor. Observation of the externalized melodies in the experiment revealed that input durations tend to be shorter than intended. One of the reasons is that the subjects put the pen off the tablet involuntarily to prepare the timing of the next tap. One of the solutions to this problem is to adjust the input values.
References

1. Unehara, M., Onisawa, T.: Interactive music composition system – Composition of 16-bars musical work with a melody part and backing parts. In: The 2004 IEEE International Conference on Systems, Man & Cybernetics, pp. 5736–5741 (2004)
2. Farbood, M.M., Pasztor, E., Jennings, K.: Hyperscore: A graphical sketchpad for novice composers. Computer Graphics and Applications 24(1), 247–255 (2009)
3. Itou, N., Nishimoto, K.: A voice-to-MIDI system for singing melodies with lyrics. In: The International Conference on Advances in Computer Entertainment Technology, pp. 183–189 (2007)
Numerical Study of Random Correlation Matrices: Finite-Size Effects Yuta Arai, Kouichi Okunishi, and Hiroshi Iyetomi∗
Abstract. We report numerical calculations of the distribution of the maximal eigenvalue for random correlation matrices of various sizes. Such an extensive study enables us to work out empirical formulas for the average and standard deviation of the maximal eigenvalue, which are accurate over a wide range of parameters. As an application of those formulas, we propose a criterion to single out statistically meaningful correlations in the principal component analysis. The new criterion incorporates finite-size effects into the current method based on the random matrix theory, which gives the exact results in the infinite-size limit.
1 Introduction

In the field of econophysics, random matrix theory (RMT) has been successfully combined with the principal component analysis of various economic data. The analytical forms of the eigenvalue distribution and its upper edge λ+ for a random correlation matrix provide a very useful null model for extracting significant information on correlation structures in the data [1, 2, 3, 4, 5, 6, 7]. Here, we should note that the analytical results of RMT are basically obtained for the case where the matrix size is infinite. When the matrix size is finite, the eigenvalues of random correlation matrices may have a certain distribution in the region beyond λ+, which is called the "finite-size effect" here. When the data size is sufficiently large, the finite-size effect can be neglected. In practical situations, however, time series data are often not long enough for the finite-size effects to be neglected. Taking account of the finite-size effect of

Yuta Arai
Graduate School of Science and Technology, Niigata University, Ikarashi, Niigata 950-2181, Japan

Kouichi Okunishi · Hiroshi Iyetomi
Faculty of Science, Niigata University, Ikarashi, Niigata 950-2181, Japan
e-mail: [email protected]

∗ Corresponding author.
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 557–565. c Springer-Verlag Berlin Heidelberg 2011 springerlink.com
the random correlation matrix, therefore, we should improve the RMT criterion in the principal component analysis of actual data. In this paper, we numerically analyze the finite-size effect of random correlation matrices. In particular, we focus on the average and standard deviation of the maximal eigenvalue distribution of random correlation matrices. In reference to the Tracy-Widom distribution of order one [8, 9, 10], we show that the average and standard deviation can be described by nontrivial power-law dependences on the matrix size N and the ratio Q ≡ T/N, where T is the length of the time series data. We then propose a new criterion to single out statistically meaningful correlations in the principal component analysis of finite-size data.
2 Maximal Eigenvalue for Random Correlation Matrix

We begin by introducing a random matrix H, an N × T matrix with elements {h_ij; i = 1, ..., N; j = 1, ..., T}. Its elements {h_ij} are random variables following the normal distribution N(0, 1) and hence are mutually independent. The correlation matrix is then defined by

C = (1/T) H Hᵀ.  (1)

In the limit N, T → ∞ with Q ≡ T/N fixed, the probability density function ρ(λ) of the eigenvalues λ of the random correlation matrix C is analytically obtained as

ρ(λ) = (Q/2π) √((λ+ − λ)(λ − λ−)) / λ,  (2)
λ± = 1 + 1/Q ± 2√(1/Q),  (3)

for λ ∈ [λ−, λ+], where λ− and λ+ are the minimum and maximum eigenvalues of C, respectively. The analytic result (2) is valid for N, T → ∞. However, for the case where the matrix size is finite, the maximal eigenvalue has broadening. We numerically generated random correlation matrices with Q = 2 for N = 20, 100, 400 and then calculated their maximal eigenvalues. For each N, the number of samples is 10000. Figure 1 shows the shape of the distribution of the maximal eigenvalue. With increasing matrix size, the average of the maximal eigenvalue certainly approaches λ+ and the broadening of the distribution becomes narrower. To incorporate these finite-size effects into the principal component analysis, we should precisely analyze features of the maximal eigenvalue distribution, namely, the dependence of the average and standard deviation of the maximal eigenvalue on Q and N. Precisely speaking, the distribution is slightly asymmetric around its center with a fatter right tail. Here, we should recall the analytical results [10] for the average and standard deviation of the maximal eigenvalue, which are very helpful for our numerical analysis below. The distribution of the maximal eigenvalue is known to follow
Fig. 1 Distribution of the maximal eigenvalue at Q = 2 for various values of N.
asymptotically the Tracy-Widom distribution of order one with the center constant μ and scaling constant σ, where

μ = (1/T) (√(T − 1) + √N)²,  (4)
σ = (1/T) (√(T − 1) + √N) (1/√(T − 1) + 1/√N)^(1/3).  (5)
Expanding the right-hand side of Eq. (4) with respect to T and N with fixed Q, we obtain

μ = λ+ − λ+/(NQ).  (6)

Also we rewrite σ into a convenient form,

σ = N^(−2/3) Q^(−7/6) (1 + √Q)^(4/3).  (7)

The average lm and standard deviation σm of the maximal eigenvalue are then given as

lm = μ − 1.21σ,  (8)
σm = 1.27σ,  (9)

where −1.21 and 1.27 are numerical results [9] for the average and standard deviation of the Tracy-Widom distribution. Equations (8) and (9) thus constitute a leading finite-size correction to the RMT prediction of the maximal eigenvalue.
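The finite-size broadening discussed in this section can be reproduced numerically; the following is a minimal sketch with numpy, using a far smaller sample count than the 10000 samples per matrix size used in the paper.

```python
import numpy as np

def max_eigenvalues(N, Q, n_samples, seed=0):
    """Sample the maximal eigenvalue of random correlation matrices
    C = H H^T / T (Eq. (1)), H an N x T matrix of N(0, 1) entries."""
    T = N * Q
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_samples):
        H = rng.standard_normal((N, T))
        C = H @ H.T / T
        out.append(np.linalg.eigvalsh(C)[-1])  # eigvalsh is ascending
    return np.array(out)

N, Q = 50, 2
lam_plus = 1 + 1 / Q + 2 * np.sqrt(1 / Q)  # upper edge, Eq. (3)
lams = max_eigenvalues(N, Q, n_samples=200)
```

For this finite N the sample average of the maximal eigenvalue sits visibly below λ+ ≈ 2.914, while individual samples scatter around it with a width shrinking as N grows, which is the broadening plotted in Fig. 1.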
Y. Arai, K. Okunishi, and H. Iyetomi
3 Numerical Results

3.1 Average of the Maximal Eigenvalue

In the following, λm denotes the statistical average of the maximal eigenvalue obtained by numerical calculation for given N and Q. The number of samples is 10000, for which the statistical error is negligible in our fitting arguments. We define f(N, Q) as

f(N, Q) ≡ λ+ − λm.   (10)
We calculate f(N, Q) for N = 20, 30, 50, 70, 100, 200, 400 and Q = 1, 2, 3, 4, 5, 7, 10, 20, 30; the results are shown by cross symbols in Fig. 2. Then, inspired by Eq. (6), we assume that f(N, Q) takes the empirical form

fe(N, Q) = a N^b Q^c.   (11)

The parameters a, b and c are determined by least-squares fitting to f(N, Q), which yields

fe(N, Q) = 5.67 N^(−0.74) Q^(−0.68),   (12)

compared with the numerical results in Fig. 2. We remark that the finite-size scaling law has nontrivial fractional exponents with respect to N and Q; the exponent of Q is very close to −2/3.
Fig. 2 Functional behavior of the empirical formula fe (N, Q) together with f (N, Q).
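The power-law fit of Eq. (11) is linear in log space, so it reduces to an ordinary least-squares problem. As a minimal sketch, the fitted form (12) itself is used below as a stand-in for the measured f(N, Q), so the exponents are recovered exactly; with real data the same code would return the best-fit values:

```python
import numpy as np

# Fit log f = log a + b log N + c log Q by linear least squares.
Ns = np.array([20, 30, 50, 70, 100, 200, 400], dtype=float)
Qs = np.array([1, 2, 3, 4, 5, 7, 10, 20, 30], dtype=float)
N_grid, Q_grid = np.meshgrid(Ns, Qs, indexing="ij")

# Stand-in for the measured f(N, Q); here we use the fitted form (12) itself.
f = 5.67 * N_grid**-0.74 * Q_grid**-0.68

X = np.column_stack([np.ones(f.size),
                     np.log(N_grid).ravel(),
                     np.log(Q_grid).ravel()])
coef, *_ = np.linalg.lstsq(X, np.log(f).ravel(), rcond=None)
a, b, c = np.exp(coef[0]), coef[1], coef[2]
print(round(a, 2), round(b, 2), round(c, 2))  # recovers 5.67, -0.74, -0.68
```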
Figure 3 depicts how accurately λ+ − lm and fe(N, Q) reproduce f(N, Q) at N = 20 and 400. The deviation of λ+ − lm from f(N, Q) decreases with increasing N, but the convergence is very slow. To evaluate the accuracy of fe(N, Q) quantitatively, we calculate the absolute and relative errors defined as

Δ ≡ |fe(N, Q) − f(N, Q)|,   (13)

δ ≡ |fe(N, Q) − f(N, Q)| / λ+.   (14)

Fig. 3 The results of least-squares fitting of fe and λ+ − lm. The left panel shows the results at N = 20 and the right panel those at N = 400.

For comparison, the same evaluation was done with λ+ − lm replacing fe in Eqs. (13) and (14). The results are summarized in Table 1. We thus see that fe(N, Q) reproduces f(N, Q) quite accurately; the relative errors are well within 1% even for small N.

Table 1 Comparison of the accuracy of fe and λ+ − lm. Δavg and Δmax refer to the average and maximum of Δ; the same notation is used for δ.

          fe(N, Q)                       λ+ − lm
Δavg      3.4 × 10^-3                    3.0 × 10^-2
Δmax      2.5 × 10^-2 (Q = 1, N = 50)    1.2 × 10^-1 (Q = 1, N = 20)
δavg      1.2 × 10^-3                    1.3 × 10^-2
δmax      6.4 × 10^-3 (Q = 1, N = 50)    3.4 × 10^-2 (Q = 2, N = 20)
3.2 Standard Deviation of the Maximal Eigenvalue

The standard deviation of the maximal eigenvalue is calculated for the same combinations of N and Q as its average. We refer to the numerical results as g(N, Q), which are shown by cross symbols in Fig. 4. To obtain an empirical formula ge(N, Q) for the numerical results, we preserve the functional form of the scaling constant in Q as given by Eq. (7):

ge(N, Q) = A(N) Q^(−7/6) (1 + √Q)^(4/3).   (15)
Fig. 4 Functional behavior of g(N, Q) and ge (N, Q).
The prefactor A is determined by a least-squares fit to the original data for each N. Figure 5 shows ge(N, Q) so determined, along with g(N, Q) and σm, at N = 20 and 400. The convergence of σm is much faster than that of λ+ − lm. For N = 400, σm and ge(N, Q) reproduce g(N, Q) quite well. For N = 20, however, we observe that σm deviates appreciably from g(N, Q). On the other hand, the empirical formula can rectify the deficiency of σm for such a small value of N.
Fig. 5 Accuracy of the least-squares fitting for ge(N, Q) as a function of Q with given N. The left panel shows the results at N = 20 and the right panel those at N = 400.
We then determine the N-dependence of A. Recalling Eq. (7) again, we fit the results for A to the form

log A = log 1.27 − [N/(N − 3.10)] log N^(2/3).   (16)

Equation (15) with this formula recovers Eq. (9) in the limit of large N. Figure 6 shows the N-dependence of A. Equation (15) together with Eq. (16) yields ge(N, Q) in the entire (N, Q) plane, which is presented as lines in Fig. 4.
Fig. 6 The N-dependence of A.
To check the accuracy of ge(N, Q) quantitatively, we evaluate Δ and δ for ge(N, Q). Note that the denominator in δ is g(N, Q), instead of λ+ as in Eq. (14). The results are summarized in Table 2.

Table 2 Comparison of the accuracy of ge and σm. The same notation for Δ and δ as in Table 1 is used.

          ge(N, Q)                       σm
Δavg      2.9 × 10^-3                    1.4 × 10^-2
Δmax      1.3 × 10^-2 (Q = 1, N = 20)    1.5 × 10^-1 (Q = 1, N = 20)
δavg      4.2 × 10^-2                    1.4 × 10^-1
δmax      1.7 × 10^-1 (Q = 30, N = 20)   5.1 × 10^-1 (Q = 1, N = 20)
4 Criterion for Principal Components Taking Account of Finite-Size Effect

On the basis of the results in the previous sections, we propose a new criterion for principal component analysis. Let us write the eigenvalues of a correlation matrix obtained from finite-size data as λ. So far, the criterion for a principal component has been λ > λ+. However, the eigenvalues of the corresponding random correlation matrix have a certain distribution in the region above λ+, which is quantified by fe(N, Q) and ge(N, Q). Adopting a confidence level of 3σ (99.7%), we propose the following new criterion:

λ > λnew(N, Q) ≡ λ+ − fe(N, Q) + 3 ge(N, Q).   (17)

We infer that the eigenvectors associated with eigenvalues satisfying this criterion contain statistically meaningful information on the correlations. For reference, we also introduce an alternative criterion with the same confidence level as Eq. (17), using the asymptotic formulas (8) and (9):

λ > λ̃+ ≡ lm + 3σm.   (18)
Table 3 compares λnew and λ̃+ with λ+ for various combinations of N and Q. The critical value λnew for the principal component analysis is appreciably larger than λ+ over the parameter range covered in Table 3. If we adopted the criterion based on λ̃+, we would underestimate the number of principal components significantly for N as small as N ≲ 100 and Q as small as Q ∼ 1.

Table 3 Comparison of the criteria for the principal component analysis.

Q    N      λ+      λ̃+ (Eq. 18)   λnew (Eq. 17)
1    20     4       4.80           4.30
1    30     4       4.61           4.30
1    50     4       4.44           4.27
1    100    4       4.28           4.20
2    20     2.91    3.38           3.05
2    30     2.91    3.27           3.06
2    50     2.91    3.17           3.05
2    100    2.91    3.08           3.02
3    20     2.48    2.84           2.58
3    30     2.48    2.76           2.59
3    50     2.48    2.68           2.59
3    100    2.48    2.61           2.57
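The two thresholds can be evaluated directly from the fitted formulas. The sketch below assumes the fitted constants of Eqs. (12), (15), and (16) and the Tracy-Widom constants −1.21 and 1.27 quoted in the text; small discrepancies with the tabulated values come from rounding of the published fit coefficients:

```python
import numpy as np

def lambda_plus(Q):                  # Eq. (3), upper RMT edge
    return 1 + 1/Q + 2/np.sqrt(Q)

def f_e(N, Q):                       # Eq. (12), finite-size shift of the mean
    return 5.67 * N**-0.74 * Q**-0.68

def g_e(N, Q):                       # Eqs. (15)-(16), finite-size std. deviation
    A = 1.27 * N**(-(2/3) * N / (N - 3.10))
    return A * Q**(-7/6) * (1 + np.sqrt(Q))**(4/3)

def lambda_new(N, Q):                # Eq. (17), proposed criterion
    return lambda_plus(Q) - f_e(N, Q) + 3 * g_e(N, Q)

def lambda_tilde(N, Q):              # Eq. (18), asymptotic criterion
    T = N * Q
    mu = (np.sqrt(T - 1) + np.sqrt(N))**2 / T
    sigma = ((np.sqrt(T - 1) + np.sqrt(N))
             * (1/np.sqrt(T - 1) + 1/np.sqrt(N))**(1/3) / T)
    return (mu - 1.21*sigma) + 3 * 1.27*sigma

print(lambda_new(20, 1), lambda_tilde(20, 1))  # cf. Table 3: about 4.3 and 4.8
```

For small N and Q the asymptotic threshold λ̃+ sits well above λnew, which is precisely why the asymptotic criterion discards principal components that the finite-size criterion keeps.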
5 Summary

We numerically studied the distribution of eigenvalues of random correlation matrices for various matrix sizes. In particular, we investigated the finite-size dependence of the average and standard deviation of the maximal eigenvalue distribution. The main results are summarized as follows.

• The finite-size correction fe(N, Q) to the average, given by Eq. (12), has nontrivial power-law behavior in N and Q and reproduces the corresponding numerical results quite well.
• The standard deviation ge(N, Q), modeled by Eq. (15) with Eq. (16), is in good agreement with the original numerical results even for N as small as 20.

As an application of these results, we finally proposed a new criterion to single out genuine correlations in principal component analysis. This new criterion takes accurate account of the finite-size correction to the RMT prediction and is especially useful when both N and Q are small (N ≲ 100 and Q ∼ 1). On the other hand, a similar criterion based on the asymptotic formulas might dismiss an appreciable number of statistically meaningful principal components under the same conditions for N and Q.
Acknowledgements. This work was partially supported by the Program for Promoting Methodological Innovation in Humanities and Social Sciences by Cross-Disciplinary Fusing of the Japan Society for the Promotion of Science and by the Ministry of Education, Science, Sports, and Culture, Grants-in-Aid for Scientific Research (B), Grant No. 22300080 (2010-12).
References

1. Laloux, L., Cizeau, P., Bouchaud, J.P., Potters, M.: Phys. Rev. Lett. 83, 1467 (1999)
2. Santhanam, M.S., Patra, P.K.: Phys. Rev. E 64, 016102 (2002)
3. Plerou, V., Gopikrishnan, P., Rosenow, B., Amaral, L.A.N., Guhr, T., Stanley, H.E.: Phys. Rev. E 65, 066126 (2002)
4. Utsugi, A., Ino, K., Oshikawa, M.: Phys. Rev. E 70, 026110 (2004)
5. Kim, D.H., Jeong, H.: Phys. Rev. E 72, 046133 (2005)
6. Kulkarni, V., Deo, N.: Eur. Phys. J. B 60, 101 (2007)
7. Pan, R.K., Sinha, S.: Phys. Rev. E 76, 046116 (2007)
8. Tracy, C.A., Widom, H.: Comm. Math. Phys. 177, 727–754 (1996)
9. Tracy, C.A., Widom, H.: In: van Diejen, J., Vinet, L. (eds.) Calogero-Moser-Sutherland Models, pp. 461–472. Springer, New York (2000)
10. Johnstone, I.M.: The Annals of Statistics 29, 295–327 (2001)
Predicting of the Short Term Wind Speed by Using a Real Valued Genetic Algorithm Based Least Squared Support Vector Machine Chi-Yo Huang, Bo-Yu Chiang, Shih-Yu Chang, Gwo-Hshiung Tzeng, and Chun-Chieh Tseng *
Abstract. The possible future energy shortage has become a very serious problem in the world. An alternative energy source that can replace the limited reserves of fossil fuels will be very helpful. Wind has emerged as one of the fastest growing and most important alternative energy sources during the past decade. However, the most serious problem faced in wind applications is the dependence on the volatility of the wind. To apply wind power efficiently, predictions of the wind speed are very important. Thus, this paper aims to precisely predict the short-term regional wind speed by using a real-valued genetic algorithm

Chi-Yo Huang
Department of Industrial Education, National Taiwan Normal University
No. 162, Hoping East Road I, Taipei 106, Taiwan
e-mail: [email protected]
Bo-Yu Chiang
Institute of Communications Engineering, National Tsing Hua University
No. 101, Sec. 2, Guangfu Road, Hsinchu 300, Taiwan
e-mail: [email protected]

Shih-Yu Chang
Institute of Communications Engineering, National Tsing Hua University
No. 101, Sec. 2, Guangfu Road, Hsinchu 300, Taiwan
e-mail: [email protected]

Gwo-Hshiung Tzeng
Department of Business and Entrepreneurial Administration, Kainan University
No. 1, Kainan Road, Luchu, Taoyuan County 338, Taiwan

Gwo-Hshiung Tzeng
Institute of Management of Technology, National Chiao Tung University
Ta-Hsuch Road, Hsinchu 300, Taiwan
e-mail: [email protected]

Chun-Chieh Tseng
Nan-Pu Thermal Power Plant, Taiwan Power Company
No. 5, Chenggong 2nd Rd., Qianzhen Dist., Kaohsiung City 806, Taiwan
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 567–575.
springerlink.com © Springer-Verlag Berlin Heidelberg 2011
C.-Y. Huang et al.
(RGA) based least squared support vector machine (LS-SVM). A data set including the time, temperature, humidity, and average regional wind speed, measured on a randomly selected date at a wind farm located in Penghu, Taiwan, was selected for verifying the forecast efficiency of the proposed RGA based LS-SVM. In this empirical study, the prediction errors of the wind speed are very limited. In the future, the proposed forecast mechanism can further be applied to wind forecast problems based on various time spans.

Keywords: wind power, wind speed forecasting, short term wind prediction, support vector machines (SVMs), genetic algorithm (GA), least squared support vector machine (LS-SVM).
1 Introduction

The unbalanced supply of fossil fuels during the past years has aroused anxiety about a possible shortage or depletion of fossil fuels in the near future. Thus, people have striven to develop alternative energy sources such as wind energy, tidal wave energy, and solar energy. Wind power is the fastest growing renewable energy (Mathew 2006) and has played an increasingly significant role in replacing traditional fossil fuels. According to the statistics of the Global Wind Energy Council (GWEC), the global installed capacity increased from 23.9 gigawatts (GW) at the end of 2001 to 194.4 GW at the end of 2010 (Global Wind Energy Council 2011), a compound annual growth rate of 23.3%. To fully benefit from a large fraction of wind energy in an electrical grid, it is necessary to know in advance the electricity production generated by the wind (Landberg 1999). The prediction of wind power, along with load forecasting, permits scheduling the connection or disconnection of wind turbines or conventional generators, thus achieving low spinning reserve and optimal operating cost (Damousis et al. 2004). In order to achieve the highest possible prediction accuracy, prediction methods should consider appropriate parameters and data that may indicate future trends (Mabel and Fernandez 2008). However, one of the largest problems of wind power, compared to conventionally generated electricity, is its dependence on the volatility of the wind (Giebel et al. 2003). As observed by Sfetsos (2000), wind is often considered one of the most difficult meteorological parameters to forecast because of the complex interactions between large-scale forcing mechanisms, such as pressure and temperature differences and the rotation of the earth, and local characteristics of the surface.
Short-term prediction is a subclass of wind power time prediction (as opposed to wind power spatial prediction), with time scales on the order of some days for the forecast horizon and from minutes to hours for the time-step. According to Costa et al. (2008), the short-term prediction of wind power aims to predict the wind farm output either directly or indirectly. Short-term prediction is mainly oriented to the spot (daily and intraday) market, system management, and the scheduling of some maintenance tasks (Costa et al. 2008).
To precisely predict the short-term wind speed based on a data set consisting of the time, temperature, humidity, and regional average wind speed, a real-valued genetic algorithm (RGA) based least squared support vector machine (LS-SVM) is proposed. The genetic algorithm (GA) has been widely and successfully applied to various optimization problems (Goldberg 1989). However, for problems solved by the GA, the binary coding of the data always occupies computer memory even though only a few bits are actually involved in the crossover and mutation operations (Wu et al. 2007). To overcome this inefficient use of computer memory, the real-valued genetic algorithm (RGA) was proposed by Huang and Huang (1997). In contrast to the binary GA, the RGA uses real values directly as the parameters of the chromosomes in a population, without performing a coding and encoding process before calculating the fitness values of individuals. Support vector machines (SVMs) are state-of-the-art tools for linear and nonlinear input-output knowledge discovery (Vapnik 1998). SVMs were first devised for binary classification problems and were later extended to regression estimation problems (Vapnik 1998). The least squares support vector machine (LS-SVM) is a least-squares version of the SVM, in which one finds the solution by solving a set of linear equations instead of the convex quadratic programming (QP) problem of classical SVMs. The LS-SVM classifier was proposed by Suykens and Vandewalle (1999); it reduces computational complexity and increases solving speed while retaining high accuracy. An empirical study based on a real data set (Wu 2008), consisting of the time, temperature, humidity, and regional average wind speed measured at a wind farm located in Penghu, Taiwan, will be provided to verify the RGA based LS-SVM forecast mechanism. Based on the empirical study results, the wind speed can be precisely predicted.
This research is organized as follows. The related literature regarding wind power forecasting and renewable energy is reviewed in Section 2. The RGA based LS-SVM forecast mechanism is introduced in Section 3. A prediction of the wind speed by the RGA based LS-SVM forecast mechanism is presented in Section 4. Discussions of the forecast results as well as future research possibilities are presented in Section 5. Finally, the article is concluded in Section 6.
2 Literature Review

The advantages of renewable energy (e.g., reduced reliance on imported supplies, reduced emissions of greenhouse and other polluting gases) have led countries around the world to provide support mechanisms for expanding renewable electricity generation capacity (Muñoz et al. 2007). Among renewable energy sources, wind is the fastest growing one today (Mathew 2006) and has played an increasingly significant role. Wind power generation depends on the wind speed, which can easily be influenced by obstacles and the terrain (Ma et al. 2009). A good wind power prediction technique can help develop well-functioning hour-ahead or day-ahead markets, while the market mechanism can be more appropriate to
weather-driven resources (Wu and Hong 2007). Many methods have been developed to increase wind speed prediction accuracy (Ma et al. 2009). The prediction methods can be divided into two categories. The first category introduces physical considerations to reach the best prediction results. The second category introduces statistical methods, e.g. the ARMA model, which aim at finding the statistical relationship in the measured time series data (Marciukaitis et al. 2008). Physical methods have advantages in long-term predictions, while statistical methods do well in short-term predictions (Ma et al. 2009). The time-scale classification of wind forecasting methods is vague and can be separated as follows, according to the work by Soman et al. (2010): (1) very short-term predictions, from a few seconds to 30 minutes ahead; (2) short-term predictions, from 30 minutes to 6 hours ahead; (3) medium-term predictions, from 6 hours to 1 day ahead; and (4) long-term predictions, from 1 day to 1 week ahead. Short-term wind power prediction is an extremely important field of research for the energy sector, as system operators must handle a substantial amount of fluctuating power from the increasing installed wind power capacity (Catalão et al. 2011). According to the summary by Catalão et al. (2011), new methods including data mining, artificial neural networks (NNs), fuzzy logic, evolutionary algorithms, and some hybrid methods have emerged as modern approaches for short-term wind prediction, with artificial-intelligence-based models outperforming the others.
3 Analytic Framework for the GA Based LS-SVM Method

In this section, the GA based LS-SVM model is presented. First, the optimal parameters of the LS-SVM are determined by using the GA. Then, the data set is predicted by introducing the optimal parameters into the SVM. The details of the GA and the LS-SVM are presented in the following subsections. To establish a precise GA-based feature selection and parameter optimization system, the main steps shown in Fig. 1 must be performed.
Fig. 1 The flow chart of the GA based LS-SVM
3.1 Genetic Algorithm

The RGA resolves optimization problems by coding all of the corresponding parameters in a chromosome directly. The two parameters of the LS-SVM, c and σ, are coded directly to form the chromosome in the RGA. The
chromosome x is represented as x = {p1, p2}, where p1 and p2 denote the regularization parameter c and sigma σ (the kernel parameter of the LS-SVM), respectively. A fitness function for assessing the performance of each chromosome must be designed before the search for optimal SVM parameter values starts. In this study, the mean absolute percentage error (MAPE) is used to measure the fitness. The MAPE is defined as

MAPE = (1/n) Σ_{i=1}^{n} |(ai − fi)/ai| × 100%,

where ai and fi represent the actual and forecast values and n is the number of forecasting periods.

The genetic operators in the GA include selection, crossover, and mutation; the offspring of the existing population are generated by these operators. There are two well-known selection methods: the roulette wheel method and the tournament method; users can determine which method is adopted in the simulation. After selection, a chromosome survives to the next generation and is placed in a mating pool for the crossover and mutation operations. Once a pair of chromosomes is selected for the crossover operation, one or more randomly selected positions are assigned to the to-be-crossed chromosomes, and the newly crossed chromosomes are combined with the rest of the chromosomes to generate a new population. In this research, the method proposed by Adewuya (1996) is introduced to prevent overflow after crossover when a GA with real-valued chromosomes is applied. Let x1_old = {x11, x12, ..., x1n} and x2_old = {x21, x22, ..., x2n}. Then:

Move closer: x1_new = x1_old + σ(x1_old − x2_old), x2_new = x2_old + σ(x1_old − x2_old).
Move away:  x1_new = x1_old + σ(x2_old − x1_old), x2_new = x2_old + σ(x2_old − x1_old).

Here, x1_old and x2_old represent the pair of chromosomes before the crossover operation, while x1_new and x2_new represent the pair of new chromosomes after the crossover operation.

The mutation operation follows the crossover operation and determines whether a chromosome should be mutated in the next generation. In this study, the uniform mutation method is applied and designed into the presented model; consequently, researchers can select the mutation method in the GA-SVM best suited to their problems of interest. Uniform mutation can be represented as follows: given x_old = {x1, x2, ..., xn},

xk_new = lbk + r × (ubk − lbk),
x_new = {x1, x2, ..., xk_new, ..., xn},

where n denotes the number of parameters, r represents a random number in the range (0, 1), and k is the position of the mutation. lbk and ubk denote the lower and upper bounds of the parameter at location k. x_old represents the population before the mutation operation; x_new represents the new population following the mutation operation.
3.2 SVM

The SVM is based on statistical learning theory and the machine learning algorithm presented by Vapnik (2000). The SVM uses a linear model to implement nonlinear class boundaries through a nonlinear mapping of the input vector x into a high-dimensional feature space. A linear model constructed in the new space can represent a nonlinear decision boundary in the original space. In the new space, an optimal separating hyperplane is constructed; thus the SVM is known as the algorithm that finds a special kind of linear model, the maximum margin hyperplane, which gives the maximum separation between the decision classes. The training data sets closest to the maximum margin hyperplane are called support vectors; all other training data sets are irrelevant for defining the binary class boundaries.

For the linearly separable case, a hyperplane separating the binary decision classes in the three-attribute case can be represented as

y = w0 + w1 x1 + w2 x2 + w3 x3,

where y is the outcome, xi are the attribute values, and there are four weights wi to be learned by the learning algorithm. The weights wi are parameters that determine the hyperplane. The maximum margin hyperplane can be represented as

y = b + Σ_i αi yi x(i) · x,

where yi is the class value of training data set x(i) and "·" denotes the dot product. The vector x represents a test data set and the vectors x(i) are the support vectors. In this equation, b and αi are parameters that determine the hyperplane. From the implementation point of view, finding the support vectors and determining the parameters b and αi is equivalent to solving a linearly constrained quadratic program.

As mentioned above, the SVM constructs a linear model to implement nonlinear class boundaries by transforming the inputs into a high-dimensional feature space. For the nonlinearly separable case, the high-dimensional version of the equation is simply

y = b + Σ_i αi yi K(x(i), x).

The function K(x(i), x) is defined as the kernel function. Any function that meets Mercer's condition can be used as the kernel function, e.g. the polynomial, sigmoid, and Gaussian radial basis function (RBF) kernels used in SVMs. In this work, the RBF kernel is used:

K(xi, xj) = exp(−‖xi − xj‖² / 2σ²),

where σ² denotes the variance of the Gaussian kernel. In addition, for the separable case there is a lower bound 0 on the coefficients αi; for the non-separable case, the SVM can be generalized by placing an upper bound c on the coefficients αi. Therefore, c and σ are important to the prediction accuracy of an SVM model.

The learning algorithm for a non-linear SVM classifier follows the design of an optimal separating hyperplane in a feature space. The procedure is the same as the one associated with hard and soft margin SVM classifiers in x-space. The
dual Lagrangian in z-space is

Ld(α) = Σ_{i=1}^{l} αi − (1/2) Σ_{i,j=1}^{l} yi yj αi αj zi^T zj,

and, using the chosen kernels, the Lagrangian is maximized as

max Ld(α) = Σ_{i=1}^{l} αi − (1/2) Σ_{i,j=1}^{l} yi yj αi αj K(xi, xj)

s.t. αi ≥ 0, i = 1, ..., l,
     Σ_{i=1}^{l} αi yi = 0.

Note that the constraints must be revised for use in a non-linear soft margin classifier SVM. The only difference between these constraints and those of the separable non-linear classifier is the upper bound c on the Lagrange multipliers αi. Consequently, the constraints of the optimization problem become:

s.t. c ≥ αi ≥ 0, i = 1, ..., l,
     Σ_{i=1}^{l} αi yi = 0.
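For contrast with the QP above, the LS-SVM variant used in this paper obtains the dual variables from a single linear system (Suykens and Vandewalle 1999). The sketch below assumes the standard LS-SVM regression system [[0, 1ᵀ], [1, K + I/c]] [b; α] = [0; y] with the RBF kernel of Sec. 3.2; the data are synthetic, and the parameter names c and sigma follow Sec. 3.1:

```python
import numpy as np

def rbf_kernel(X1, X2, sigma):
    """Gaussian RBF kernel K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def lssvm_fit(X, y, c, sigma):
    """Solve the LS-SVM linear system [[0, 1^T], [1, K + I/c]] [b; a] = [0; y]."""
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / c
    rhs = np.concatenate([[0.0], y])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]               # bias b and dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma):
    """Prediction f(x) = sum_i alpha_i K(x_i, x) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Usage: fit a noiseless sine curve and check the training error is small.
X = np.linspace(0, 6, 40).reshape(-1, 1)
y = np.sin(X).ravel()
b, alpha = lssvm_fit(X, y, c=100.0, sigma=0.5)
y_hat = lssvm_predict(X, b, alpha, X, sigma=0.5)
print(np.max(np.abs(y_hat - y)))         # small training residual
```

The inequality-constrained QP of the classical SVM is thus replaced by one dense solve, which is the computational saving the paper cites for the LS-SVM.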
4 Short Term Wind Speed Predictions by Using GASVM

In this section, the GA based LS-SVM model is introduced for predicting short-term wind speeds. An empirical study based on real hourly data from the work by Wu (2008), measured at a wind farm located in Penghu, Taiwan, is used to verify the feasibility of the GA based LS-SVM for the real-world short-term wind speed prediction problem. Factors such as temperature and humidity affect the wind speed; these factors were the input data sets, while the outputs were the wind speed predictions in the simulation. The prediction results and their errors versus the original data are shown in Table 1 and Fig. 2.
5 Discussion

According to the forecast results shown in Table 1 and Fig. 2, the short-term wind speed can be predicted precisely, with an average error of around 2.27%, by using the GA based LS-SVM. The errors at hours 22:00 and 23:00 are large, since the wind speed varies greatly within a day. Thus, to address such imprecise predictions, wind speed forecasting should be performed separately for specific periods of time, such as daylight or night. Wind speed predictions using an SVM model with fixed parameters would enlarge forecast errors. Using more factors that affect the wind speed as input data sets can further reduce the
prediction errors. Thus, the accuracy of the forecasts can be further enhanced if additional factors influencing the wind speed are added as inputs of the GA based LS-SVM. Finally, since the prediction errors are low, the proposed forecast mechanism can be used for predicting short-term wind speeds at any wind farm.

Table 1 The prediction results and errors

Time (hour)   Original Data (m/s)   Prediction Data (m/s)   Error (%)
0             17.60                 17.57                   0.17
1             15.90                 15.79                   0.70
2             15.50                 15.85                   2.26
3             16.00                 16.01                   0.10
4             16.00                 15.70                   1.88
5             15.10                 15.16                   0.40
6             14.50                 14.48                   0.14
7             13.30                 13.40                   0.75
8             12.60                 12.38                   1.75
9             12.00                 11.97                   0.25
10            10.50                 10.44                   0.57
11            9.50                  9.95                    4.74
12            9.70                  9.73                    0.31
13            9.10                  8.92                    1.98
14            9.00                  8.91                    1.00
15            8.90                  8.89                    0.11
16            8.30                  8.26                    0.48
17            7.20                  7.31                    1.53
18            6.60                  6.56                    0.61
19            6.10                  6.11                    0.16
20            6.00                  6.01                    0.17
21            5.10                  5.30                    3.92
22            4.80                  5.52                    15.00
23            5.00                  5.77                    15.40
Fig. 2 The original versus the prediction results
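As a quick consistency check, the average error reported in Sec. 5 can be recomputed directly from the hourly values of Table 1 using the MAPE definition of Sec. 3.1:

```python
# Recompute the average error from the Table 1 data (original vs. predicted
# wind speed, m/s, for hours 0-23).
original = [17.60, 15.90, 15.50, 16.00, 16.00, 15.10, 14.50, 13.30, 12.60, 12.00,
            10.50, 9.50, 9.70, 9.10, 9.00, 8.90, 8.30, 7.20, 6.60, 6.10,
            6.00, 5.10, 4.80, 5.00]
predicted = [17.57, 15.79, 15.85, 16.01, 15.70, 15.16, 14.48, 13.40, 12.38, 11.97,
             10.44, 9.95, 9.73, 8.92, 8.91, 8.89, 8.26, 7.31, 6.56, 6.11,
             6.01, 5.30, 5.52, 5.77]

errors = [100 * abs(a - f) / a for a, f in zip(original, predicted)]
mape = sum(errors) / len(errors)
print(round(mape, 2))   # close to the ~2.27% average error reported in Sec. 5
```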
6 Conclusions

Wind power is one of the fastest growing and most widely used alternative energy sources. Efficient wind forecasting methods are very helpful for integrating wind energy into the electricity grid. In the past, various wind forecasting methods have been developed for short-term wind prediction problems. In the future, introducing factors influencing the wind speed, including the temperature, humidity, and other seasonal factors, into the GA based LS-SVM will be very helpful for demonstrating the feasibility of applying this method in the real world.
References

Adewuya, A.A.: New Methods in Genetic Search with Real-Valued Chromosomes. Massachusetts Institute of Technology, Dept. of Mechanical Engineering (1996)
Carolin Mabel, M., Fernandez, E.: Analysis of wind power generation and prediction using ANN: A case study. Renewable Energy 33(5), 986–992 (2008)
Catalão, J.P.S., Pousinho, H.M.I., Mendes, V.M.F.: Short-term wind power forecasting in Portugal by neural networks and wavelet transform. Renewable Energy 36(4), 1245–1251 (2011)
Costa, A., Crespo, A., Navarro, J., Lizcano, G., Madsen, H., Feitosa, E.: A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews 12(6), 1725–1744 (2008)
Damousis, I.G., Alexiadis, M.C., Theocharis, J.B., Dokopoulos, P.S.: A fuzzy model for wind speed prediction and power generation in wind parks using spatial correlation. IEEE Transactions on Energy Conversion 19(2), 352–361 (2004)
Giebel, G., Landberg, L., Kariniotakis, G., Brownsword, R.: State-of-the-art on methods and software tools for short-term prediction of wind energy production. In: European Wind Energy Conference, Madrid (2003)
Global Wind Energy Council: Global wind capacity increases by 22% in 2010 – Asia leads growth (2011)
Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading (1989)
Huang, Y.-P., Huang, C.-H.: Real-valued genetic algorithms for fuzzy grey prediction system. Fuzzy Sets and Systems 87(3), 265–276 (1997)
Landberg, L.: Short-term prediction of the power production from wind farms. Journal of Wind Engineering and Industrial Aerodynamics 80(1-2), 207–220 (1999)
Lerner, J., Grundmeyer, M., Garvert, M.: The importance of wind forecasting. Renewable Energy Focus 10(2), 64–66 (2009)
Lorenz, E., Hurka, J., Heinemann, D., Beyer, H.G.: Irradiance forecasting for the power prediction of grid-connected photovoltaic systems. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2(1) (2009)
Ma, L., Luan, S., Jiang, C., Liu, H., Zhang, Y.: A review on the forecasting of wind speed and generated power. Renewable and Sustainable Energy Reviews 13, 915–920 (2009)
Marciukaitis, M., Katinas, V., Kavaliauskas, A.: Wind power usage and prediction prospects in Lithuania. Renewable and Sustainable Energy Reviews 12, 265–277 (2008)
Mathew, S.: Wind Energy: Fundamentals, Resource Analysis and Economics. Springer, Heidelberg (2006)
Muñoz, M., Oschmann, V., David Tàbara, J.: Harmonization of renewable electricity feed-in laws in the European Union. Energy Policy 35(5), 3104–3114 (2007)
Sfetsos, A.: A comparison of various forecasting techniques applied to mean hourly wind speed time series. Renewable Energy 21(1), 23–35 (2000)
Soman, S.S., Zareipour, H., Malik, O., Mandal, P.: A review of wind power and wind speed forecasting methods with different time horizons. In: North American Power Symposium (NAPS) (2010)
Suykens, J.A.K., Vandewalle, J.: Least squares support vector machine classifiers. Neural Processing Letters 9(3), 293–300 (1999)
Vapnik, V.N.: Statistical Learning Theory. Wiley, Chichester (1998)
Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, Heidelberg (2000)
Wu, C.-H., Tzeng, G.-H., Goo, Y.-J., Fang, W.-C.: A real-valued genetic algorithm to optimize the parameters of support vector machine for predicting bankruptcy. Expert Systems with Applications 32(2), 397–408 (2007)
Wu, M.T.: The Application of Artificial Neural Network to Wind Speed and Generation Forecasting of Wind Power System. Department of Electrical Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung (2008)
Wu, Y.K., Hong, J.S.: A literature review of wind forecasting technology in the world. In: IEEE Power Tech, Lausanne (2007)
Selecting English Multiple-Choice Cloze Questions Based on Difficulty-Based Features Tomoko Kojiri, Yuki Watanabe, and Toyohide Watanabe
Abstract. English multiple-choice cloze questions require learners to have various kinds of grammatical and lexical knowledge. Since the knowledge of learners differs, it is difficult to provide questions suited to each learner's understanding level. This research identifies features that affect the difficulty of questions and proposes a method for selecting questions according to these features for stepwise learning. In order to manage the relations among questions, a question network is introduced in which questions are structured based on differences in each feature. Questions are selected by following appropriate links according to the learners' answers. By following this question network, learners are able to tackle questions from easier ones to more difficult ones according to their understanding levels.
1 Introduction
Multiple-choice cloze questions are often used in English learning. This type of question is effective for checking knowledge of grammar and lexicon, and thus it is used in tests such as TOEIC and TOEFL. In addition, by tackling these questions repeatedly, knowledge of English grammar and lexicon can be acquired. Since only a limited amount of knowledge is included in one question, many questions need to be solved in order to acquire the whole body of grammatical and lexical knowledge. However, it is difficult to select questions that are appropriate for individual learners' understanding levels. If all the knowledge in a question has already been acquired, a learner cannot acquire new knowledge from it. If all the knowledge is new to a learner, it may be difficult for him to understand plural pieces of knowledge from one question. Appropriate questions for learners should contain some acquired knowledge and a little unacquired knowledge. Tomoko Kojiri Faculty of Engineering Science, Kansai University 3-3-35 Yamate-cho, Suita, Osaka, 564-8680, Japan e-mail: [email protected] Yuki Watanabe · Toyohide Watanabe Graduate School of Information Science, Nagoya University Furo-cho, Chikusa-ku, Nagoya, 464-8603, Japan J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 577–587. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
578
T. Kojiri, Y. Watanabe, and T. Watanabe
Intelligent Tutoring Systems (ITS), which provide learning contents that fit learners' understanding levels, have been developed for various learning domains. Suganuma et al. developed a system that dynamically estimates the difficulties of exercises and learners' understanding levels according to the learners' answers, and provides exercises based on the estimated understanding levels [1]. This system assigned difficulties to exercises based on the learners' answers. However, the reasons for an incorrect answer may differ among learners, so the incorrect knowledge should be evaluated in detail. Since English multiple-choice cloze questions need various kinds of knowledge to solve, and it is difficult to find out which knowledge is used to derive the correct answer, correctly specifying the acquired/unacquired knowledge from learners' answers may be difficult. We have constructed a knowledge network that arranges multiple-choice cloze questions according to the quantity of their grammatical knowledge [2]. This research assumed that the difficulty of questions increases with the amount of grammatical knowledge they involve. By solving questions along the knowledge network, English grammar could be learned from a simple question to a complicated one. However, this knowledge network considers only grammatical knowledge and is not able to cope with all types of multiple-choice cloze questions. Other features are needed to characterize questions. This paper proposes features of multiple-choice cloze questions that affect their difficulty (difficulty-based features). Then, it proposes a method for providing questions based on these features for stepwise learning. In order to represent the relations among questions, a question network is introduced that structures all questions based on difficulty levels for each difficulty-based feature. In the question network, questions situated at the same levels for all difficulty-based features form one node, and nodes whose levels are next to each other are connected by links. By following this question network, learners are able to tackle questions from easier ones to more difficult ones according to their understanding levels. In our system, in order to correctly estimate the learners' understanding levels in the ongoing learning process, learners are required to solve plural questions in one learning process. In addition, questions are selected not from one node but from several nodes that are similar to the learners' understanding levels (solvable nodes). Questions in solvable nodes have larger possibilities of being solved by learners, and they are determined after each learning process.
2 Approach

2.1 Difficulty-Based Features of English Multiple-Choice Cloze Questions

English multiple-choice cloze questions require learners to understand the meaning of English sentences and their grammatical structure, and to select the word(s) for a blank part that form the correct sentence from the choices. Figure 1 is an example of an English multiple-choice cloze question. A question consists of a sentence, a blank part, and choices. The choices consist of one correct choice and three distracters. Learners select one of the choices to fill in the blank part.
Fig. 1 Example of a multiple-choice cloze question. Question sentence: "The company's advertisements always look (   )." Choices: 1) beauties 2) beautiful 3) beautifully 4) beauty. The figure labels the question sentence, the blank part, and the choices (correct choice and distracters).
There are various definitions and findings about the difficulty features of English questions. Kunichika et al. defined the difficulty features of English reading questions as the difficulty of understanding the original text, understanding the question sentence, and understanding the answer sentence [3]. This paper follows this definition and defines three difficulty-based features of English multiple-choice questions. Understanding the original text in an English reading question corresponds to understanding the sentence in a multiple-choice question. Understanding the question sentence is regarded as understanding the intention of a question, so understanding the blank part corresponds to this feature. Understanding the answer sentence is regarded as similar to understanding the differences among the choices.

1) Difficulty of sentence. Especially for questions that ask about words, it is important to grasp the meaning of the sentence correctly. Readability is one of the features that prevent learners from understanding the meaning easily. Research on the readability of English sentences has shown that the length of sentences and the difficulty of words affect readability [4]. Based on this result, the length of the sentence and the difficulty of words are defined as difficulty-based features of a sentence.

2) Difficulty of blank part. A blank part indicates the knowledge required to answer the question. In some questions, the answer for the blank part can be estimated; most such questions ask grammatical knowledge such as word class. Therefore, the difficulty of the blank part depends on which word class is asked.

3) Difficulty of choices. There are various relations between distracters and the correct choice. One distracter may belong to the same word class as the correct choice, and another may have the same prefix as the correct choice. The differences between choices need to be grasped in selecting the correct answer. We adopt the types of distracters defined in [5]. The types of distracters represent differences between the correct answer and the distracters. 12 types of distracters exist, which were derived by analyzing existing questions statistically. Questions become more difficult if similar types of distracters exist in them. Therefore, the number of distracter types in the choices is defined as a difficulty-based feature.
2.2 Question Network
In learning with multiple-choice cloze questions, it is desirable that learners acquire knowledge step by step according to their understanding levels. If the difficulty of a question is determined without considering the knowledge included in the question, it may become difficult to support learners in acquiring knowledge. In our research, levels of difficulty-based features are assigned to each question. These difficulty-based features are closely related to the knowledge used to solve the question, yet they can still be acquired from the surface features of questions. By determining the learner's level for each difficulty-based feature and selecting questions accordingly, learners are able to solve questions that are appropriate for their levels.
Fig. 2 Question Network: nodes of questions are arranged by the levels of difficulty-based features 1–3, from the easiest questions to the most difficult questions; separate networks are built per word class of the blank part (noun, verb, adverb, pronoun, …)
In order to provide questions along the levels of difficulty-based features, questions need to be organized according to the levels. In this paper, a question network is introduced that structures questions along the levels of each feature. Nodes in the question network contain the questions of the same levels for all difficulty-based features. Nodes whose levels are next to each other are linked by directed links. Since a difficulty order cannot be defined uniquely for the word class of the blank part, links based on word class are not attached. Instead, question networks are constructed for all word classes. Figure 2 illustrates the question network. Nodes without incoming links correspond to the easiest questions; nodes without outgoing links contain the most difficult questions.
Fig. 3 Selecting solvable nodes: for each difficulty-based feature, one or more nodes reachable from the current node are marked as solvable (in this example, two nodes are solvable for feature 1 and one node each for features 2 and 3)
Learners' current understanding levels are grasped from their answers, and their current nodes are determined. If a learner's level increases to the next level, the learner's current node in the question network is changed by following the link of the corresponding feature. The sense of difficulty for each difficulty-based feature varies among learners: some learners may feel a feature is critical, but others may not. If a learner does not feel that a feature is difficult, the learner should move to higher levels quickly, since it is a waste of time to follow the links one by one in such a case. In order to determine the correct level of learners, several questions from several nodes that are estimated to be solvable are provided. The solvable nodes for each difficulty-based feature are estimated based on the learners' answers. Figure 3 is an example of selecting questions from solvable nodes. In this figure, two nodes are solvable for feature 1, while only one node is solvable for each of features 2 and 3.
3 Mechanism for Selecting Questions

3.1 Construction of Question Network

Questions that have the same levels for all features are gathered into one node of the question network. Two nodes whose levels differ by one in exactly one difficulty-based feature are connected by a link. Figure 4 shows a part of a question network.
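As an illustration, the grouping-and-linking rule just described can be sketched as follows; the function name and the level-triple representation are ours, not the authors' implementation:

```python
from itertools import product

def build_question_network(questions):
    """questions: dict mapping question id -> (length level, word level,
    distracter level). Questions with identical level triples share a node."""
    nodes = {}                          # level triple -> list of question ids
    for qid, levels in questions.items():
        nodes.setdefault(levels, []).append(qid)
    # Directed link from node u to node v when exactly one feature level
    # increases by one (the "next difficulty" relation); other levels equal.
    links = []
    for u, v in product(nodes, repeat=2):
        diffs = [b - a for a, b in zip(u, v)]
        if sorted(diffs) == [0, 0, 1]:
            links.append((u, v))
    return nodes, links

nodes, links = build_question_network({
    "q1": (1, 1, 1), "q2": (2, 1, 1), "q3": (1, 2, 1), "q4": (2, 2, 1),
})
```

With these four toy questions, each question forms its own node and four links connect the nodes along the two varying features.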
This research defines the length of the sentence, the difficulty of words, and the number of distracter types as the difficulty-based features. The following are the methods for acquiring these features from questions.

− Length of sentence: The number of words is regarded as one viewpoint for defining the length of a sentence. An analysis of 1500 questions in the database of our laboratory revealed that sentences consist of 4 to 32 words. Thus, we categorize the length of a sentence into four levels according to the number of words. Table 1 shows the levels of the length of sentence.
− Difficulty of words: The difficulty of words follows SVL12000 [6], a list of word difficulties defined by ALC. In SVL12000, 12000 words that are useful for Japanese learners are selected and classified into 5 levels of difficulty. In addition to these five levels, the most difficult level, 6, is attached to words that are not in the list. The level of a question is defined as the highest level among all words in the sentence and the choices.
− Number of distracter types: Distracter types are attached to the questions in the database of our laboratory. People who achieved more than 700 points on TOEIC were asked to attach the distracter types based on the definition in [5]. Since choices of the same distracter type may be more difficult than those of different types, the difficulty based on the number of distracter types is set as in Table 2.

Fig. 4 Example of attaching links: links connect nodes whose levels differ by one in a single difficulty-based feature, e.g. as the length of sentence becomes larger (level 2 → 3), as words become more difficult (level 3 → 4), or as the level based on the number of distracter types becomes higher (level 1 → 2)
Selecting English Multiple-Choice Cloze Questions
583
Table 1 Levels of length of sentence

Level       1     2      3      4
# of words  <11   12–18  19–25  >26
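For concreteness, the level assignments of Tables 1 and 2 might be coded as in the sketch below; the tiny word list stands in for SVL12000, and the treatment of the 11-word boundary (unassigned in Table 1) is our assumption:

```python
def length_level(sentence):
    # Table 1: <11 words -> level 1, 12-18 -> 2, 19-25 -> 3, >26 -> 4.
    # A sentence of exactly 11 words is placed in level 2 here by assumption.
    n = len(sentence.split())
    if n < 11:
        return 1
    if n <= 18:
        return 2
    if n <= 25:
        return 3
    return 4

def word_level(words, svl):
    # Highest difficulty level among all words; words absent from the
    # SVL12000-style list get the extra, most difficult level 6.
    return max(svl.get(w.lower(), 6) for w in words)

def distracter_level(distracter_types):
    # Table 2: 3 distinct distracter types -> level 1, 2 -> 2, 1 -> 3.
    return {3: 1, 2: 2, 1: 3}[len(set(distracter_types))]
```

Combining the three functions yields the level triple by which a question is placed into a node of the question network.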
Table 2 Levels based on the number of distracter types

Level                  1   2   3
# of distracter types  3   2   1

3.2 Selection of Questions
At the beginning of the learning process, the start node that fits the learner's initial understanding level is estimated. The start node is calculated from the result of a pretest. Let θ_i be the level of difficulty-based feature i of a learner. θ_i is calculated by the following formula:

θ_i = (1 / n_i) Σ_{j=1}^{n_i} b_{i,j} P_{i,j}   (1)
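One way to compute the initial level θ_i of Eq. (1) for a single feature can be sketched as follows (function and variable names are ours):

```python
def initial_level(levels, correct_ratio):
    # Eq. (1): theta_i = (1/n_i) * sum_j b_{i,j} * P_{i,j}, where b_{i,j} is
    # the level value j of feature i and P_{i,j} is the ratio of correctly
    # answered pretest questions of that level.
    n = len(levels)
    return sum(b * p for b, p in zip(levels, correct_ratio)) / n

# A feature with four levels; the learner answered all level-1 questions,
# half of level-2, a quarter of level-3 and none of level-4 correctly.
theta = initial_level([1, 2, 3, 4], [1.0, 0.5, 0.25, 0.0])
```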
where b_{i,j} represents level j of difficulty-based feature i, n_i is the number of levels in feature i, and P_{i,j} indicates the ratio of correctly answered questions whose level of feature i is j. By averaging the levels weighted by the ratio of correctly answered questions, the average level of feature i is derived. Questions are selected from several solvable nodes. More questions should be selected from nodes that are nearer to the learner's current node. The probability of selecting questions from each node i is calculated by the following formula:

S(l_i) = β (1/√(2π)) exp(−l_i² / 2)   (2)
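A sketch of the node-selection weighting of Eq. (2): each solvable node is weighted by a standard normal density of its link distance, with β chosen so the probabilities sum to one (the helper below is hypothetical):

```python
import math

def selection_probabilities(link_distances):
    # S(l_i) = beta * exp(-l_i^2 / 2) / sqrt(2*pi); beta normalizes the sum
    # to 1, so nearer nodes receive more of the posed questions.
    weights = [math.exp(-l * l / 2.0) / math.sqrt(2.0 * math.pi)
               for l in link_distances]
    beta = 1.0 / sum(weights)
    return [beta * w for w in weights]

# Solvable nodes at 0, 1 and 2 links from the learner's current node.
probs = selection_probabilities([0, 1, 2])
```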
where l_i is the number of links from the learner's current node to node i, S(l_i) follows the normal distribution, and β is a normalization factor. According to the probability for each node, questions are selected from that node. In the learning phase, learners solve several questions. After the learning has finished, the learners' levels for each difficulty-based feature are re-calculated. The learner's level for difficulty-based feature i after the t-th learning, θ_{i,t}, is calculated by the following formula:

θ_{i,t} = θ_{i,t−1} + (1 / |Q_t|) Σ_{q∈Q'_t} (b_{i,q} − θ_{i,t−1})   (3)
The average difference between the levels of correctly solved questions and the current level is added to the current level for each difficulty-based feature. Q_t is the set of questions posed in the t-th learning, and |Q_t| represents the number of those questions. Q'_t is the set of questions that the learner answered correctly in the t-th learning. b_{i,q} is the level of difficulty-based feature i of question q, and α is the ratio of correctly answered questions used for judging the accomplishment of the node. If the value of θ_{i,t} is bigger than the level of the current node, the current node is moved by following the links of difficulty-based feature i until the level of the reached node becomes bigger than θ_{i,t}.
The solvable nodes for the learner are also re-calculated after each learning. The number of solvable nodes becomes large if the learner solved questions in farther nodes, while it becomes small if the learner could only solve the questions in nearer nodes. The distance of solvable nodes from the current node is calculated by the following formula:

r_{i,t} = r_{i,t−1} + (1 / |Q_t|) Σ_{q∈Q_t} w_q l_{i,q}   (4)

This equation adds to the current radius a certain number of links, derived by subtracting the number of links to incorrectly answered nodes from that of correctly answered ones. l_{i,q} represents the number of links from the current node to the node that question q belongs to for difficulty-based feature i. w_q corresponds to the correctness of question q: w_q is 1 if the answer to question q is correct and −1 if the answer is incorrect. The initial value of r_{i,t} is set to 1, which means only the next node is solvable. r_{i,t} is set to 1 if the calculated r_{i,t} becomes smaller than 1.
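A joint sketch of the update rules of Eq. (3) and Eq. (4); question ids, dictionaries, and function names are illustrative only:

```python
def update_level(theta, posed, correct, levels):
    # Eq. (3): add the average gap between the feature levels b_{i,q} of the
    # correctly answered questions Q'_t and the current level theta_{i,t-1},
    # averaged over all |Q_t| posed questions.
    return theta + sum(levels[q] - theta for q in correct) / len(posed)

def update_radius(r, posed, answered_correctly, link_dist):
    # Eq. (4): move the solvable-node distance by the signed average of link
    # distances, with w_q = +1 for a correct and -1 for an incorrect answer;
    # the radius never drops below 1 (the next node is always solvable).
    delta = sum((1 if answered_correctly[q] else -1) * link_dist[q]
                for q in posed)
    return max(1, r + delta / len(posed))

levels = {"q1": 3, "q2": 2}
theta = update_level(2.0, ["q1", "q2"], ["q1"], levels)            # 2.5
radius = update_radius(1, ["q1", "q2"], {"q1": True, "q2": False},
                       {"q1": 2, "q2": 1})                         # 1.5
```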
4 Prototype System
We have developed a web-based system based on the proposed method. When the learning starts, the selected questions are shown on a web page, as in Figure 5. Currently, 10 questions are selected in one learning process. Learners answer the questions by selecting the radio button of the correct choice. If a learner cannot work out the answer, he can check the checkbox which says "I could not solve the question". If this checkbox is checked, the answer to this question is not counted among the learner's answers. After learners select all answers and push the send button, their answers are evaluated, and the result and an explanation are displayed.
Fig. 5 Interface of prototype system (question sentence, choices, and the checkbox "I could not solve the question")
5 Experimental Result
Experiments were conducted using the prototype system. This experiment focuses on the question network of verbs. First, examinees were asked to solve a pretest which consists of 20 questions, and the examinees' initial levels were calculated based on the result of the pretest. The questions were carefully prepared by the authors to include all levels of the difficulty-based features in equal ratios. In the learning phase, examinees were asked to answer 10 questions in each of 10 sessions. In the experiment, α in Equation 3 was set to 0.7, which means learners were regarded as having accomplished the node if they could answer more than 70 percent of the posed questions. As counter methods, we prepared the following two methods:

− Random link selection method (RLSM), which selects links randomly when selecting nodes in the question network,
− Random question posing method (RQPM), which selects questions randomly from the database.
In RLSM, the movement between nodes occurs when the learner correctly solved 70 percent of the questions in the node. The examinees were 12 members of our laboratory, and 4 of them were assigned to each method: the proposed method, RLSM, and RQPM. The correctly answered questions in each session were evaluated. Table 3 shows the average number of correct questions and its variance over the sessions. The average numbers are almost the same for all three methods. However, the variance of our method is the smallest of the three. This indicates that the number of correctly answered questions was almost the same in every session. This result shows that our method could provide questions whose levels are similar to the learners', even as the understanding levels of the learners changed over the 10 sessions.

Table 3 Result of learning phase

                  Average # of correct questions   Variance of # of correct questions
Proposed method   5.725                            1.585
RLSM              5.850                            2.057
RQPM              5.825                            2.665
The questionnaire result for the examinees' perception of the proposed questions is shown in Table 4. In each questionnaire item, 5 is the best and 1 is the worst. Items 1 and 2 got high values. Based on the result of item 1, examinees felt that questions became more difficult as the learning proceeded. Based on item 2, they also felt that words were getting more difficult. Table 5 shows the number of links that examinees who used the prototype system with the proposed method followed during the learning. All examinees followed links of difficulty of words more than twice. For item 3, the examinee who answered 4 followed the link of the length of sentence more than twice, and the examinees who answered 3 followed the link only once. The worst result, for item 4, may be caused by the small number of links followed based on the number of distracter types. Based on the result, when links are followed, learners can feel the difficulty of the questions. Therefore, questions are arranged appropriately by difficulty in the question network.

Table 4 Questionnaire result

  Contents                                          Average value
1 Did the questions become difficult?               4.00
2 Did the words in questions become difficult?      4.00
3 Did the question sentences become difficult?      3.50
4 Did the distracters become difficult?             2.75
Table 5 # of links that examinees followed

             Difficulty of words   Length of sentence   # of distracter types
Examinee 1   3                     1                    1
Examinee 2   2                     2                    0
Examinee 3   2                     3                    1
Examinee 4   3                     1                    0

6 Conclusion
In this paper, a method for posing questions step by step according to difficulty-based features was proposed. Based on the experimental result, the defined features are intuitive and match learners' perceptions. In addition, using the question network, which arranges questions according to the levels of the difficulty-based features, questions that fit learners' levels were able to be selected in spite of changes in the learners' situations during the learning. Currently, three difficulty-based features have been prepared. However, there are still several other features of questions, such as grammatical structure. Thus, for our future work, it is necessary to investigate whether other features of questions qualify as difficulty-based features or not. Moreover, our system only provides explanations for learners' answers to the questions and does not support learners in acquiring the knowledge. If questions run out before learners can answer a certain number of questions correctly, they cannot proceed with the learning process. Therefore, we have to provide a support tool that teaches the necessary knowledge to learners.
References 1. Suganuma, A., Mine, T., Shoudai, T.: Automatic Generating Appropriate Exercises Based on Dynamic Evaluating both Students’ and Questions’ Levels. In: Proc. of EDMEDIA, CD-ROM (2002)
2. Goto, T., Kojiri, T., Watanabe, T., Yamada, T., Iwata, T.: English grammar learning system based on knowledge network of fill-in-the-blank exercises. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds.) KES 2008, Part III. LNCS (LNAI), vol. 5179, pp. 588–595. Springer, Heidelberg (2008) 3. Kunichika, H., Urushima, M., Hirashima, T., Takeuchi, A.: A Computational Method of Complexity of Questions on Contents of English Sentences and its Evaluation. In: Proc. of ICCE 2002, pp. 97–101 (2002) 4. Kate, R.J., Luo, X., Patwardhan, S., Franz, M., Florian, R., Mooney, R.J., Roukos, S., Welty, C.: Learning to Predict Readability Using Diverse Linguistic Features. In: Proc. of ICCL 2010, pp. 546–554 (2010) 5. Goto, T., Kojiri, T., Watanabe, T., Iwata, T., Yamada, T.: Automatic Generation System of Multiple-choice Cloze Questions and its Evaluation. An International Journal of Knowledge Management and E-Learning 2(3), 210–224 (2010) 6. Standard Vocabulary List 12000: SPACE ALC, http://www.alc.co.jp/eng/vocab/svl/index.html (in Japanese)
Testing Randomness by Means of RMT Formula Xin Yang, Ryota Itoi, and Mieko Tanaka-Yamawaki
Abstract. We propose a new method of testing randomness by applying the method of RMT-PCA, which was originally used for extracting principal components from massive price data in the stock market. The method utilizes the RMT formula derived in the limit of infinite dimension and infinite length of data strings, and can be applied to test the randomness of very long, highly random data strings. Although its level of accuracy is not high in a rigorous sense, it is expected to be a convenient tool for testing the randomness of real-world numerical data. In this paper we show the results of applying this method (the RMT-test) to two machine-generated random number sequences (LCG, MT), as well as to artificially distorted random numbers, and examine its effectiveness. Keywords: Randomness, RMT-test, Eigenvalue distribution, Correlation, LCG, MT.
1 Introduction

In spite of the numerous algorithms that have been proposed to generate random numbers [1,2], no artificial method can generate better randomness than naturally generated noise such as radioactive decay [3,4]. Yet many programmers rely on various kinds of random number generators that can be used as a sub-routine to the main program. On the other hand, game players utilize other means to generate randomness, for the sake of offering a flat opportunity to the players and attracting participants to the community, in order to make the participants believe in an equal opportunity to win or lose. Stock prices are also expected to be highly random, so that investors can dream that everyone has a chance to win in the market. When the purpose is different, the level of accuracy required of randomness is also different. Meanwhile, the random matrix theory (RMT, hereafter) has attracted much attention in many fields of science [5,6]. In particular, a theoretical formula [7,8] for the eigenvalue spectrum of the correlation matrix has been applied to extract principal components from a wide range of multidimensional databases including financial prices [9-14]. Xin Yang, Ryota Itoi, and Mieko Tanaka-Yamawaki Department of Information and Knowledge Engineering Graduate School of Engineering Tottori University, Tottori, 680-8552 Japan e-mail: [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 589–596. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
590
X. Yang, R. Itoi, and M. Tanaka-Yamawaki
We consider in this paper a new algorithm to test the randomness of marginally random sequences that we encounter in various situations, by applying the RMT-PCA method [13,14] originally developed to extract trends from a massive database of stock prices [9-12]. We name this method the 'RMT-test' and examine its effect on several examples of pseudo-random numbers including LCG [1] and MT [2].
2 Formulation of RMT-Test

The RMT-test can be formulated as follows. The aim is to test the randomness of a long one-dimensional sequence of numerical data, S. At the first step, we cut S into N pieces of equal length T, then shape them into an N×T matrix, S_{i,t}, by placing the first T elements of S in the first row of the matrix, the next T elements in the second row, etc., discarding the remainder if the length of S is not divisible by T. Each row of S_{i,t} is a random sequence of length T and can be regarded as an independent T-dimensional vector, S_i = (S_{i,1}, S_{i,2}, S_{i,3}, ..., S_{i,T}). We normalize them by means of

x_{i,t} = (S_{i,t} − <S_i>) / σ_i   (1)

for (i = 1,…,N, t = 1,...,T), where

<S_i> = (1/T) Σ_{t=1}^{T} S_{i,t},   σ_i = sqrt(<S_i²> − <S_i>²)   (2)
such that every row of the new matrix x has mean 0 and variance 1. Since the original sequence S is random, in general all the rows are independent, i.e., no pair of rows is identical. The cross correlation matrix C_{i,j} between two rows, i and j, is constructed as the inner product of the two time series, x_{i,t} and x_{j,t}:

C_{i,j} = (1/T) Σ_{t=1}^{T} x_{i,t} x_{j,t}   (3)
thus the matrix C_{i,j} is symmetric under the interchange of i and j. A real symmetric matrix C can be diagonalized by a similarity transformation V⁻¹CV with an orthogonal matrix V satisfying Vᵗ = V⁻¹, each column of which consists of an eigenvector of C, such that

C v_k = λ_k v_k   (k = 1,…,N)   (4)

where the coefficient λ_k is the k-th eigenvalue and v_k is the k-th eigenvector. According to the RMT, the eigenvalue distribution spectrum of the cross correlation matrix C of random series is given by the following formula [7,8]:

P_RMT(λ) = (Q / 2π) sqrt((λ₊ − λ)(λ − λ₋)) / λ   (5)

where the upper bound and the lower bound are given by

λ± = (1 ± Q^(−1/2))²   (6)
in the limit N → ∞, T → ∞ with Q = T/N = const., where T is the length of the time series and N is the total number of independent time series (i.e. the number of stocks considered when applied to stock markets [9-14]). An example of this function is illustrated in Fig. 1 for the case of Q = 3. This means that, if the sequence is random, the eigenvalues of the correlation matrix C between pairs of N normalized time series of length T distribute in the range

λ₋ < λ < λ₊   (7)

The RMT-test can be formulated in the following five steps.
Algorithm of the RMT-test:
1. Prepare a data string to be tested and cut it into N pieces of length T.
2. Convert each piece, S_i = (S_{i,1}, S_{i,2}, ..., S_{i,T}), to a normalized vector x_i = (x_{i,1}, x_{i,2}, ..., x_{i,T}) by means of Eq. (1) and Eq. (2). By placing x_i in the i-th row, make an N×T matrix x_{i,t}. Taking the inner products and dividing by T, make the correlation matrix C of Eq. (3).
3. Compute the eigenvalues λ_k and the eigenvectors v_k of the matrix C by solving the eigenvalue equation, Eq. (4).
4. Discard the eigenvalues that satisfy Eq. (7), as the random part.
5. If nothing is left, the data string passes the RMT-test. If any eigenvalue is left over, the data string fails the RMT-test.
6. (Visual presentation of the RMT-test) Compare the histogram of the eigenvalue distribution with the graph of Eq. (5). If the two lines match, as in the left figure below, the data passes the RMT-test. If they do not match, as in the right figure below, it fails the RMT-test.
Fig. 1 The algorithm of the RMT-test is summarized in 5 steps plus a visual presentation in step 6, shown in two figures: an example of a good random sequence (left) and an example of a bad random sequence (right).
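The steps above can be condensed into a short numpy sketch (ours, not the authors' code); it returns True when every eigenvalue lies inside the theoretical band of Eqs. (6)-(7):

```python
import numpy as np

def rmt_test(seq, N, T):
    # Steps 1-2: cut the sequence into N rows of length T and normalize each
    # row to mean 0 and variance 1 (Eqs. (1)-(2)).
    x = np.asarray(seq[:N * T], dtype=float).reshape(N, T)
    x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    # Step 2 (cont.): cross correlation matrix, Eq. (3).
    C = x @ x.T / T
    # Step 3: eigenvalues of the symmetric matrix C.
    eig = np.linalg.eigvalsh(C)
    # Steps 4-5: pass iff all eigenvalues fall inside [lambda_-, lambda_+], Eq. (6).
    Q = T / N
    lam_plus = (1.0 + Q ** -0.5) ** 2
    lam_minus = (1.0 - Q ** -0.5) ** 2
    return bool(eig.min() >= lam_minus and eig.max() <= lam_plus)
```

Note that the RMT band is an N, T → ∞ result, so for finite N and T the largest eigenvalue of even a good generator fluctuates around λ₊, while a strongly correlated sequence, e.g. the same block repeated N times, fails decisively.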
3 Applying the RMT-Test to Machine-Generated Random Numbers

In this chapter, we apply the RMT-test to two kinds of machine-generated random numbers: LCG, the most popular algorithm for pseudo-random number generation, and MT, a newer algorithm that is widely used for its astronomically long period.
3.1 Random Sequence by Means of Linear Congruential Generators (LCG)

The most popular algorithm for numerically generating random sequences is the class of linear congruential generators (LCG) [1]:

X_{n+1} = (a X_n + b) mod M   (8)

The popular function rand( ) uses the following parameters in Eq. (8):

a = 1103515245 / 65536,  b = 12345 / 65536,  M = 32768   (9)
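Read as the classic ANSI C `rand()`, Eq. (9) means the state recurrence uses a = 1103515245 and b = 12345 on a wider modulus, while the division by 65536 and the modulus 32768 describe how the 15-bit output is extracted from the state. A sketch under that assumption (the 2³¹ state modulus is our reading, not stated in the text):

```python
def lcg(seed, n, a=1103515245, b=12345):
    # ANSI-C-style rand(): full-width state, 15-bit output (state/65536 mod 32768).
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + b) % 2**31
        out.append((x // 65536) % 32768)
    return out

sample = lcg(seed=1, n=5)   # first value for seed 1 is 16838
```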
Using Eqs. (8)-(9), we generate a random sequence S of length 500×1500, then cut it into 500 (= N) pieces of length 1500 (= T) each to make the LCG (Q = 3) data. Although LCGs are known to have many problems, the RMT-test cannot detect their off-randomness. As shown in the left figure of Fig. 2, this data passes the RMT-test safely for Q = 3 with a wide variety of seeds. The right figure of Fig. 2 is the corresponding result for Q = 6, with N = 500 and T = 3000.
Fig. 2 Examples of random sequences generated by LCG for different Q pass the RMT-test (left: Q = 3, right: Q = 6, both for N = 500)
3.2 Random Sequence by Means of Mersenne Twister (MT)

The Mersenne Twister (MT) [2] is a recently proposed, highly reputed random number generator. The most valuable feature of MT is its extremely long period, 2^19937 − 1. We test the randomness of MT by the same procedure as above. The result is shown in Fig. 3 for Q = 3 and Q = 6. MT also passes the RMT-test in a wide range of N and T.
Fig. 3 Examples of random sequences generated by MT for different Q pass the RMT-test (left: Q = 3, right: Q = 6, both for N = 500)
4 Application of the RMT-Test to Artificially Distorted Sequences
In this chapter, we artificially lower the randomness of pseudo-random sequences in three different ways and apply the RMT-test to the resulting data. The first data set is created by collecting the initial 500 numbers generated by LCG, starting from a fixed seed. The second is created by converting the sequences into Gaussian-distributed random numbers by means of the Box-Muller formula. The third is created by taking log-returns of the sequences generated by LCG and MT.
4.1 Artificially Distorted Pseudo-random Sequence (LCG)
Knowing that the initial part of an LCG sequence generally has low randomness, we artificially create off-random sequences by collecting the initial parts of pseudo-random numbers, namely the sequences of 500 iterations after starting with seeds, in order to see whether the RMT-test can indeed detect the off-randomness. As shown in Fig. 4, the RMT-test detected a sign of deviation from the RMT formula for the case of N = 100 and T = 500 (left), since some eigenvalues are larger than the theoretical maximum. On the other hand, the corresponding case using the data without the first 500 numbers after the seeds passes the RMT-test, having all the eigenvalues within the theoretical curve, as shown
in the right figure of Fig. 4. Unlike LCG, MT does not have this problem with the initial 500 numbers.
Fig. 4 (left): A collection of first 500 of LCG (N=100,T=500) fails the RMT-test. (right): LCG without first 500 numbers (N=100,T=500) passes the RMT-test.
4.2 Artificially Distorted Pseudo-random Sequence (Box-Muller)
The Box-Muller formula is often used to convert uniform random numbers into Gaussian random numbers. We test such sequences to check the performance of the RMT-test. We apply the B-M formula in two ways. One is to convert to random numbers of the normal distribution with zero mean, N(0,1), which barely pass the RMT-test, as shown in the left figure of Fig. 5. On the other hand, another set of random numbers, created to have the off-centered normal distribution N(5,1), fails the RMT-test, as shown in the right figure of Fig. 5.
Fig. 5 (left): N(0,1) Gaussian random number barely passes the RMT-test. (right): N(5,1) Gaussian random number (N=100,T=500) fails the RMT-test.
4.3 Artificially Distorted Pseudo-random Sequence (Log-Return)
In this section, we point out the fact that any random sequence loses randomness after being converted to the log-return, as follows:
r_i = log(S_{i+1} / S_i)
(10)
although this process is inevitable when we deal with financial time series, in order to eliminate the unit/size dependence of different stock prices. Taking the log-return of a time series typically results in a significant deviation from the RMT formula, as shown in Fig. 6: the left figure for the case of LCG and the right figure for the case of MT.
Fig. 6 Both log-return sequence of LCG (left) and MT (right) fail the RMT-test.
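Eq. (10) can be sketched as follows (a minimal illustration; the sample prices are arbitrary):

```python
import math

def log_returns(prices):
    """r_i = log(S_{i+1} / S_i), Eq. (10): the transform applied to a
    price-like series before running the RMT-test."""
    return [math.log(prices[i + 1] / prices[i]) for i in range(len(prices) - 1)]

r = log_returns([100.0, 101.0, 99.5, 102.0])
# the output series is one element shorter than the input series
```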
5 Summary
In this paper, we proposed a new method of testing randomness, the RMT-test, as a byproduct of the RMT-PCA that we used to extract trends from stock markets. In order to examine its effectiveness, we tested it on two random number generators, LCG and MT, and showed that both pass the RMT-test for various values of the parameters. We further tested the validity of the RMT-test on artificially deteriorated random numbers to show that it can detect off-randomness in the following three examples: (1) the initial 500 numbers of rand( ), (2) off-centered Gaussian random numbers obtained by means of the Box-Muller algorithm, and (3) log-return sequences of the two kinds of pseudo-random numbers (LCG, MT).
References
1. Park, S.K., Miller, K.W.: Random Number Generators: Good Ones are Hard to Find. Communications of the ACM 31, 1192–1201 (1988)
2. Matsumoto, M., Nishimura, T.: Mersenne Twister: a 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Transactions on Modeling and Computer Simulation 8, 3–30 (1998)
3. Tamura, Y.: Random Number Library (The Institute of Mathematical Statistics) (2010), http://random.ism.ac.jp/random/index.php
4. Walker, J.: HotBits: Genuine Random Numbers (2009), http://www.fourmilab.ch/hotbits/
5. Edelman, A., Rao, N.R.: Acta Numerica, pp. 1–65. Cambridge University Press, Cambridge (2005)
6. Mehta, M.L.: Random Matrices, 3rd edn. Academic Press, London (2004)
7. Marcenko, V.A., Pastur, L.A.: Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik 1(4), 457–483 (1967)
8. Sengupta, A.M., Mitra, P.P.: Distribution of singular values for some random matrices. Physical Review E 60, 3389 (1999)
9. Plerou, V., Gopikrishnan, P., Rosenow, B., Amaral, L.A.N., Stanley, H.E.: Random matrix approach to cross correlations in financial data. Physical Review E 65, 066126 (2002)
10. Plerou, V., Gopikrishnan, P., Rosenow, B., Amaral, L.A.N., Stanley, H.E.: Physical Review Letters 83, 1471–1474 (1999)
11. Laloux, L., Cizeau, P., Bouchaud, J.-P., Potters, M.: Physical Review Letters 83, 1467–1470 (1999)
12. Bouchaud, J.-P., Potters, M.: Theory of Financial Risks. Cambridge University Press, Cambridge (2000)
13. Tanaka-Yamawaki, M.: Extracting principal components from pseudo-random data by using random matrix theory. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds.) KES 2010. LNCS, vol. 6278, pp. 602–611. Springer, Heidelberg (2010)
14. Tanaka-Yamawaki, M.: Cross Correlation of Intra-day Stock Prices in Comparison to Random Matrix Theory. Intelligent Information Management (2011), http://www.scrp.org
The Effect of Web-Based Instruction on English Writing for College Students Ya-Ling Wu, Wen-Ming Wu, Chaang-Yung Kung, and Ming-Yuan Hsieh
Abstract. The use of network technology in teaching and learning is the latest trend in education and training. The purpose of this study is to investigate the effect of web-based instruction on English writing. In this study, a hybrid course format (part online, part face-to-face lecture) was developed to deliver an English writing course to sophomores majoring in English. The hybrid course was structured to include online assignments, handouts, lecture recording files, and weekly lectures in a classroom. To evaluate the effectiveness of web-based instruction on learning English writing, two classes participated: one was taught under the hybrid course format while the other was taught with traditional instruction. The findings of the study revealed that: 1. There were statistically significant differences between students' achievement mean scores in English writing skills and concepts attributed to the course setting; this difference is in favor of students in the hybrid course format. 2. There were no statistically significant differences between students' mean scores in writing ability attributed to the course setting. Keywords: web-based instruction, CAI, English writing.
1 Introduction
At most colleges and universities, English writing courses are required for students majoring in English. In order to improve learning, many learners are eager to find ways to enhance their learning effectiveness. With the progress of technology, using web-based instruction or computers as teaching tools is more prevalent than
Ya-Ling Wu Department of Applied English, National Chin-Yi University of Technology
Wen-Ming Wu Department of Distribution Management, National Chin-Yi University of Technology Chaang-Yung Kung Department of International Business, National Taichung University of Education Ming-Yuan Hsieh Department of International Business, National Taichung University of Education J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 597–604. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
598
Y.-L. Wu et al.
before. Hundreds of studies have been generated on the use of computer facilities, e-learning, and networks to enhance learning. Riffell and Sibley (2005) claimed that using web-based instruction can improve learning for undergraduates in biology courses.
1.1 Description of the Hybrid Course
The hybrid course incorporates three primary components: (1) computer-assisted instruction (CAI): WhiteSmoke software provides instant feedback on students' writing; (2) web-based instruction: web-based assignments, handouts, and lectures; (3) lectures in a computer-equipped classroom: weekly meetings in the computer lab focused on teaching core skills and concepts of text types in English writing.
1.2 Research Questions and Hypotheses
This study attempts to answer the following questions. 1. Are there any statistically significant differences (α < 0.05) between the students' achievement mean scores attributed to course setting (traditional & hybrid)? H1: There is no significant difference in achievement between students taught under the hybrid course format and those taught with traditional instruction. 2. Are there any statistically significant differences (α < 0.05) between the students' mean scores in writing ability attributed to course setting (traditional & hybrid)? H2: There is no significant difference in writing ability between students taught under the hybrid course format and those taught with traditional instruction.
1.3 Limitations
In this study, the participants are sophomores majoring in English. The results of this study cannot be generalized to claim that all college students in Taiwan can improve their English writing ability via web-based instruction. Moreover, the effect of web-based instruction might depend on characteristics of the learners, such as age, personality, attitude, motivation, and so on.
2 Literature Review
Since the first computers were used in schools in the middle of the 20th century, things have changed. At first, computers were used as tools to help learners and instructors perform tasks faster and more efficiently. As a result, computers have played an important role in processing information as well as in assisting teaching and learning.
The Effect of Web-Based Instruction on English Writing for College Students
599
2.1 CAI
Several studies on computer-assisted instruction (CAI) indicate a positive influence on learning and teaching. Schaudt (1987) conducted an experiment in which students were taught with CAI in a reading lesson. The result showed that CAI can be an effective tool in improving students' mastery of targeted reading skills. In this study, Schaudt assumed that (1) time is allocated on the computer for sufficient and continuous content coverage, (2) performance is monitored, and (3) the teacher chooses software appropriate for the students' ability levels. Wedell and Allerheiligen (1990) indicated that CAI has significant efficacy in learning English, especially in reading and writing. Learners are conscious of improvement in the areas of conciseness, courtesy, concreteness, clarity, consideration, and correctness, as well as their overall, general communication skill. However, it is not effective for inexperienced writers because they do not possess enough ability to understand the corrected errors. Zhang (2007) suggested that incorporating computers into Chinese learning and teaching can offer several important advantages to American learners. It can not only enhance learning motivation for students in the classroom, but also afford them increased opportunity for self-directed learning. According to Cotton (1997), using CAI as a supplement to traditional instruction can bring higher performance than using conventional instruction alone. Besides, students learn instructional content more efficiently and faster when using CAI than when using traditional instruction, and they retain what they have learned better with CAI than with conventional instruction alone. Karper, Robinson, and Casado-Kehoe (2005) reported that computer-assisted instruction has been viewed as a more effective way to improve learners' performance than the conventional instructional method in counselor education.
In addition, a study by Akour (2006) showed that the time required for learners using CAI was higher overall than for conventional classroom instruction; however, students taught by conventional instruction combined with the use of computers performed significantly better than students taught by conventional instruction alone in a college setting. The purpose of Chen and Cheng's (2006) study was to probe the factors that may lead to facilitation or frustration when students are taught by CAI. They suggested that computer-assisted learning programs should be used as a supplement to classroom instruction but never as a replacement for the teacher. Kasper (2002) said that technology is now regarded as both a necessary component and a means of achieving literacy; as a result, it must become a required part of language courses, and computers ought to be used as a tool to promote linguistic skills and knowledge construction. Forcier and Descy (2005) pointed out that with an explosion of information and widespread, immediate access to it, today's students are faced with the need to develop problem-solving skills. The computer-equipped classroom offers situations that the student will confront in real life and permits students to demonstrate their capabilities in completing various tasks.
Purao (1997) stated that a hyper-linked instructional method encouraged students to take a more active role in the classroom and fostered higher levels of learning.
2.2 Web-Based Instruction
Recently, web-based instruction has become increasingly popular and familiar. Online instruction has been broadly applied in education because it may be superior to the traditional learning environment. According to Ostiguy and Haffer (2001), web-based courses may provide students more flexibility and control over where and when to participate. In addition, they can lead to greater learning motivation (St. Clair, 1999). Furthermore, Hacker and Niederhauser (2000) indicated that learning in web-based courses can be more active than in traditional instruction. It is also more student-centered (Sanders, 2001) and can encourage students to learn (Yazon, Mayer-Smith, & Redfield, 2002). Moreover, it has been reported that online courses bring significant improvements in student performance (Navarro and Shoemaker, 2000), and that a hybrid course can improve learning outcomes (Tuckman, 2002).
3 Methodology
This study was designed to investigate whether web-based instruction significantly enhances learning outcomes and English writing ability for students.
3.1 Subjects
Participants of this study were recruited by convenience sampling because it is hard to get permission from instructors to conduct such a study. The participants consisted of 41 sophomores majoring in English, in two classes. All participants were taught by the same instructor for a semester.
3.2 Instrument
In order to successfully implement this study, achievement tests and writing tests were designed by the instructor. The achievement tests were used to measure the core skills and concepts for different text types in English writing, and the writing tests were utilized to evaluate writing ability.
3.2.1 Achievement Test
Two achievement tests were developed by the instructor. The first was used as a pre-test to assess the students' previous knowledge and the second as a post-test to find out the impact of the hybrid course format on students' achievement.
3.2.2 Writing Test
Writing tests were administered on the first day of the course (pre-test) and on the last class day (post-test). The objective of the pre-test was to compare the level of writing ability between the two classes. Both pre-test and post-test were assessed by the software WhiteSmoke, which contains the following functions: Grammar-Checker, Punctuation-Checker, Spell-Checker, Dictionary, Thesaurus, Templates, Translator, Error Explanations, and AutoCorrect. The grading system in WhiteSmoke is a 10-point scale.
3.3 Procedure
The study was conducted in the following steps. First of all, in order to make sure the students in both classes were at the same level of writing ability, a writing test (pre-test) was arranged at the beginning of the semester. Students were asked to write a paragraph in the writing test. Then students' writing was evaluated by the software WhiteSmoke, whose use provided a uniform criterion for the evaluation of writing ability. In addition, the first achievement test was taken by students in the two classes on the first class day to test their previous knowledge. During the semester, one class (the experimental group) was taught under the hybrid course format in the computer lab whereas the other (the control group) received traditional instruction in a classroom with blackboard only. Finally, at the end of the semester, an achievement test (post-test) and a writing test (post-test) were given again.
3.4 Data Analysis
The data analysis consisted of independent-samples t-tests testing the significance of the differences in mean test scores between the two classes.
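The independent-samples t statistic can be computed from summary statistics alone. A sketch using the pooled-variance (equal-variance) form, illustrated with the Achievement Test 2 row of Table 1 (small differences from the t reported in Table 2 come from rounding of the published means and SDs):

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's independent-samples t statistic with pooled variance.
    Returns (t, degrees of freedom)."""
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, n1 + n2 - 2

# Achievement Test 2 summary statistics (hybrid vs traditional)
t, df = pooled_t(73.12, 7.844, 25, 63.88, 11.592, 16)
```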
4 Results
This study was designed to determine whether a significant difference existed in achievement and writing ability based on course setting. The first question asks about the existence of statistically significant differences between the students' achievement mean scores attributed to the course setting (traditional & hybrid). An independent-samples t-test was performed to test the significance of the differences between the experimental group, who were taught under the hybrid course setting, and the control group, who were taught with traditional instruction. Table 1 presents the means and standard deviations of the experimental group and control group for the achievement tests and writing tests.
Table 1 Means and Standard Deviations on Tests

Tests                Group        N    Mean   SD
Achievement Test 1   Hybrid       25   81.64  7.233
                     Traditional  16   84.88  7.650
Achievement Test 2   Hybrid       25   73.12  7.844
                     Traditional  16   63.88  11.592
Writing Test 1       Hybrid       24   7.38   1.056
                     Traditional  14   6.93   1.072
Writing Test 2       Hybrid       23   7.39   .839
                     Traditional  16   7.25   1.125
As indicated in Table 1, the mean score on achievement test 1 is 81.64 for students in the hybrid course format and 84.88 for students in the traditional setting; the mean scores on achievement test 2 are 73.12 and 63.88 for the two groups, respectively. The mean score on achievement test 2 is lower than on achievement test 1 for both groups because the contents tested in achievement test 2 were more difficult than those in achievement test 1. Moreover, the mean scores for the two groups are 7.38 and 6.93 on writing test 1, and 7.39 and 7.25 on writing test 2.

Table 2 Independent Samples t-test on Tests

Tests                t       df   p-Value
Achievement Test 1   -1.366  39   .180
Achievement Test 2   3.052   39   .004*
Writing Test 1       1.251   36   .219
Writing Test 2       .450    37   .656
To test hypotheses one and two, independent-samples t-tests were used to determine the significance of any differences in the mean scores of the achievement tests and writing tests between the two groups. As indicated in Table 2, there is no statistically significant difference between the mean scores on achievement test 1 and writing test 1 (the pre-tests). This reveals that there is no significant difference (p = 0.18) in students' previous knowledge of writing skills or in writing ability level between the two groups. However, there is a significant difference (p = 0.004) on achievement test 2, indicating that students in the hybrid course format performed better on the course contents learned in class than students in the traditional setting. In contrast, there is no significant difference (p = 0.656) between the mean scores on writing test 2. This shows that students in
the hybrid class did not perform better in writing ability than students in the traditional class.
5 Discussion and Conclusions
This study had two major purposes. The first was to examine the effect of web-based instruction on students' achievement. Second, the researchers sought to determine whether a significant difference existed in writing ability based on the course setting. According to the t-test analysis shown in Table 2, students who received web-based instruction performed better on the achievement test than students who received traditional instruction. The result indicates that web-based instruction enhanced students' learning of core skills and concepts of text types in English writing. Students exposed to the hybrid course format had more flexibility and control over where and when to review the contents learned in class. However, the findings also revealed that students in the hybrid course setting did not perform better in writing ability than students in the traditional course setting. This probably implies that students need more time to improve their writing ability. Although available technology can greatly enhance students' learning of course contents, well-planned curricula should be carefully designed by instructors.
References
1. Akour, M.A.: The effects of computer-assisted instruction on Jordanian college students' achievement in an introductory computer science course. Electronic Journal for the Integration of Technology in Education 5, 17–24 (2006), http://ejite.isu.edu/Volume5/Akour.pdf
2. Chen, C.F., Cheng, W.Y.: The Use of a Computer-based Writing Program: Facilitation or Frustration? Presented at the 23rd International Conference on English Teaching and Learning in the Republic of China (May 2006)
3. Cotton, K.: Computer-assisted instruction. North West Regional Educational Laboratory, http://www.borg.com/rparkany/PromOriginalETAP778/CAI.html
4. Forcier, R.C., Descy, D.E.: The computer as an educational tool: Productivity and problem solving, vol. 17. Pearson Education, Inc., London (2005)
5. Hacker, D.J., Niederhauser, D.S.: Promoting deep and durable learning in the online classroom. New Directions for Teaching and Learning 84, 53–64 (2000)
6. Karper, C., Robinson, E.H., Casado-Kehoe, M.: Computer assisted instruction and academic achievement in counselor education. Journal of Technology in Counseling 4(1) (2005), http://jtc.colstate.edu/Vol4_1/Karper/Karper.htm (retrieved December 22, 2007)
7. Kasper, L.: Technology as a tool for literacy in the age of information: Implications for the ESL classroom. Teaching English in the Two-Year College (12), 129 (2002)
8. Navarro, P., Shoemaker, J.: Performance and perceptions of distance learners in cyberspace. American Journal of Distance Education 14, 15–35 (2000)
9. Ostiguy, N., Haffer, A.: Assessing differences in instructional methods: Uncovering how students learn best. Journal of College Science Teaching 30, 370–374 (2001)
10. Purao, S.: Hyper-link teaching to foster active learning. In: Proceedings of the 12th International Academy for Information Management Annual Conference, Atlanta, GA, December 12–14 (1997)
11. Riffell, S., Sibley, D.: Using web-based instruction to improve large undergraduate biology courses: an evaluation of a hybrid course format. Computers and Education 44, 217–235 (2005)
12. Sanders, W.B.: Creating learning-centered courses for the world wide web. Allyn & Bacon, Boston (2001)
13. Schaudt, B.A.: The use of computers in a direct reading lesson. Reading Psychology 8(3), 169–178 (1987)
14. St. Clair, K.L.: A case against compulsory class attendance policies in higher education. Innovations in Higher Education 23, 171–180 (1999)
15. Tuckman, B.W.: Evaluating ADAPT: A hybrid instructional model combining web-based and classroom components. Computers and Education 39, 261–269 (2002)
16. Wedell, A.J., Allerheiligen, J.: Computer Assisted Writing Instruction: Is It Effective? The Journal of Business Communication, 131–140 (1991)
17. Yazon, J.M.O., Mayer-Smith, J.A., Redfield, R.J.: Does the medium change the message? The impact of a web-based genetics course on university students' perspectives on learning and teaching. Computers and Education 38, 267–285 (2002)
18. Zhang, H.Y.: Computer-assisted elementary Chinese learning for American students. US-China Education Review 4(5) (Serial No. 30), 55–60 (2007)
8. Navarro, P., Shoemaker, J.: Performance and perceptions of distance learners in cyberspace. American Journal of Distance Education 14, 15–35 (2000) 9. Ostiguy, N., Haffer, A.: Assessing differences in instructional methods: Uncovering how students learn best. Journal of College Science Teaching 30, 370–374 (2001) 10. Purao, S.: Hyper-link teaching to foster active learning. In: Proceedings of the International Academy for Information Management Annual Conference, 12th, Atlanta, GA, December 12-14 (1997) 11. Riffell, S., Sibley, D.: Using web-based instruction to improve large undergraduate biology courses: an evaluation of a hybrid course format. Computers and Education 44, 217–235 (2005) 12. Sanders, W.B.: Creating learning-centered courses for the world wide web. Allyn & Bacon, Boston (2001) 13. Schaudt, B.A.: The use of computers in a direct reading lesson. ReadingPsychology 8(3), 169–178 (1987) 14. St. Clair, K.L.: A case against compulsory class attendance policies in higher education. Innovations in Higher Education 23, 171–180 (1999) 15. Tuchman, B.W.: Evaluating ADAPT: A hybrid instructional model combining webbased and classroom components. Computers and Education 39, 261–269 (2002) 16. Wedell, A.J., Allerheiligen, J.: Computer Assisted Writing Instruction: Is It Effective? The Journal of Business Communication, 131–140 (1991) 17. Yazon, J.M.O., Mayer-Smith, J.A., Redfield, R.J.: Does the medium change the message? The impact of a web-based genetics course on university students’ perspectives on learning and teaching. Computers and Education 38, 267–285 (2002) 18. Zhang, H.Y.: Computer-assisted elementary Chinese learning for American students. US-China Education Review 4(5) (Serial No. 30), 55–60 (2007)
The Moderating Role of Elaboration Likelihood on Information System Continuance Huan-Ming Chuang, Chien-Ku Lin, and Chyuan-Yuh Lin
Abstract. With the rapid development of the digital age, the government has been promoting the digitization and networking of administrative activities. Against this background, with its great potential to improve learning effectiveness, uSchoolnet is one of the important information systems being emphasized. Nevertheless, no matter how good a system is, and how actively it is promoted, if it cannot be accepted and used, the system cannot succeed at all. Consequently, the essential factors affecting uSchoolnet's acceptance and continuance are important research issues. Based on the case of uSchoolnet used in elementary schools of Yunlin County, Taiwan, this study adopts theories from the elaboration likelihood model (ELM) and the IS success model to build its research framework. Specifically, system quality and information quality from the IS success model represent the argument quality of the ELM, while service quality represents its source credibility. Besides verifying the causal effects of IS-success-related constructs, the dynamic moderating effects on the above relationships of task relevance and personal innovativeness (representing motivation) and user expertise (representing ability) are also investigated. A questionnaire survey was conducted to collect relevant data for analysis, with teachers of Yunlin County elementary schools sampled as research subjects. The major research results can offer insightful, valuable, and practical guidance for the promotion of uSchoolnet. Keywords: Elaboration Likelihood Model, IS Success Model, Information System Continuance.
1 Introduction
With the rapid development of the digital age, the government has been promoting the digitization and networking of administrative activities. Under this background,
Huan-Ming Chuang Associate Professor, Department of Information Management, National Yunlin University of Science and Technology
1 Introduction Under the rapid development of digital age, government has been promoting the digitization and networking of administration activities. Under this background, Huan-Ming Chuang Associate Professor, Department of Information Management, National Yunlin University of Science and Technology *
Chien-Ku Lin · Chyuan-Yuh Lin Graduate Student, Department of Information Management, National Yunlin University of Science and Technology J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 605–615. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
606
H.-M. Chuang, C.-K. Lin, and C.-Y. Lin
with its great potential to improve learning effectiveness, the Taiwan government has been promoting information-technology-integrated instruction aggressively. uSchoolnet is one of the important information systems emphasized in elementary schools. Nevertheless, no matter how good a system is, and how actively it is promoted, if it cannot be accepted and used, the system cannot succeed at all. Consequently, the essential factors affecting uSchoolnet's acceptance and continuance are important research issues.
2 Background and Literature Review
2.1 DeLone and McLean's IS Success Model
Since IS success is a multi-dimensional concept that can be assessed from different perspectives according to different strategic needs, measuring IS success is not an easy or objective job. Nevertheless, DeLone and McLean made a major breakthrough in 1992. After conducting a comprehensive review of the related literature, they proposed an IS success model, shown in Fig. 1. This model identified six important dimensions of IS success and suggested the temporal and causal interdependencies between them.
Fig. 1 DeLone and McLean's IS success model [1]
After the original model went through many validations, DeLone and McLean proposed an updated model in 2003, shown in Fig. 2.
The Moderating Role of Elaboration Likelihood
607
Fig. 2 DeLone and McLean’s updated IS success model [2]
The primary differences between the original and updated models can be listed as follows: (1) the addition of service quality to reflect the importance of service and support in successful e-commerce systems, (2) the addition of intention to use to measure user attitude, and (3) the combining of individual impact and organizational impact into a net benefits construct.
2.2 Elaboration Likelihood Model
In the social psychology literature, the role of influence processes in shaping human perceptions and behavior has been examined by dual-process theories. These theories suggest that there exist two alternative processes of attitude formation or change, namely more versus less effortful processing of information [3]. One representative dual-process theory of interest to this study is the elaboration likelihood model (ELM). ELM posits that, in relation to the above-mentioned dual processes, there are two alternative "routes" of influence, the central route and the peripheral route, which differ in the amount of thoughtful information processing or "elaboration" demanded of individual subjects [3][4]. ELM supposes that elaboration likelihood is composed of two major component dimensions, motivation and ability to elaborate, both of which are required for extensive elaboration to occur [4]. ELM researchers have typically adopted recipients' personal relevance of the presented information as their motivation, and prior expertise or experience with the attitude target as their ability. If information recipients view a given message as being important and relevant to the target behavior, they are more likely to spend time and effort scrutinizing its information content. In contrast, those who view the same message as having little personal relevance may not be willing to spend the same effort, but instead rely on peripheral cues, such as recommendations from trusted people, to build their perceptions. Consequently, personal relevance and prior expertise are presumed to exert moderating effects on the relationship between argument quality or peripheral cues and attitude changes, as shown in Fig. 3.
Fig. 3 Elaboration Likelihood Model [3]
3 Research Model
3.1 The Application of the IS Success Model
3.1.1 The Roles of System Use and Net Benefits
There has been intense debate over the role system use plays in measuring IS success. On the con side, some authors have suggested that system use is not an appropriate IS success variable [5]. But DeLone and McLean argued otherwise. They posited that the source of the problem was a too-simplistic definition of system use, and that researchers have to take its extent, nature, quality, and appropriateness into consideration. For the purposes of this research, since all sampled subjects had experience with the target IS, we adopted continuance intention as a surrogate for system use. Although it may be desirable to measure system benefits from an objective and quantitative perspective, such measures are often hard to conduct due to intangible system impacts and intervening environmental variables [6]. Consequently, system benefits are usually measured by the perceptions of system users. We also used perceived system benefits to represent IS success.
3.1.2 The Relationships among System Use, Net Benefits, and User Satisfaction
In terms of the relationship between system use and net benefits, Seddon (1997) contended that system use is a behavior that reflects an expectation of potential system benefits; though system use must precede benefits, it does not cause them. Therefore, system benefits are a precedent variable of system use, not vice versa. User satisfaction results from the feelings and attitudes formed by aggregating all the benefits that a person hopes to receive from interaction with the IS [7]. Therefore, user satisfaction is caused by perceived system benefits. In addition, some researchers [8] have proposed that user satisfaction drives system use rather than vice versa.
The Moderating Role of Elaboration Likelihood
609
3.2 The Application of ELM

Since social psychology research views attitude as a broad construct consisting of three related components, cognition, affect, and conation [9], we expanded the dependent variable of ELM (i.e., attitude change) into three constructs. First, as the cognition dimension is related to beliefs salient to the target behavior, we used perceived system benefits to represent it. Next, the affect dimension is represented by user satisfaction. Last, for the conation dimension, we adopted continuance intention as a proxy. ELM identifies two alternative influence processes, the central route and the peripheral route, toward information recipients' attitude change, and it recognizes the important moderating effect of personal elaboration likelihood on these relationships, where elaboration likelihood refers to users' motivation and ability to elaborate on informational messages [4]. Drawing on prior ELM research, Bhattacherjee and Sanford [3] operationalized the motivation dimension of elaboration as job relevance, defined as the information recipient's perceived relevance of an IT system to their job, and the ability dimension as user expertise, defined as the information recipient's IT literacy in general. We followed their approach but also added personal innovativeness as a motivation construct, since in a rapidly developing IT environment this factor can be expected to influence users' involvement with an IT system.

3.3 The Integration of the IS Success Model and ELM

3.3.1 The Identification of the Central Route

In ELM, the central route is represented by argument quality and the peripheral route by peripheral cues. Argument quality refers to the persuasive strength of arguments embedded in an informational message, while peripheral cues relate to meta-information about the message (e.g., the message source) [3].
Sussman and Siegal [10] developed an argument quality scale that examined completeness, consistency, and accuracy as three major dimensions. These dimensions map quite well onto the information quality, as well as the broader system quality, of the IS success model. As a result, we used IS users' assessments of system quality and information quality as our model's argument quality.

3.3.2 The Identification of the Peripheral Route

Many peripheral cues have been suggested in the ELM literature, including the number of messages, the number of message sources, source likeability, and source credibility. Of these, source credibility seems to be one of the most frequently referenced cues [3]. Source credibility is defined as the extent to which an information source is perceived by information recipients to be believable, competent, and trustworthy [4] [10]. This definition of source credibility corresponds quite well with the service quality of the IS success model, which emphasizes the competency and credibility of IS champions.
In sum, our research model can be shown as Fig. 4.
Fig. 4 Research model
3.4 Research Hypotheses

3.4.1 Hypotheses Related to the IS Success Model

Based on the above literature review, the hypotheses regarding the IS success model are as follows.

H1. The extent of system quality is positively associated with perceived net benefits.
H2. The extent of information quality is positively associated with perceived net benefits.
H3. The extent of service quality is positively associated with perceived net benefits.
H4. The extent of system quality is positively associated with user satisfaction.
H5. The extent of information quality is positively associated with user satisfaction.
H6. The extent of service quality is positively associated with user satisfaction.
H7. The extent of perceived net benefits is positively associated with user satisfaction.
H8. The extent of perceived net benefits is positively associated with user continuance intention.
H9. The extent of user satisfaction is positively associated with user continuance intention.

3.4.2 Hypotheses Related to ELM

Based on the above literature review, the hypotheses regarding ELM are as follows.
H10. User elaboration likelihood has a moderating effect on the relationships between the IS success dimensions and perceived net benefits.
H10a. User elaboration likelihood has a positive moderating effect on the relationship between system quality and perceived net benefits.
H10b. User elaboration likelihood has a positive moderating effect on the relationship between information quality and perceived net benefits.
H10c. User elaboration likelihood has a negative moderating effect on the relationship between service quality and perceived net benefits.
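The paper tests H10 with a PLS moderating-effects analysis (Fig. 6). As a generic illustration of what a positive moderating effect such as H10a means, the following sketch fits a moderated regression with an interaction term on synthetic data; the data-generating process and all coefficients are invented for illustration and are not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
sq = rng.normal(size=n)   # system quality (standardized), synthetic
el = rng.normal(size=n)   # elaboration likelihood (standardized), synthetic
# Assumed "true" model: EL strengthens the SQ -> PNB path (positive moderation)
pnb = 0.4 * sq + 0.2 * el + 0.3 * sq * el + rng.normal(scale=0.5, size=n)

# Moderated regression: PNB = b0 + b1*SQ + b2*EL + b3*SQ*EL
X = np.column_stack([np.ones(n), sq, el, sq * el])
beta, *_ = np.linalg.lstsq(X, pnb, rcond=None)
print(f"interaction coefficient b3 = {beta[3]:.2f}")  # positive -> positive moderation
```

A significantly positive b3 corresponds to the kind of effect hypothesized in H10a/H10b; a significantly negative b3 corresponds to H10c.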
4 Research Method

4.1 Study Setting

uSchoolnet is a leading web-based communication network for K-12 schools, sponsored by Prolitech Corp. The goal of the system is to bridge the gap between teachers, students, parents, and administrators by offering products and services that promote and encourage interaction and collaboration. By allowing everyone involved to participate in the creation, maintenance, and growth of their own class website, the class website becomes an extension of their daily lives: it is dynamic and has a life of its own. Furthermore, by focusing on user experience and friendliness, teachers can finally focus on teaching, students on learning, parents on guiding, and administrators on managing [13]. uSchoolnet is a comprehensive class web suite that offers the following major features: (1) photo albums, (2) class schedule, (3) class calendar, (4) seating chart, (5) homework assignments, (6) student recognition/awards, (7) message board, (8) content management system, and (9) polls and surveys. Owing to its practicality, it is quite popular in Taiwan and was chosen as the target IS of this research. The adoption of uSchoolnet is volitional in nature: instructors are encouraged, but not forced, to use it.
4.2 Operationalization of Constructs

All constructs and measures were based on items in existing instruments, related literature, and input from domain experts. Items in the questionnaire were measured using a seven-point Likert scale ranging from (1) strongly disagree to (7) strongly agree.
4.3 Data Collection

Data for this study were collected using a questionnaire survey administered in Yunlin County, Taiwan. The respondents were convenience-sampled from elementary school instructors who had experience with uSchoolnet. We sent out 200 questionnaires and received 185 usable responses.
5 Data Analysis and Results

5.1 Scale Validation

We used the SmartPLS 2.0 software to conduct confirmatory factor analysis (CFA) to assess measurement scale validity. The variance-based PLS approach was preferred over covariance-based structural equation modeling approaches such as LISREL because PLS does not impose sample size restrictions and is distribution-free [11]. 100 records of raw data were used as input to the PLS program, and path significances were estimated using the bootstrapping resampling technique with 200 subsamples. The steps of scale validation are summarized in Table 1.

Table 1 Scale validation

Convergent validity: measures of constructs that theoretically should be related to each other are, in fact, observed to be related to each other [12].
- All item factor loadings should be significant and exceed 0.70.
- Composite reliability (CR) for each construct should exceed 0.80.
- Average variance extracted (AVE) for each construct should exceed 0.50, or the square root of AVE should exceed 0.71.

Discriminant validity: measures of constructs that theoretically should not be related to each other are, in fact, observed not to be related to each other [12].
- The square root of AVE for each construct should exceed the correlations between that construct and all other constructs.
As seen in Table 2, the standardized CFA loadings for all scale items were significant at p < 0.01, and most of them met the minimum loading criterion of 0.70. Table 2 also shows that the CRs of all factors exceeded the required 0.80. Further, the principal diagonal elements in Table 3 show that the square roots of AVE were all greater than the desired minimum of 0.71. Hence, all three conditions for convergent validity were at least marginally met. Table 3 also shows that the square root of AVE for each construct exceeded the correlations between that construct and all other constructs. Therefore, the discriminant validity criterion was also met for our data sample.

Table 2 CR, AVE and factor loadings of constructs

Construct | CR | AVE | Factor loadings
System quality (SQ) | 0.97 | 0.85 | 0.92/0.95/0.93/0.91/0.89/0.92
Information quality (IQ) | 0.98 | 0.87 | 0.93/0.94/0.93/0.91/0.94/0.94
Service quality (SEQ) | 0.93 | 0.93 | 0.94/0.97/0.98/0.96
Perceived net benefits (PNB) | 0.97 | 0.88 | 0.91/0.95/0.96/0.94
Satisfaction (SAT) | 0.97 | 0.89 | 0.94/0.94/0.96/0.95
Continuance intention (CI) | 0.95 | 0.87 | 0.93/0.95/0.92
Elaboration likelihood (EL) | 0.93 | 0.59 | 0.73/0.80/0.63/0.56/0.59/0.89/0.89/0.85/0.89
Table 3 Inter-Construct Correlations

      SQ    IQ    SEQ   PNB   SAT   CI    EL
SQ    0.98
IQ    0.86  0.99
SEQ   0.68  0.59  0.96
PNB   0.66  0.67  0.46  0.98
SAT   0.59  0.56  0.37  0.66  0.98
CI    0.57  0.60  0.36  0.72  0.74  0.97
EL    0.73  0.67  0.52  0.69  0.82  0.69  0.96
Note: Diagonal elements (in italics) represent square root of AVE for the construct.
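The discriminant validity check against Table 3 (each diagonal square root of AVE must exceed every correlation in its row and column, the Fornell-Larcker criterion) can be automated. A short sketch using the tabled values:

```python
import numpy as np

names = ["SQ", "IQ", "SEQ", "PNB", "SAT", "CI", "EL"]
# Lower triangle of Table 3; the diagonal holds the square root of AVE
lower = [
    [0.98],
    [0.86, 0.99],
    [0.68, 0.59, 0.96],
    [0.66, 0.67, 0.46, 0.98],
    [0.59, 0.56, 0.37, 0.66, 0.98],
    [0.57, 0.60, 0.36, 0.72, 0.74, 0.97],
    [0.73, 0.67, 0.52, 0.69, 0.82, 0.69, 0.96],
]
n = len(names)
M = np.zeros((n, n))
for i, row in enumerate(lower):
    for j, v in enumerate(row):
        M[i, j] = M[j, i] = v

# Fornell-Larcker: diagonal element > every off-diagonal correlation in its row
ok = all(M[i, i] > max(M[i, j] for j in range(n) if j != i) for i in range(n))
print("Fornell-Larcker criterion met:", ok)
```

Running this on the Table 3 values confirms the discriminant validity claim made in the text.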
5.2 Hypotheses Testing

The results of hypothesis testing are shown in Fig. 5 and Fig. 6 below.
Note: Path significance * p<0.05, **p<0.01, ***p<0.001; Parentheses indicate R2 values. Fig. 5 PLS analysis of main effects
Note: Path significance * p<0.05, **p<0.01, ***p<0.001; Parentheses indicate R2 values. Fig. 6 PLS analysis of moderating effects.
6 Discussion and Conclusions

6.1 Discussion of Key Findings

From the results of the PLS analysis of main effects, we find that among the three quality constructs, system quality plays the most important role, since it exerted significant effects on both perceived net benefits and satisfaction. Next came information quality, which showed significant effects on perceived net benefits but not on satisfaction. Service quality had no significant influence on either perceived net benefits or satisfaction. For users' continuance intention, both perceived net benefits and satisfaction were critical causal factors, and perceived net benefits can also work through satisfaction to further enhance users' continuance intention. Regarding the dual-process theory of ELM, the results of the PLS analysis of moderating effects followed expectations: elaboration likelihood had a significant positive moderating effect on the relationship between the central route (represented by system quality in this study) and perceived net benefits, and a significant negative moderating effect on the relationship between the peripheral route (represented by service quality in this study) and perceived net benefits. Elaboration likelihood did not show a significant moderating effect on the relationship between information quality and perceived net benefits; this may be because uSchoolnet was built mostly for class administration, so information quality was not the main concern of users.
6.2 Implications for Practice

To improve the acceptance and continuance intention of uSchoolnet, the following guidelines can be suggested: (1) focus on improving the objective side of uSchoolnet, namely system quality and information quality; (2) enhance IS users' information literacy, since it can tighten the links between system quality or information quality and perceived net benefits, and ultimately continuance intention.
References

[1] DeLone, W.H., McLean, E.R.: Information Systems Success: The Quest for the Dependent Variable. Information Systems Research 3(1), 60–95 (1992)
[2] DeLone, W.H., McLean, E.R.: The DeLone and McLean Model of Information Systems Success: A Ten-Year Update. Journal of Management Information Systems 19(4), 9–30 (2003)
[3] Bhattacherjee, A., Sanford, C.: Influence Processes for Information Technology Acceptance: An Elaboration Likelihood Model. MIS Quarterly 30(4), 805–825 (2006)
[4] Petty, R.E., Cacioppo, J.T.: Attitudes and Persuasion: Classic and Contemporary Approaches. William C. Brown, Dubuque (1981)
[5] Seddon, P.B.: A Respecification and Extension of the DeLone and McLean Model of IS Success. Information Systems Research, 240–253 (1997)
[6] McGill, T., Hobbs, V., Klobas, J.: User-Developed Applications and Information Systems Success: A Test of DeLone and McLean's Model. Information Resources Management Journal 16(1), 24–45 (2003)
[7] Ives, B., Olson, M.H., Baroudi, J.J.: The Measurement of User Information Satisfaction. Communications of the ACM 26(10), 758–793 (1983)
[8] Baroudi, J.J., Olson, M.H., Ives, B.: An Empirical Study of the Impact of User Involvement on System Usage and Information Satisfaction. Communications of the ACM 29(3), 232–238 (1986)
[9] Breckler, S.J.: Empirical Validation of Affect, Behavior, and Cognition as Distinct Components of Attitude. Journal of Personality and Social Psychology 47(6), 1191–1205 (1984)
[10] Sussman, S.W., Siegal, W.S.: Informational Influence in Organizations: An Integrated Approach to Knowledge Adoption. Information Systems Research 14(1), 47–65 (2003)
[11] Chin, W.W., Marcolin, B.L., Newsted, P.R.: A Partial Least Squares Latent Variable Modeling Approach for Measuring Interaction Effects: Results from a Monte Carlo Simulation Study and an Electronic-Mail Emotion/Adoption Study. Information Systems Research 14(2), 189–217 (2003)
[12] Fornell, C., Larcker, D.F.: Evaluating Structural Equation Models with Unobservable Variables and Measurement Error. Journal of Marketing Research 18(1), 39–50 (1981)
[13] http://www.uschoolnet.com/portal/?m=our&oid=about (retrieved February 25, 2010)
The Type of Preferences in Ranking Lists

Piech Henryk and Grzegorz Gawinowski
Abstract. We consider the situation in which we have several lists of ordered objects (stages, tasks, etc.) and use them to create a final list of assigned or classified objects. Moreover, we permit a continuous inflow of new objects and an outflow of objects chosen for realization. We note that the initial lists can be generated by simple approximation algorithms, as well as by experts, agents, etc. The introduction presents the conception of using theses and antitheses about object locations on the particular lists to create the final list of ordered objects. The final list is created on the basis of the initial lists, and it is used to build the sets of theses and antitheses. The weighing of location theses and antitheses is the object of our investigation; it permits us to create the final list of ordered objects. We consider the possibility of changing the ranges of the thesis and antithesis zones, as well as aggregating indirect conclusions and joining supporting heuristics, in order to infer the choice of objects.

Keywords: preference, domination, ranking lists, consistency estimation.
1 Introduction

In many sources [3, 4, 5, 25], supporting and anti-supporting measures relating to sets of object conditions are used to analyze data on ordered objects, for classification, and for creating relative opinions. The choice of suitable measures, as well as the description of the mutual relations (dependencies) among them, is the subject of the studies presented in [10]. Many practical recommendations and heuristics (or adaptive forms) have been worked out for real conditions

Piech Henryk · Grzegorz Gawinowski
Czestochowa University of Technology, Dabrowskiego 73
e-mail: [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 617–627. c Springer-Verlag Berlin Heidelberg 2011 springerlink.com
[7]. The proposed measures are used to create new Bayes rules for supporting theses of the form "if ..., then ..." [4]. In our work we focus on utilizing measures to form an opinion about the scale of domination and about preferential arrangements supporting object locations (theses and antitheses relating to them). The zones and structure of theses and antitheses generally strengthen the inference procedures and finally confirm the positions of the ordered objects. The final arrangement of measures in monotonic structures [12] provides the basis for inferring the structure of the solution. The results of approximation are obtained on the basis of Lorenz domination, which "contains in itself" Pareto domination [2, 6]. The fuzzy structure of inference results is not analyzed in our description, because we focus on introducing measures supporting inference, not on the results and their profiles (lower and upper approximations). The studied questions are typical multi-criteria problems [9, 24], and we classify their solutions as decision-making questions. Different kinds of modifications of the conception connected with dividing into sets of theses and antitheses are introduced in our work. In one of the versions, the internal and external dominations with reference to the zones of theses and antitheses are investigated, and their measures are defined. In other words, scale estimations of the mutual penetration of thesis and antithesis structures (ranks of locations) are proposed. The introduced concepts can be aggregated, creating a complex of decision-supporting heuristics.
2 Adaptation to the Ranking Problem: Preference Relation and Profile

We do not want to recapitulate preference theory [6, 20], but rather to adapt the preference relation to the situation in which we have m ranking lists (m decision makers, criteria, or algorithms). Every decision maker takes n decisions, one at every stage of his list, forming ∀ i ∈ {1, ..., n} ∃ j ∈ {1, ..., m}: Di,j, where j is the code of the decision maker (list), i the number of the decision (stage), and Di,j the decision about an object's location, i.e., the choice of an object ϕk, k = 1, 2, ..., n, for place i of ranking list j. A decision Di,j = loc(ϕk) → i assigns object ϕk to place i. An object cannot be repeated by a single decision maker (one object in one place per list): if loc(ϕk) → i, then not loc(ϕk) → ¬i, where ¬i is a stage (placement on the list) different from i. Obviously, several objects cannot pretend to the same place on one list: if loc(ϕk) → i, then not loc(ϕp) → i for k ≠ p. These two rules can be described shortly as Di,j ≠ Dp,j for i ≠ p. Different decision makers can, of course, take the same decision at a specific stage: Di,j = Di,r, i.e., the same object can have the same location on different lists. An object's attribute is its chosen location on a ranking list, loc(ϕk); the supporting attribute is defined as the number of lists with the object at a given position, sup(ϕk → j). The set of preference relations on all objects Φ = (ϕ1, ϕ2, ..., ϕn) is denoted by R; R ∈ R if and only if (1) xRx, (2) (xRy and yRz) => xRz, and (3) (xRy or yRx). We have a strict preference relation if and only if (xRy and not yRx), as well as indifference if
and only if (xRy and yRx). At every stage we can obtain an n-tuple of objects ordered by the preference relation. The set of these relations is called the preference profile (R(1), R(2), ..., R(m)). When decision maker i takes a decision about choosing object ϕk at decision stage j, then ϕk R(i) ϕp ≡ ϕk P(i) ϕp ≡ ϕk ≻ ϕp, if ϕp was not chosen by any decision maker. Decisions at stage j can be joined with decisions from previous stages. We assume that indifference relations between objects are not possible within a ranking list, so the set R has n! possible relations (permutations) [21, 24]:

R1 = (obj1, obj2, ..., objn)
R2 = (obj2, obj1, ..., objn)
...
Rn! = (objn, obj(n−1), ..., obj1)

For R1 we have obj1 P1 obj2; obj2 P1 obj3; ...; obj(n−1) P1 objn, or obj1 ≻ obj2; obj2 ≻ obj3; ...; obj(n−1) ≻ objn. The set of profiles R^m has (n!)^m elements. Basing the equated relation structure on the group of ranking lists, we can create the final ranking list.
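The counts above (n! possible linear orders per list, (n!)^m possible profiles for m lists) can be verified directly; a small sketch:

```python
from itertools import permutations, product
from math import factorial

objects = ["obj1", "obj2", "obj3"]
n, m = len(objects), 2          # n objects, m decision makers (ranking lists)

orders = list(permutations(objects))        # R1, R2, ..., Rn!: all linear orders
profiles = list(product(orders, repeat=m))  # every m-tuple of orders

print(len(orders), len(profiles))  # 6 36
```

For n = 3 and m = 2 this yields 3! = 6 orders and (3!)^2 = 36 profiles, matching the formulas in the text.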
3 Description of Ranking Lists and the Character of Preferences between Objects

We have several (m) ranking lists of length n, on which n objects are placed (coded in fields) (Fig. 1). We will investigate preferences between objects on the basis of:
• Pareto domination with respect to object location,
• Lorenz domination with respect to object location,
• domination with respect to object appearance frequency,
• domination of object location concentration,
Fig. 1 Objects located on ranking lists (rows) on sequenced placements (columns)
620
P. Henryk and G. Gawinowski
• domination of object compositions,
• domination with respect to mutual object relations.

We can propose the following classification of domination (Fig. 2). Domination can have a strong or a weak character. Domination is connected with classification, categorization, comparison, ordering (scheduling), and other selection procedures. When it is not possible to say definitely that one object is better than another, various theories use, in different situations (objects, problems), the terminology: indistinguishable, inextricable, incomparable, indifferent [7, 10]. Intuitively and generally, the strength of domination (or preference) can be presented as in Fig. 3. Weak preferences are often interpreted as "at least as good as" or "at most as good as" [8]. This is exploited in descriptions of classification problems, decision rules, pairwise comparisons, etc. [22]. The location attributes we meet in a ranking list are connected with neighborhoods and their compositions. Let us define an object's neighborhood by using
Fig. 2 Scheme of ranking objects classification
Fig. 3 The visual presentation of domination strength
the notation exploited in the description of confirmation measures [2]: sup(ϕ → ψ), the number of objects having properties ϕ and ψ. In our case ϕ denotes the code of an object and ψ its location (position number) on a list. The sequence {sup(ϕi → ψj) > 0}, {sup(ϕi → ψj+1) > 0}, ..., {sup(ϕi → ψj+l−1) > 0}, where ψj, ψj+1, ... are succeeding positions on the list (ψj the first position, ψj+l−1 the last position of the neighborhood) and l is the neighborhood width, is called the neighborhood of object ϕi (Fig. 4). We can introduce preferences of opinions in matrix form (Fig. 5). The notation "list i - list j" marks that on list i object A is in a position which requires approving. The number of rows in Fig. 5 containing an opinion of preference is (m−1), where m is the number of algorithms used (number of lists). The notation "?" denotes incomparability of attributes [22], i.e., we cannot qualify the degree of support for a decision about an object's location. For the introduced assignments such a situation does not appear, because the analyzed attributes are always comparable. In Fig. 6 the number of defined relations with a positive opinion value in each column is denoted by num{es(i, j)}. Similarly, we can create the table of relative preferences with reference to their validity scale (hierarchy) (Fig. 7). In this case, sum{es(i, j)} denotes the sum of positively estimated preference relations in the separate columns of the table in Fig. 5. The differences between the effects of inference based on the analysis of single placements and of placements concentrated in object neighborhood centers are shown in the tables in Figs. 8 and 9. The scale of preference is presented in the table in Fig. 9 (the notation "˜" means that the given location is not connected with object i). After this stage of analysis, we sum all preference values x for a particular object and hypothesis. Finally, we build tables for all objects and use them to create the final list locations (Fig. 10).
A detailed description and presentation of this approach will be given in the next sections.
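The supporting attribute sup(ϕk → j) introduced above, the number of lists on which object ϕk occupies position j, can be computed directly. A small sketch with an invented example (three ranking lists, 0-based positions):

```python
def sup(ranking_lists, obj, pos):
    """Number of ranking lists on which `obj` occupies position `pos`."""
    return sum(1 for lst in ranking_lists if lst[pos] == obj)

# three ranking lists (rows); columns are placements
ranking_lists = [
    ["a", "b", "c"],
    ["a", "c", "b"],
    ["b", "a", "c"],
]
print(sup(ranking_lists, "a", 0))  # 2: object "a" is first on two lists
```

Values of this measure over consecutive positions delimit an object's neighborhood as defined in the text (positions where sup > 0).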
Fig. 4 Examples of object ϕ9 neighborhoods (3)
Fig. 5 Matrix model of preference relation
Fig. 6 Table of relative preferences with reference to number of cases
Fig. 7 Table of relation of preferences in reference to hierarchy
Fig. 8 The estimation of object i's location on the lists according to single placements (x: space for estimator values)
Fig. 9 Scale of preference type (A(i) ≻ A(j)) for hypothesis k
Fig. 10 Tables for all objects for creating final list location (h k - hypothesis connected with choosing object location in position k)
4 Short Presentation of Utility Domination and Preference Dependences in the Scheduling Process

According to Pareto domination (stronger than the P-Lorenz criterion), we must obtain comparable differences between the placements of the chosen objects (tasks) for all algorithms (rows in the table in Fig. 11). According to P-Lorenz, object 2 gains superiority over object 1 by occupying placement 1. Using the Pareto domination criterion Dp, we can obtain a precise clue for determining the placements of objects. The method corresponding with the P-Lorenz domination criterion summarizes the differences between the placements of objects 1 and 2. For the four cases we obtain:

1. (4 − 1) + (1 − 8) + (7 − 1) + (1 − 8) = −5, so object2 D̂p object1
2. (4 − 1) + (7 − 8) + (6 − 3) + (1 − 7) = −1, so object2 D̂p object1
3. (7 − 1) + (1 − 8) + (7 − 1) + (1 − 8) = −2, so object2 D̂p object1
4. (4 − 2) + (2 − 8) + (8 − 1) + (1 − 8) = −4, so object2 D̂p object1

Referring to cones of domination [8], we can prepare and qualify the outranking relation between objects (represented by their codes). We recall the definition of this relation: (code a) is at least as good as (code b) (for locating the given object on the final list): (code a) S (code b). It must be considered for every criterion (algorithm/row). For every pair of objects we can create sets which fulfill the outranking relation: B. In a similar way we can create a relation
Fig. 11 Examples - two objects pretending to specific placements on final list
of the type "(code a) is not at least as good as (code b)", denoted by (code a) S^c (code b). For a given (code a, code b) ∈ B we define the dominating set: Dp+(code a, code b) = {(ca, cb) ∈ B : (ca, cb) D (code a, code b)}, and the dominated set: Dp−(code a, code b) = {(ca, cb) ∈ B : (code a, code b) D (ca, cb)}. The P-dominating and P-dominated sets can be used to express the P-lower and P-upper approximations of the preference relations S and S^c, respectively:

P(S) = {(code a, code b) ∈ B : Dp+(code a, code b) ⊆ S}
P̄(S) = {(code a, code b) ∈ B : Dp−(code a, code b) ∩ S ≠ ∅}
P(S^c) = {(code a, code b) ∈ B : Dp−(code a, code b) ⊆ S^c}
P̄(S^c) = {(code a, code b) ∈ B : Dp+(code a, code b) ∩ S^c ≠ ∅}

Similarly to [5, 7] we can confirm: P(S) ⊆ S ⊆ P̄(S) and P(S^c) ⊆ S^c ⊆ P̄(S^c).
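The summed-difference (Lorenz-style) comparison used earlier in this section can be sketched as follows; the assignment of the placements in case 1 to the two objects is our assumption for illustration, since the text lists only the differences.

```python
def placement_diff(loc_x, loc_y):
    """Sum of placement differences between objects x and y across all lists."""
    return sum(a - b for a, b in zip(loc_x, loc_y))

# Case 1 from the text: (4-1) + (1-8) + (7-1) + (1-8) = -5
loc_obj1 = [4, 1, 7, 1]   # assumed placements of object 1 on four lists
loc_obj2 = [1, 8, 1, 8]   # assumed placements of object 2 on four lists
print(placement_diff(loc_obj1, loc_obj2))  # -5
```

The sign of the sum indicates which object occupies the better (lower-numbered) placements in total, which is the basis of the Lorenz-style comparison.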
5 Compromise Solution Searching

Having several solutions (final object location lists), we can search for a compromise based on minimal cost. The cost is connected with changing object locations on the particular ranking lists (Fig. 12). Describing this algorithm, we should note that it is exact (giving the optimal solution), although inefficient (with NP-type complexity). The assumption min = maxflow guarantees the reduction of minimal costs in successive corrections. The nested loops run over all objects and can generally be carried out in any order. If, for example, i(3) = 8, it means that for the third object we accept the hypothesis locating it at position 8 on the final list. The limits nhi bound the size of the location sets that are taken into account for individual objects as parameters ph(i, j). The additional restriction i(s) ≠ nz excludes occupying positions previously reserved for other objects (in the external cycles, up to s − 1 inclusive). Summing the differences maxap(k, ∗) − ap(k, i(k)), we accumulate the divergences between the best potential location for the k-th object, maxap(k, ∗), and the currently analyzed position ap(k, i(k)). The best configuration of object locations is stored in the elements of the vector {t(1), ..., t(n)}, with its corresponding minimum cost in the variable min. Algorithms based on a hierarchy of assigned object locations, ordered according to initially performed criteria and multi-criteria analysis, are definitely less complex, but simultaneously give results farther from optimal. Well-ordered sequences of objects, for example according to different criteria, make up an interesting database for procedures of an objective character; these can consist of operations on sets such as summing, multiplying, etc. With the help of the introduced algorithm, we can establish an upper bound on the cost, and also a well-ordered (obviously in
Fig. 12 Algorihtm for finding minimal cost of assignment objects organization
relation to costs) list of solutions relating to possible realization sequences of objects. The relation of domination in the Pareto and Lorenz sense [6] practically permits scheduling the objects. Although Lorenz domination is based on the sum of location positions, it delivers exact and certain interpretations of a given situation. It is common for this compound measure to be equal for two or more objects; in this case we apply investigation methods based on a different preference type.
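An exact (factorial-time, hence the NP-type inefficiency noted above) version of the minimal-cost assignment search sketched in Fig. 12 can be written as a brute-force loop over all permutations; the table `ap` of hypothesis strengths below is invented for illustration.

```python
from itertools import permutations

def min_cost_assignment(ap):
    """Exact search for the assignment of objects to final-list positions that
    minimizes the summed divergence max ap(k,*) - ap(k, pos(k))."""
    n = len(ap)
    best_cost, best = float("inf"), None
    for perm in permutations(range(n)):       # perm[k] = position of object k
        cost = sum(max(ap[k]) - ap[k][perm[k]] for k in range(n))
        if cost < best_cost:
            best_cost, best = cost, perm
    return best_cost, best

# hypothetical strengths ap[k][position] for three objects and three positions
ap = [
    [5, 2, 1],
    [4, 5, 2],
    [1, 3, 4],
]
cost, assignment = min_cost_assignment(ap)
print(cost, assignment)  # 0 (0, 1, 2): each object gets its best position
```

Because each object's best position is distinct in this example, the optimal cost is zero; with conflicting preferences, the loop finds the compromise with minimal cumulative divergence, at the price of n! iterations.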
6 Conclusions

1. The method of building the zones of theses and antitheses permits creating heuristics for sequential object selection.
2. The disadvantage of using a neutral zone is the impossibility of taking into account the concentration of thesis or antithesis locations within this zone, and hence of quickly approving such an object location on the final scheduling list.
3. This is considered indirectly by choosing the more and less dominating locations of other objects.
References

1. Blazewicz, J., Lenstra, J.K., Rinnooy Kan, A.H.G.: Scheduling subject to resource constraints: Classification and complexity. Discrete Appl. Math. 5, 11–24 (1983)
2. Brzezińska, I., Greco, S., Slowinski, R.: Mining Pareto-optimal rules with respect to support and anti-support. Engineering Applications of Artificial Intelligence 20(5), 587–600 (2007)
3. Conway, R.W., Maxwell, W.L., Miller, L.W.: Theory of Scheduling. Addison-Wesley, Reading (1954)
4. Crupi, V., Tentori, K., Gonzalez, M.: On Bayesian confirmation measures of evidential support: Theoretical and empirical issues. Philosophy of Science
5. Finch, H.A.: Confirming Power of Observations Metricized for Decisions among Hypotheses. Philosophy of Science 27, 391–404 (1999)
6. Greco, S., Matarazzo, B., Slowinski, R., Stefanowski, J.: An algorithm for induction of decision rules with the dominance principle. In: Rough Sets and Current Trends in Computing. LNCS (LNAI), pp. 304–313. Springer, Berlin (2005)
7. Greco, S., Matarazzo, B., Slowinski, R.: Axiomatic characterization of a general utility function and its particular cases in terms of conjoint measurement and rough sets decision rules. European J. of Operational Research (2003)
8. Greco, S., Matarazzo, B., Slowinski, R.: Extension of rough set approach to multicriteria decision support. INFOR 38, 161–196 (2000)
9. Greco, S., Matarazzo, B., Slowinski, R.: Rough sets theory for multicriteria decision analysis. European J. of Operational Research 129, 1–47 (2001)
10. Greco, S., Pawlak, Z., Slowinski, R.: Can Bayesian confirmation measures be useful for rough set decision rules? Engineering Applications of Artificial Intelligence 17, 345–361 (2004)
11. Greco, S., Slowinski, R., Szczech, I.: Assessing the Quality of Rules with a New Monotonic Interestingness Measure. In: Artificial Intelligence and Soft Computing ICAISC, pp. 556–565. Springer, Heidelberg (2008)
12. Hilderman, R., Hamilton, H.: Knowledge Discovery and Measures of Interest. Kluwer Academic Publishers, Dordrecht (2001)
13. Jaro, J.: Systemic Prolegomena to Theoretical Cybernetics. Scient. Papers of Inst. of Techn. Cybernetics, Wrocław 25 (1975)
14. Kent, R.E.: Rough concept analysis: A synthesis of rough sets and formal concept analysis. Fundamenta Informaticae 27, 169–181 (1996)
15. Kleinberg, J.: Navigation in a small world. Nature 406, 845 (2000)
16. Kohler, W.H.: A preliminary evaluation of the critical path method for scheduling tasks on multiprocessor systems. IEEE Trans. Comput. 24, 1235–1238 (1975)
17. Nikodem, J.: Autonomy and Cooperation as Factors of Dependability in Wireless Sensor Networks, vol. P3179, pp. 406–413. IEEE Computer Society, Los Alamitos (2008)
18. Pawlak, Z., Sugeno, M.: Decision Rules, Bayes' Rule and Rough Sets. In: New Directions in Rough Sets. Springer, Berlin (1999)
19. Pawlak, Z.: Rough Sets: Present State and the Future. Foundations, vol. 18, pp. 3–4 (1993)
20. Piech, H.: Analysis of possibilities and effectiveness of combining rough set theory and neighbourhood theories for solving the dynamic scheduling problem, vol. P3674, pp. 296–302. IEEE Computer Society, Washington (2009)
The Type of Preferences in Ranking Lists
627
21. Skowron, A.: Extracting lows from decision tables. Computational Intelligence 11(2), 371–388 (1995) 22. Slowinski, R., Brzeziska, I., Greco, S.: Application of Bayesian Confirmation Measures for Mining Rules from Suport Confidence Pareto Optimal Set. In: ICAISC. LNSC, vol. 4029, pp. 1018–1026. Springer, Heidelberg (2006) 23. Szwarc, W.: Permutation flow-shop theory revised, Math Oper. Math. Oper. Res. 25, 557–570 (1978) 24. Syslo, M.M., Deo, N., Kowalik, J.S.: Algorytmy optymalizacji dyskretnej. PWN, Warszawa (1995) 25. Talbi, E.D., Geneste, L., Grabot, B., Previtali, R., Hostachy, P.: Application of optimization techniques to parameter set-up in scheduling. Computers in Industry 55(2), 105–124 (October 2004)
Transaction Management for Inter-organizational Business Process Joonsoo Bae, Nita Solehati, and Young Ki Kang
Abstract. Recently, inter-organizational standards for electronic commerce, such as eAI (e-business application integration) and BPM (business process management), have become main topics in research and industry. In particular, business processes between organizations are long-lived transactions, with the constraint that one organization cannot lock or roll back the computational resources of another organization. Current related research focuses on the design and execution of inter-organizational processes, on issues such as platform independence and interoperability, but there is not sufficient research on error recovery when abnormal results occur. Therefore, this paper proposes a method to maintain transaction integrity and perform error recovery.
1 Introduction
Recently, there has been active research on the control and management of business processes both between and within organizations. In order to survive the severe competition of the recent business environment, each enterprise must improve the efficiency and productivity not only of its own tasks, but also of tasks linked with external organizations such as business partners or suppliers. Current related research focuses on the design and execution of inter-organizational processes, on issues such as platform independence and interoperability, but there is not sufficient research on error recovery when abnormal results occur. In particular, business processes between organizations are long-lived transactions, with the constraint that one organization cannot lock or roll back the computational resources of another organization. Therefore, a method to maintain the integrity of long-lived transactions and perform error recovery is needed [2, 3]. This paper presents an error recovery method for inter-organizational business processes. If abnormal results occur during an on-line inter-organizational business process, that process should be recovered to a normal status
Joonsoo Bae · Nita Solehati · Young Ki Kang, Chonbuk National University, e-mail: {jsbae,nitasolehati,kykyou}@jbnu.ac.kr
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 629–635. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
from which execution can start again. In order to perform error recovery, the necessary dependencies between the tasks of a process are identified and error recovery algorithms are proposed. If the proposed error recovery algorithms are integrated with existing research on the design and execution of inter-organizational business processes, a more reliable process control and management mechanism can be implemented for the e-commerce environment. This will contribute to on-line collaboration between enterprises.
2 Background
The term inter-organizational business process denotes the task-processing procedure of e-commerce between independent enterprises, which has different characteristics from an intra-organizational business process. In contrast to an internal business process, which can be controlled and managed by a single enterprise, inter-organizational business requires agreement on transaction capability and resource-allocation issues. Furthermore, agreement on financial/legal responsibility and compensation must be made for exceptional situations. Also, stricter security is required because most companies use the Internet. Since two independent enterprises are linked, there is no direct control between their business process engines, only an indirect link through messages and documents [4, 5, 6]. From the viewpoint of operation and management, the long-lived transaction is the most prominent characteristic. Since an inter- or intra-organizational business process is one unit that performs some work, it can form a single transaction, but it usually takes a long time to complete because many entities are involved in e-commerce. And since many computational resources are involved, maintaining transactional integrity, such as the ACID properties, is a principal requirement. Other characteristics of inter-organizational business processes are as follows. First, it is necessary to hide detailed process information: every participating enterprise in e-commerce has public processes and private processes depending on them. Second, one enterprise can utilize another company's services in most collaborative and e-commerce tasks, which is called 'process outsourcing', so that each can concentrate on its core competitive advantage. This is a typical application of information technology for enhancing competitive power, and this kind of process linkage between enterprises is called a 'nested process', commonly taking the form of node synchronization.
Due to the nesting of processes, the overall architecture of an inter-organizational business process becomes a hierarchical structure in general.
3 Error Recovery Condition for Transaction Integrity
The basic control mechanism is based on the state transition diagram of a component node. Fig. 1 shows the state transition diagrams of a task node and a block node, both of which are components of a process [1].
The possible states of a task node are 'Not-Ready', 'Ready', 'Executing', 'Aborting', 'Aborted', 'Committed', and 'Compensated'. The initial state of every node is 'Not-Ready', and once all preceding tasks are completed, it becomes 'Ready'. Each node can have pre-conditions, which must be satisfied prior to execution. If all pre-conditions are satisfied, the task begins to execute and the state becomes 'Executing'; a node with no pre-conditions begins to execute immediately. If an error occurs during execution, all executing tasks are suspended and the state becomes 'Aborting'; then appropriate error handling is performed according to the situation. An error can arise only at a task node, where tasks actually execute. External error messages to an 'Executing' block node are delivered to its 'Executing' sub-tasks and handled by the tasks' error handling. After all exceptions are handled by the task node, the state becomes 'Aborted' and the execution is cancelled. If a task completes without error, the state becomes 'Committed'. And if a node finishes normally and an error occurs afterwards, the predefined compensating task is executed to undo the state to the original state before execution; the state then becomes 'Compensated'.
Fig. 1 State transition diagram of task or block node
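The transitions described above can be sketched as a simple state machine. The class and constant names below are illustrative, not part of the paper's formalism:

```python
# Illustrative sketch (not the paper's implementation) of the task-node
# state transitions described above.
ALLOWED = {
    "Not-Ready":   {"Ready"},                  # all preceding tasks completed
    "Ready":       {"Executing"},              # pre-conditions satisfied
    "Executing":   {"Aborting", "Committed"},  # error vs. normal completion
    "Aborting":    {"Aborted"},                # after error handling
    "Aborted":     set(),
    "Committed":   {"Compensated"},            # undo after a later error
    "Compensated": set(),
}

class TaskNode:
    def __init__(self, name):
        self.name = name
        self.state = "Not-Ready"

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.name}: {self.state} -> {new_state} not allowed")
        self.state = new_state
        return self.state

# A task that completes normally and is later undone by its compensating task:
t = TaskNode("ship_order")
t.transition("Ready"); t.transition("Executing"); t.transition("Committed")
t.transition("Compensated")
```

Encoding the diagram as a transition table makes illegal moves (e.g. committing an 'Aborted' task) fail loudly instead of silently corrupting the process state.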
Inter-organizational error recovery is performed mainly through message delivery, including error information. The error recovery execution conditions differ according to the type of inter-organizational business process.
3.1 Connect Discrete or Chained Process
The chained process type can be transformed into a Serial block that has its sub-nodes in sequence; that is, only after the preceding participant process has completed can the next participant process be executed, see Fig. 2 [1, 7].
Fig. 2 Error Recovery execution condition of chained process
But since error propagation between process A and process B occurs through messages, and each participant process belongs to an independent enterprise, process B cannot execute immediately after process A completes without further conditions. Likewise, even after the compensating transaction of process B completes, the compensating transaction of process A cannot begin immediately without a condition. This relationship is expressed by a weak begin-on-committed dependency (WBCD). Also, when the whole process is cancelled during the execution of process B, the precedence between the compensating transactions of processes A and B is not important; therefore, each compensating transaction of a completed task has a begin-on-aborting dependency (BaD) individually. And since the Serial block has a weak aborted-on-committed dependency (WACD) on the individual compensating transactions of the participant processes, the whole process can be cancelled only after all compensating transactions of the individual processes have completed.
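The three dependencies of the chained process can be sketched as predicates; the function names are hypothetical and serve only to make the conditions concrete:

```python
# Illustrative sketch of the chained-process dependencies described above:
# WBCD for forward execution, BaD for starting compensations independently,
# WACD for the final cancellation of the whole process.
def may_begin_b(a_committed_msg_received):
    # WBCD: process B may begin only once the "A committed" message arrived.
    return a_committed_msg_received

def compensations_to_start(completed, aborting):
    # BaD: once the block is aborting, every completed participant's
    # compensating transaction may start, in any order.
    return set(completed) if aborting else set()

def whole_process_aborted(completed, compensated):
    # WACD: the chained process is cancelled only after all completed
    # participants have been compensated.
    return set(completed) <= set(compensated)
```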
3.2 Hierarchical or Nested Process
Fig. 3 shows a nesting node of super-process A that has the uppermost Serial block of sub-process B as its nested process. Since the execution or cancellation of the sub-process depends on the nesting node of the super-process, all commands for execution and compensation pass through message exchange.
Fig. 3 Error Recovery execution condition of nested process
The conditions of error recovery execution for the nested process are shown in Fig. 3. The nested sub-process has a weak begin dependency (WBD) on the nesting task 'T' of the super-process. Therefore, when super node 'T' is executed, the corresponding event is delivered to process B through a message, and the business process management system of process B can decide whether process B will be executed based on the delivered message. Also, since the nested sub-process has an aborting dependency (aD) on super node 'T', if process B receives an 'Aborting' message from super node 'T' during its execution because of an error, process B has to raise an internal error immediately and cancel its execution, undoing completed work by calling compensation tasks. But the reverse is not the same: even when a message that the nested sub-process is 'Aborting', has been cancelled by compensation, or has completed normally is delivered to node 'T', the super-process does not act immediately; it checks the status based on the message and then takes appropriate action. This relationship from node 'T' to the uppermost Serial block of the sub-process can be defined as a weak aborting dependency (WaD), weak aborted dependency (WAD), and weak committed dependency (WCD).
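The asymmetry between the strong (aD) and weak (WaD/WAD/WCD) directions can be sketched as two message handlers; the function names and message strings are hypothetical:

```python
# Illustrative sketch of the nested-process dependencies above: the
# sub-process reacts immediately to an 'Aborting' message from the super
# node (aD), while the super node only records sub-process status and
# decides later (weak dependencies WaD/WAD/WCD).
def sub_process_on_message(msg, state):
    if msg == "Execute":          # WBD: the sub-process may begin
        return "Executing" if state == "Not-Ready" else state
    if msg == "Aborting":         # aD: must abort immediately
        return "Aborting" if state == "Executing" else state
    return state

def super_node_on_message(msg, pending):
    # Weak dependencies: just record the status; a separate check
    # decides what action, if any, the super-process takes.
    pending.append(msg)
    return pending
```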
4 Error Recovery Algorithm
In Section 3, error recovery conditions were presented for each process type in order to maintain transaction integrity in the inter-organizational environment. Now the error recovery algorithms that keep these execution conditions will be proposed. We have already defined that all errors arise at task nodes, not block nodes, and the algorithm differs depending on the block type; in this paper, only the AND-parallel block type is dealt with because of the page limit.
Fig. 4 Error Recovery of AND-parallel block
4.1 Error Recovery Algorithm of AND-Parallel Block
The whole error recovery algorithm for the AND-parallel block is shown in Fig. 4. If one component task raises an error during execution, the task's state becomes 'Aborting', and it puts its block and its executing sibling nodes into the 'Aborting' state. This follows from the AND-parallel characteristic that if even one task is cancelled, the whole block must be cancelled. The next step is to propagate this error to the external synchronization node or the sub-process of the other enterprise. After the error recovery at the current task node has finished, all sibling nodes should be 'Aborted' or 'Compensated'. If not, error recovery of the sibling nodes is needed: exception handling when the state is 'Aborting', and a compensating task when the state is 'Committed'. Once all sibling nodes finish error recovery, the last step is to check whether this block is the recovery destination. If so, the error recovery algorithm terminates at this block; otherwise, the error state is propagated to the super-block and the same actions are applied recursively.
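The steps above can be sketched as a single recovery routine; the dictionary representation and function names are illustrative, not the paper's implementation:

```python
# Illustrative sketch of the AND-parallel error-recovery steps above.
def recover_and_parallel(block, failed, is_destination, notify_external):
    block["state"] = "Aborting"                    # whole block must abort
    for t in block["tasks"]:
        if t is not failed and t["state"] == "Executing":
            t["state"] = "Aborting"                # suspend executing siblings
    notify_external(block)                         # propagate to partner process
    failed["state"] = "Aborted"                    # failed task's handling done
    for t in block["tasks"]:                       # recover remaining siblings
        if t["state"] == "Aborting":
            t["state"] = "Aborted"                 # exception handling
        elif t["state"] == "Committed":
            t["state"] = "Compensated"             # run compensating task
    if is_destination:
        return "recovered"                         # recovery stops at this block
    return "propagate-to-super-block"              # recurse upward

block = {"state": "Executing",
         "tasks": [{"state": "Executing"}, {"state": "Committed"},
                   {"state": "Executing"}]}
failed = block["tasks"][0]
result = recover_and_parallel(block, failed, True, lambda b: None)
```

After the call, the failed and executing siblings end in 'Aborted' and the already-committed sibling is 'Compensated', matching the target states described above.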
5 Conclusion and Future Research
The main contributions of this paper are as follows. First, the execution conditions of error recovery that compensate for errors while keeping transaction integrity are identified. An extended ACTA formalism is proposed and used to represent dependencies between tasks and blocks, and execution conditions for error recovery that maintain transaction integrity are extracted for each inter-organizational process type. Second, error recovery execution algorithms that observe these execution conditions are proposed. By presenting a different error recovery execution algorithm for each block type, automatic error recovery can be executed in the inter-organizational business process. The developed algorithms can be integrated into the design and execution of inter-organizational business processes, so that collaborative e-commerce can be developed using a reliable business process control and management mechanism. This can encourage online collaboration and information utilization between enterprises. Future research can proceed in two directions. First, the proposed algorithms and execution conditions should be validated, for efficiency and for guaranteed transaction integrity, by using a mathematical model, simulation, or a programming language. Second, since the real enterprise environment is much more complex, more than two errors can happen simultaneously, so extended algorithms are needed to handle multiple errors at the same time.
Acknowledgement This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2010-0025650).
References
1. Bae, J., Bae, H., Kang, S.-H., Kim, Y.: Automatic control of workflow processes using ECA rules. IEEE Transactions on Knowledge and Data Engineering (2003)
2. Dayal, U., Hsu, M., Ladin, R.: A transaction model for long-running activities. In: The Sixteenth International Conference on Very Large Databases, pp. 113–122 (August 1991)
3. Elnozahy, E.N., Alvisi, L., Wang, Y.-M., Johnson, D.B.: A Survey of Rollback-Recovery Protocols in Message-Passing Systems. ACM Computing Surveys 34(3), 375–408 (2002)
4. Luo, Z., Sheth, A., Kochut, K., Arpinar, B.: Exception Handling for Conflict Resolution in Cross-Organizational Workflows. Distributed and Parallel Databases 13, 271–306 (2003)
5. Chiu, D.K.W., Karlapalem, K., Li, Q., Kafeza, E.: Workflow view based e-contracts in a cross-organizational e-services environment. Distributed and Parallel Databases 12, 193–216 (2002)
6. Casati, F., Discenza, A.: Modeling and managing interactions among business processes. Journal of Systems Integration 10, 145–168 (2001)
7. Chrysanthis, P.K., Ramamritham, K.: ACTA: The SAGA continues. In: Database Transaction Models for Advanced Applications, pp. 354–397. Morgan Kaufmann Publishers, San Mateo (1995)
Trend-Extraction of Stock Prices in the American Market by Means of RMT-PCA Mieko Tanaka-Yamawaki, Takemasa Kido, and Ryota Itoi
Abstract. We apply the RMT-PCA, a recently developed PCA method for grasping temporal trends in a stock market, to the daily-close stock prices of American stocks on the NYSE for 16 years, from 1994 to 2009, and show the effectiveness and consistency of this method by analyzing the whole data set at once, as well as the data cut into various partitions: two files of 8-year length, four files of 4-year length, and eight files of 2-year length. The results agree well with the actual historical trends of the markets. We also discuss the internal consistency among the results for different time intervals. Keywords: RMT-PCA, Correlation, Eigenvalues, Principal Component, Stock Market, Trend.
1 Introduction
The random matrix theory (RMT) is now a big issue in many fields of science [1-10]. In particular, the use of the asymptotic formula for the eigenvalue spectrum of the cross-correlation matrix between independent random time series [11, 12], as a reference for the corresponding spectrum derived from a set of different stock price time series, in order to extract principal components effectively and simply [13-16], has attracted much attention in the econophysics community [17, 18]. The main advantage of this method as a principal component analysis is its simplicity. While the standard PCA finds the largest PC, subtracts this component from the entire data, and applies the same procedure recursively to the remaining data one component at a time, the RMT-based PCA can present all the "non-random" components at once by comparing the eigenvalue spectrum of the cross-correlation matrix with the RMT formula. Plerou et al. [14] made one of the first attempts to apply this technique to stock price time series: using the daily close stock prices of NYSE/S&P500, they successfully extracted eminent stocks out of massive price time series data. However, this method suffers from two difficulties. One is the restriction on the dimensionality, N, and the length of the data, T, such that N < T. Moreover, the entire
Mieko Tanaka-Yamawaki · Takemasa Kido · Ryota Itoi, Department of Information and Knowledge Engineering, Graduate School of Engineering, Tottori University, Tottori 680-8552, Japan, e-mail: [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 637–646. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
set of N × T data is needed for the analysis, since the basic quantity of the analysis is the cross-correlation matrix, whose elements are the equal-time inner products between pairs of stocks. Another difficulty is the restriction on the parameter sizes. Since the RMT formula is derived in the limit of N and T going to infinity, special care is needed to keep the parameters in the range in which the RMT formula is valid. By using machine-generated random numbers, such as rand(), etc., we have tested the validity of the RMT formula over various ranges of N and T, and have clarified that N = 300 or larger is the safe range provided T is not too close to N; the validity decreases for smaller N, and the borderline is around 50
2 Eigenvalue Problem of Correlation Matrix for Stock Prices
We shall briefly review the outline of the methodology used in RMT-PCA. The first step is to prepare the price time series as an N × (T+1) matrix named S, whose i-th row contains a price time series of length T+1. This matrix S is converted into a matrix of log-returns as follows:

r(t) = log(S(t + Δt)) − log(S(t))   (1)

We normalize each time series to have zero average and unit variance, as follows:

x_i(t) = (r_i(t) − <r_i>) / σ_i   (i = 1,…,N)   (2)

The correlation C_{i,j} between two stocks, i and j, can be written as the inner product of the two log-return time series x_i(t) and x_j(t):

C_{i,j} = (1/T) Σ_{t=1}^{T} x_i(t) x_j(t)   (3)

Here the suffix i indicates the time series of the i-th member of the total N stocks.
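Eqs. (1)-(3) can be sketched in a few lines of NumPy. The random "prices" below stand in for real stock data, so the numbers are purely illustrative:

```python
# Sketch of Eqs. (1)-(3): log-returns, normalization, and the
# cross-correlation matrix, on synthetic price data.
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 200
# Toy N x (T+1) price matrix S (geometric random walks as placeholders).
S = np.exp(np.cumsum(rng.normal(0.0, 0.01, (N, T + 1)), axis=1))

r = np.log(S[:, 1:]) - np.log(S[:, :-1])                  # Eq. (1)
x = (r - r.mean(axis=1, keepdims=True)) / r.std(axis=1, keepdims=True)  # Eq. (2)
C = x @ x.T / T                                           # Eq. (3)
```

With the normalization of Eq. (2), the diagonal of C is exactly one and the matrix is symmetric, as stated in the text.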
The correlations defined in Eq. (3) form a symmetric (C_{i,j} = C_{j,i}) square matrix whose diagonal elements are all equal to one (C_{i,i} = 1) and whose off-diagonal elements are in general smaller than one in magnitude (|C_{i,j}| ≤ 1). As is well known, a real symmetric matrix C can be diagonalized by a similarity transformation V⁻¹CV with an orthogonal matrix V satisfying Vᵗ = V⁻¹, each column of which is an eigenvector of C, such that

C v_k = λ_k v_k   (k = 1,…,N)   (4)

where λ_k is the k-th eigenvalue and v_k the k-th eigenvector. A criterion proposed in Refs. [3-6], and recently examined on many real stock data sets, is to compare the result to the formula derived in the random matrix theory [1]. According to the random matrix theory (RMT, hereafter), the eigenvalue distribution spectrum of C made of random time series is given by the following formula [2], illustrated in Fig. 1 for the case Q = 3:

P_RMT(λ) = (Q / 2πλ) √((λ+ − λ)(λ − λ−))   (5)

in the limit N → ∞, T → ∞, Q = T/N = const., where T is the length of the time series and N is the total number of independent time series (i.e., the number of stocks considered). This means that the eigenvalues of the correlation matrix C between N normalized time series of length T are distributed in the range

λ− < λ < λ+   (6)

between the lower and upper bounds given by

λ± = (1 ± Q^(−1/2))²   (7)

The criterion proposed in our RMT-PCA is to use the components whose eigenvalues, i.e., the variances, are larger than the upper bound λ+ given by RMT:

λ+ < λ   (8)
Fig. 1 The RMT formula of eigen-value distribution in Eq.(5) for Q=3.
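The bounds of Eq. (7) and the selection rule of Eq. (8) can be checked numerically: eigenvalues of a correlation matrix built from pure noise should fall (almost entirely) inside [λ−, λ+]. A minimal sketch, with illustrative sizes N = 100, T = 300:

```python
# Sketch of the RMT bounds of Eq. (7) and the selection rule of Eq. (8).
import numpy as np

def rmt_bounds(Q):
    lam_minus = (1 - Q ** -0.5) ** 2
    lam_plus = (1 + Q ** -0.5) ** 2
    return lam_minus, lam_plus

rng = np.random.default_rng(1)
N, T = 100, 300                        # Q = T/N = 3
x = rng.standard_normal((N, T))
x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
C = x @ x.T / T                        # correlation matrix of pure noise
eigvals = np.linalg.eigvalsh(C)

lam_minus, lam_plus = rmt_bounds(T / N)
signal = eigvals[eigvals > lam_plus]   # Eq. (8): candidate principal components
```

For finite N and T the spectrum leaks slightly past λ+, which anticipates the effective bound λeff discussed at the end of Chapter 3.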
3 Application of RMT-PCA to the Stock Prices
We prepare N normalized stock returns of the same length T, which form a rectangular matrix S_{i,k}, where i = 1,…,N labels the stock symbol and k = 1,…,T the traded time. The i-th row of this price matrix corresponds to the price time series of the i-th stock symbol, and the k-th column corresponds to the prices of the N stocks at time k. We summarize the algorithm that we used for extracting significant principal components in Fig. 2, and show an example of the result in Fig. 3.
Algorithm of RMT-PCA:
(1) Select N stock symbols for which traded prices exist for all t = 1,…,T, corresponding to all the working days of the term.
(2) Compute the log-return r(t) for the selected N stocks. Normalize each time series to have mean = 0 and variance = 1, for each stock symbol i = 1,…,N.
(3) Compute the cross-correlation matrix C and obtain its eigenvalues and eigenvectors.
(4) Select the eigenvalues λ larger than λ+, the upper limit of the RMT spectrum P_RMT(λ) = (Q / 2πλ) √((λ+ − λ)(λ − λ−)), where λ± = (1 ± Q^(−1/2))², and identify these eigenstates as the principal components.
(5) Sort the eigenvector components corresponding to the eigenvalues identified in step (4) in descending order and identify the business sectors of the largest 20 components. If those 20 components belong to any particular sector, that is the leading sector of the term.
Fig. 2 The algorithm to extract the significant principal components in RMT-PCA
Fig. 3 A result of RMT-PCA applied to stock prices (solid line), compared to the corresponding formula derived from RMT, for the case Q = 3.5 (dashed line). The first and second eigenvalues are shown in the inner window.
However, a detailed analysis of the eigenvector components tells us that the random components do not necessarily stay below the upper limit λ+ of RMT, but percolate beyond it due to the extra randomness added in the process of computing the log-return in Eq. (1) [19, 21]. Based on extensive numerical analysis, this percolation always occurs, and the upper front of the continuum spectrum extends to about 20% beyond the upper limit λ+ of RMT. This fact suggests that the upper limit λ+ is not appropriate for separating the signal from the noise, because of the percolation of the random spectrum over λ+; instead, an effective upper bound λeff = 1.2λ+, about 20% larger than λ+, should be used. Then λ+ in step (4) of the RMT-PCA algorithm in Fig. 2 is to be replaced by λeff [22]. We shall discuss this point in more detail in our future work.
4 Trends Extracted as the Eminent Components of Eigenvectors
We applied the algorithm stated in Chapter 3 to the daily-close prices of American stocks listed in the S&P500, for 16 years from 1994 to 2009. At first, the entire data of this period are used for the analysis. Then the entire data set is split into 2 parts, 1994-2001 and 2002-2009. These are further split into 4 parts, 1994-1997, 1998-2001, 2002-2005, 2006-2009. Finally, they are split into 8 parts of 2-year data: 1994-1995, 1996-1997, 1998-1999, 2000-2001, 2002-2003, 2004-2005, 2006-2007, 2008-2009. The results are listed in Table 1.

Table 1 Results for the 16-, 8-, and 4-year data (eigenvalues larger than 2λ+ are highlighted in bold italic in the original)

          94-09   94-01   02-09   94-97   98-01   02-05   06-09
  N        373     373     464     373     419     464     468
  T       3961    2015    1946    1010    1002    1006     936
  Q       10.6    5.40    4.19    2.71    2.17    2.17    2
  λ+      1.7     2.1     2.2     2.6     2.8     2.8     2.9
  λ1      74      41      150     37.2    53      116     200
  λ2      11      13      15      8.7     19      14      18
  λ3      8.8     8.8     12      5.8     13      13      14
  λ4      7.7     6.9     11      4.6     9.2     9.1     8.9
  λ5      5.1     4.8     6.5     3.3     6.6     6.3     5.3
  λ6      4.3     4.2     5.1     3.2     5.8     5.3     5.0
  λ7      3.3     3.5     3.8     2.8     4.7     4.8     4.4
  λ8      2.9     3.1     3.4     2.6     4.2     4.6     3.5
  λ9      2.5     2.7     3.3     2.4     3.8     4.0     3.2
  λ10     2.4     2.2     2.8     2.4     3       3.3     2.7
  λ11     2.0     2.2     2.4     2.3     2.8     2.9     2.7
  λ12     1.9     2.1     2.3     2.3     2.7     2.9     2.5
According to step (4) of the RMT-PCA algorithm in Fig. 2 and Table 1, these 12 eigenstates can be identified as the principal components, based on the condition λ > λ+ = 2.1 for the 1994-2001 data, for example. However, we can effectively reduce the number of principal components by taking into account the information from the corresponding eigenvectors, after completing the algorithm up to the final step (5): namely, we identify the business sectors of the companies of the 20 largest components of the corresponding eigenvectors in step (5). If those components are concentrated in particular business sectors, we identify those sectors as the trend makers of that time period. It is well known that all components of the first eigenvector, corresponding to the largest eigenvalue, carry the same sign, and that the corresponding stocks are evenly distributed over many sectors, implying that the largest principal component corresponds to a representative index of the market, such as the S&P500. Unlike the first principal component, the other eigenvectors have components of both positive and negative signs. It is also known that the positive components and the negative components belong to two separate business sectors, when they are strongly concentrated in particular sectors. We focus in particular on the 2nd principal component, which most typically reflects the trend of the time period of the data when any sector concentration occurs. Following the GICS (Global Industry Classification Standard) coding system, we classify the business sectors of the stocks into the following 10 categories, each represented by a single capital letter A-J: A: Energy, B: Materials, C: Industrials, D: Service, E: Consumer Products, F: Health Care, G: Financials, H: Information Technology, I: Telecommunication, and J: Utility.
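Step (5) — counting the GICS sectors among the 20 largest eigenvector components, split by sign — can be sketched as follows. The toy eigenvector and sector labels are illustrative, not data from the paper:

```python
# Sketch of step (5): tally the sectors of the 20 largest eigenvector
# components, separately for positive and negative signs.
from collections import Counter
import numpy as np

def sector_concentration(v, sectors, top=20):
    idx = np.argsort(-np.abs(v))[:top]           # 20 largest |components|
    pos = Counter(sectors[i] for i in idx if v[i] > 0)
    neg = Counter(sectors[i] for i in idx if v[i] < 0)
    return pos, neg

rng = np.random.default_rng(2)
sectors = np.array(list("ABCDEFGHIJ") * 40)      # 400 toy stocks, 10 sectors
v2 = rng.normal(0.0, 0.01, 400)                  # a toy "2nd eigenvector"
v2[sectors == "H"] += 0.5                        # fake InfoTech concentration
v2[sectors == "J"] -= 0.5                        # fake Utility concentration
pos, neg = sector_concentration(v2, sectors)
```

With the planted concentration, the positive side of the tally is dominated by sector H and the negative side by sector J, mimicking the pattern reported for v2 in Fig. 4.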
If we take λeff instead of λ+, as explained in the last paragraph of Chapter 3, we have 10 eigenstates, corresponding to the eigenvalues λ1 = 74.3,…, λ10 = 2.41: fewer principal components than the 12 stated above. However, we notice that the concentration of business sectors in the eigenvector components occurs only for the 4-5 largest eigenvalues, and the concentration quickly blurs for the 6th and later eigenvalues. Based on this observation, we might increase λeff to the range λeff = 2λ+, 100% larger than the theoretical criterion. Whether we take λeff = 1.2λ+ or λeff = 2λ+ matters little, as we focus only on the 2nd principal component and a few followers. The 8 bars in Figs. 4-7 correspond to v2(+), v2(−), v3(+), v3(−), v4(+), v4(−), v5(+), v5(−), where vk(+) / vk(−) denotes the positive-sign / negative-sign part of the vector of the k-th principal component. Each bar is partitioned into the 10 sectors denoted by A-J, and the corresponding eigenvalue and the sign of the components are written below each bar. We observe from the graphs in Fig. 4 that the sector H (InfoTech) dominates the (+) components of v2 and the sector J (Utility) dominates the (−) components of v2. The results for the 8-year data, 1994-2001 and 2002-2009, are shown in Fig. 5; the left figure shows the dominance of J (Utility) and H (InfoTech) during the term 1994-2001, and the right figure shows the dominance of A (Energy) and
Fig. 4 Trends of 16 years from 1994 to 2009 are shown. The sectors H (Information Technology) and J (Utility) are the most eminent in this period.
Fig. 5 Trends of 8 years, 1994-2001(left) and 2002-2009(right). In 1994-2001, the sector J (Utility) and H (Information Technology) dominate, but in 2002-2009, A(Energy) and G(Financial) dominate the market.
Fig. 6 Trends of 4 years each are shown. Both in 1994-1997 and 1998-2001, J(Utility) and H(IT) dominate, while A(Energy) and H(IT) dominate in 2002-2005 and A (Energy) and G(Financial) dominate in 2006-2009.
G(Financials) during the term 2002-2009. This means the active sectors changed from J (Utility) and H (InfoTech) to A (Energy) and G (Financials) at the turn of the century. The results for the 4-year data, 1994-1997, 1998-2001, 2002-2005, and 2006-2009, are shown in Fig. 6, indicating the dominance of J (Utility) and H (InfoTech) in both 1994-1997 and 1998-2001, the dominance of A (Energy) and H (InfoTech) in 2002-2005, and the dominance of A (Energy) and G (Financials) in 2006-2009. The corresponding result for the 2-year data is shown in Fig. 7. No clear structure is seen after 2002, except a weak dominance of G (Financials) and A (Energy).
Fig. 7 Trends of 2 years each are shown. No clear sector structure appears after 2002, apart from a weak dominance of G (Financials) and A (Energy).
5 Conclusion and Discussion
Our results have shown that the trend of each time period can be successfully depicted by the concentrated business sectors in the positive/negative components of the eigenvector corresponding to the 2nd principal component. Although the condition λ > λ+ dramatically reduces the number of principal components compared to the conventional PCA, we have further reduced the number of principal components by using the effective condition λ > λeff. Moreover, our method is considerably simpler, with a much shorter process for extracting principal components, which is a great advantage when analyzing the stock market. The conventional PCA extracts the largest principal component, subtracts this element from the entire data, and applies the same procedure recursively to the remaining data one component at a time. This kind of method requires a lot of computational time and is not suitable for analyzing a system of large dimension, such as a set of stocks in the market. Another variant of PCA uses the eigenvalues of the correlation matrix of the time series, but picks the components whose eigenvalues are larger than one, or whose accumulated sum of eigenvalues exceeds 80 percent of the total sum, etc. Neither is suitable for analyzing the stocks in the market, since the number of principal components thus obtained usually exceeds 100 for N = 400-500, while the RMT-PCA yielded 5-13 principal components in our analysis in Chapter 4 of this paper. We illustrate this point in Fig. 8.
Fig. 8 The advantage of RMT-PCA (left), offering a smaller number of principal components than the method of 80-percent accumulated eigenvalues (right)
References 1. Mehta, M.L.: Random Matrices, 3rd edn. Academic Press, London (2004) 2. Edelman, A., Rao, N.R.: Acta Numerica, pp. 1–65. Cambridge University Press, Cambridge (2005) 3. Bai, Z., Silverstein, J.: Spectral Analysis of Large Dimensional Random Matrices. Springer, Heidelberg (2010) 4. Tao, T., Vu, V.: Random matrices: Universality of ESD and the Circular Law (with appendix by M. Krishnapur). Annals of Probability 38(5), 2023–2065 (2010) 5. Beenakker, C.W.J.: Random-matrix theory of quantum transport. Reviews of Modern Physics 69, 731–808 (1997)
646
M. Tanaka-Yamawaki, T. Kido, and R. Itoi
Using the Rough Set Theory to Investigate the Building Facilities for the Performing Arts from the Performer’s Perspectives Betty Chang, Hung-Mei Pei, and Jieh-Ren Chang
Abstract. In this study, the Rough Set Theory (RST) is used to investigate building facilities for the performing arts. The six condition attributes employed to investigate the building facilities are clear sound effect, stage light, air quality, stage space, backstage facilities, and performance equipment. A total of 140 decision rules were derived by this research. Based on the decision rules, most performers agree that good building facilities for the performing arts should fulfill the requirements of clear sound effect, appropriate stage lighting, good air quality, suitable stage space, and good backstage facilities and performance equipment. In general, based on the decision rules, music performers have lower demands on stage space, drama performers have lower demands on clear voice, while dance performers have lower demands on backstage facilities. These results imply that performers of different types have different demands on building facilities and can serve as a basis for designing future building facilities for the performing arts. Keywords: Rough Set Theory, Building Facilities, Performance Arts.
1 Introduction Well-designed culture-related exhibition and performing arts facilities (CREPAFs) improve the cultural literacy of the public as well as foster local characteristics. Betty Chang Graduate Institute of Architecture and Sustainable Planning, National Ilan University No. 1, Sec. 1, Shen-Lung Road, Yilan City, Taiwan 26047 e-mail: [email protected]
Hung-Mei Pei Graduate Institute of Architecture and Sustainable Planning, National Ilan University No. 1, Sec. 1, Shen-Lung Road, Yilan City, Taiwan 26047 e-mail: [email protected] Jieh-Ren Chang Department of Electronic Engineering, National Ilan University No. 1, Sec. 1, Shen-Lung Road, Yilan City, Taiwan 26047 J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 647–657. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
648
B. Chang, H.-M. Pei, and J.-R. Chang
The Sydney Opera House is a typical example. CREPAFs provide space for public participation in cultural and art activities, perform the functions of social education and leisure recreation (Hsieh and Lai 2005), and impact urban economic development as well. Various CREPAFs can fulfill the public's demands for art activities and expand art education (Hsieh et al. 2006). Building facilities for the performing arts determine the success or failure of a performance. A well-equipped building for the performing arts complements a performance. On the other hand, the best performers cannot bring their talent into full play in a building that is short of facilities. There are two types of cultural facilities according to the nature of activities, namely, space for exhibition and space for the performing arts (Hsieh et al. 2006). An indoor cultural facility must be non-profit and open to the public. Besides, it must host regular art-related activities such as performances or exhibitions (Kaple et al. 1998). Basically, cultural facilities include concert halls, opera houses, auditoriums, art spaces, theaters, music rooms, town halls, performance centers, and so on (Beranek 2007). In this study, buildings for the performing arts such as concert halls, opera houses, auditoriums, art spaces, theaters, music rooms, town halls, and performance centers were investigated. Performers are the most frequent users of buildings for the performing arts, some parts of which, such as backstage makeup rooms, rehearsal spaces, lifting equipment, and curtains, are accessible only to performers rather than to the audience. Therefore, an investigation of how building facilities for the performing arts can satisfy the requirements of performers is very important. Most studies have investigated such facilities from the audience's point of view (Sintas and Alvarez 2005; Hsieh and Lai 2005; Hsieh et al. 2006).
This research, however, approaches the issue from the performers' viewpoint by introducing the Rough Set Theory to derive rules for future building facility designs for the performing arts. The research results can serve as a basis for future building designs for the performing arts. Section 2 reviews the literature on building facilities for the performing arts. Section 3 interprets the research method and procedure. The analytic process and results are given in Section 4. Section 5 discusses the findings and summarizes the whole paper.
2 Literature Review In the United States, cities have used arts investments to stimulate city development (Penne and Shanahan 1987). Organizations in both the private and public sectors have focused on flexibility and cost reduction in recent years (Ancarani and Capaldo 2005). Consumers of the stage performance space can be characterized as four types: sporadic, popular, snob, and omnivorous (Sintas and Alvarez 2005). Customers' experiences of the performing arts include the physical surroundings of the building facilities and the service personnel (or actors) (Hume and Winzar 2004). Different service quality results in different service experiences for customers (Silvestro and Johnston 1990).
In an auditorium, all the audience should be able to hear and watch a performance. On the other hand, the performers should also be able to command the audience. A building for the performing arts must include stage light, stage audio, and broadcasting positions. In addition, the whole performance area should be monitorable from the control rooms (Appleton 2008). Although the subjective acoustic character of a hall is often considered as a whole, audiences in different seating areas have different perceptions of the sound effect (Barron 2006). A video projection system is necessary for an auditorium. In addition, a standard acoustic system nowadays must be connected to the Internet (Julie 2008). The Japanese Architecture Association suggested that the stage department of a professional building for the performing arts should include the performance space (main stage), the prepared performance space (e.g., side stage, rear stage), the support performance space (e.g., dressing rooms, rehearsal space, props preparation space), and the performance technology space (e.g., lighting and sound control rooms) (Tanabe 1981). When designing the platforms or stages in buildings for the performing arts, the major factors to consider include the performance area, the stage basement, the side and rear stages, the safety curtain, the performers' access to the stage, the scenery access to the stage, the flytower, and the suspension (Appleton 2008). In stage design, the "stage light, sound, and media technology" part includes the stage light, electrical power requirements, and audio systems, while in backstage planning, the "backstage rooms" are the most important part, including offices, the wardrobe, hair-and-makeup and props, dressing rooms, break rooms, rehearsal spaces, stage adjacencies, etc. (Hardy 2006).
Based on the literature review results, six attributes were summarized in Table 1, namely, clear voice (stage audio), stage light, air quality, stage space, backstage facilities, and performance equipment. These attributes were used in this study. All the attributes were included in the questionnaire designed to evaluate the performers' views. In addition, introductions to the attributes were also included in the questionnaire to serve as basic information for the respondents.

Table 1 Attributes for performance facilities — cross-tabulation of researchers (Kavgic, M. 2008; Appleton, I. 2008; Beranek, L. 2007; Hardy, H. 2006; Tanabe, T. 1981) against the six attributes (stage audio, stage light, air quality, stage space, backstage facilities, performance equipment), with ● marking the attributes each researcher addresses
3 The RST The RST is a mathematical approach to data analysis. RST-based analytic procedures provide mathematical approaches to the manipulation of vagueness
and uncertainty, as well as to the determination of the importance of each attribute (Pawlak 1982; Chang and Hung 2010; Shyng et al. 2007). The RST comprises the ideas of lower approximation and upper approximation and has the ability to classify data (Pawlak 1991). The concepts of the RST, such as the information system, the lower and upper approximations, the core and reduct of attributes, and the decision rules and decision table, are illustrated in the following sub-sections.
3.1 Information System An information system can be regarded as IS = (U, A), where U stands for a universe consisting of a finite set of objects and A is a finite set of attributes. Each attribute a ∈ A defines an information function f_a : U → V_a, where the set V_a, called the domain of attribute a, is composed of the values a may take. For example, for the object set U = {x_1, x_2, …, x_n} and attribute set A = {a_1, a_2, a_3}, the attribute domains might be V_a1 = {1, 2, 3}, V_a2 = {1, 2}, and V_a3 = {1, 2, 3, 4}. Construction of the basic sets is the first step in rough set classification. For a set of attributes B ⊆ A, the indiscernibility relation Ind(B) holds between two objects x_i and x_j if b(x_i) = b(x_j) for every attribute b ∈ B. For any x ∈ U, the equivalence class of x under Ind(B), denoted [x]_Ind(B), is a basic set of B (Walczak and Massart 1999).
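As a sketch of the indiscernibility relation above (the function and variable names are illustrative, not from the paper), the basic sets of Ind(B) can be computed by grouping objects on their B-attribute values:

```python
# Sketch: partition a universe into basic sets (equivalence classes of
# the indiscernibility relation Ind(B)) by grouping on attribute values.
from collections import defaultdict

def indiscernibility_classes(table, B):
    """table: object -> {attribute: value}; B: list of attributes."""
    groups = defaultdict(set)
    for obj, values in table.items():
        key = tuple(values[b] for b in B)   # objects with equal B-values
        groups[key].add(obj)                # fall into the same basic set
    return list(groups.values())

# Toy information system with two attributes
table = {
    "x1": {"a1": 1, "a2": 2},
    "x2": {"a1": 1, "a2": 2},
    "x3": {"a1": 3, "a2": 2},
}
print(indiscernibility_classes(table, ["a1", "a2"]))
# x1 and x2 are indiscernible; x3 forms its own basic set
```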
3.2 Lower and Upper Approximations Suppose S is a data table and X a non-empty subset of U. For a set of attributes P, the P-lower approximation and P-upper approximation of X in S are, respectively:

P̲(X) = {x ∈ U : [x]_Ind(P) ⊆ X}

P̄(X) = {x ∈ U : [x]_Ind(P) ∩ X ≠ ∅}

The elements of P̲(X) are those objects whose equivalence classes under the indiscernibility relation Ind(P) are included in X; the elements of P̄(X) are those objects whose equivalence classes under Ind(P) include one or more objects belonging to X. P̲(X) is the largest union of the P-elementary sets contained in X, while P̄(X) is the smallest union of the P-elementary sets containing X. The P-boundary of X in S is Bn_P(X) = P̄(X) − P̲(X) (Greco et al. 2001).
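A minimal sketch of the two approximations, assuming the equivalence classes of Ind(P) have already been computed (names are illustrative):

```python
# Sketch: P-lower and P-upper approximations of a set X, given the basic
# sets (equivalence classes) of the indiscernibility relation Ind(P).
def approximations(classes, X):
    lower, upper = set(), set()
    for c in classes:
        if c <= X:       # class entirely contained in X -> certainly in X
            lower |= c
        if c & X:        # class overlaps X -> possibly in X
            upper |= c
    return lower, upper

# Toy example: three basic sets and a target set X
classes = [{"x1", "x2"}, {"x3"}, {"x4", "x5"}]
X = {"x1", "x2", "x4"}
lower, upper = approximations(classes, X)
boundary = upper - lower
print(lower, upper, boundary)
# lower = {x1, x2}; upper = {x1, x2, x4, x5}; boundary = {x4, x5}
```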
3.3 Core and Reduct of Attributes The concept of core and reduct attribute sets is one of the basics of the Rough Set Theory. Suppose that C, D ⊆ A, where C is a set of condition attributes and D is a set of decision attributes. If C′ is a minimal subset of C such that C′ yields the same classification with respect to D as C does, we can
say that C′ ⊆ C is a D-reduct. A D-core is the intersection of all D-reducts. The core is included in every reduct since it is their intersection; in other words, each element of the core belongs to every reduct. In sum, the core is the most significant subset of attributes: if any element of this subset is removed, the classification ability of the attributes is affected (Pawlak and Skowron 2007).
3.4 Decision Rules If the set C contains the condition attributes and D the decision attributes, with C ∪ D = A and C ∩ D = ∅, the information table can be seen as a decision table S. The d-elementary sets in S are called decision classes. Supposing that D is a singleton, D = {d}, does not decrease the generality of the considerations (Dimitras et al. 1999). The major steps of constructing a decision table are as follows:
1. construct the elementary sets in D-space,
2. calculate the upper and lower approximations of the elementary sets in D,
3. find the D-core and D-reducts of the set A of attributes,
4. find the D-core and D-reducts of attribute values of the set A (Walczak and Massart 1999).
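The reduct and core concepts behind these steps can be sketched with a brute-force search (illustrative only; practical RST tools use more efficient discernibility-matrix methods, and this simplified version treats C′ as a reduct when it induces the same indiscernibility partition as the full attribute set C):

```python
# Sketch: brute-force search for reducts (minimal attribute subsets that
# induce the same indiscernibility partition as the full set) and the core.
from itertools import combinations

def partition(table, attrs):
    """Indiscernibility partition induced by a set of attributes."""
    groups = {}
    for obj, values in table.items():
        groups.setdefault(tuple(values[a] for a in attrs), set()).add(obj)
    return frozenset(frozenset(g) for g in groups.values())

def reducts(table, C):
    base = partition(table, C)
    found = []
    for r in range(1, len(C) + 1):
        for sub in combinations(C, r):
            # a candidate is a reduct if it preserves the partition and
            # no smaller reduct is contained in it (minimality)
            if partition(table, list(sub)) == base and \
               not any(set(f) <= set(sub) for f in found):
                found.append(sub)
    return found

# Toy table: a3 carries the same information as a1, so two reducts exist
table = {
    "x1": {"a1": 1, "a2": 1, "a3": 1},
    "x2": {"a1": 1, "a2": 2, "a3": 1},
    "x3": {"a1": 2, "a2": 1, "a3": 2},
}
rs = reducts(table, ["a1", "a2", "a3"])
core = set(rs[0]).intersection(*map(set, rs))
print(rs, core)   # reducts {a1, a2} and {a2, a3}; core = {a2}
```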
4 Research Design The major users of buildings for the performing arts are the audience and performers. The functionality of performance facilities has a direct influence on performers' feelings and an indirect one on the quality and effects of performances. This study aims at prioritizing facilities and investigating the potential needs for facilities and equipment from the performers' viewpoint. Questions regarding the attributes of facilities in buildings for the performing arts are included in the questionnaire. All questionnaire respondents have performing experience; that is, the respondents have experience using buildings for the performing arts, so the goal of the questionnaire can be attained.
4.1 The Empirical Study Process The questionnaire is divided into two sections: (1) the evaluation of attributes of performance facilities; (2) the basic information of the subjects. In the first section, seven attributes were investigated, six of which are condition attributes and one of which is the decision attribute. The six condition attributes are stage audio (attribute 0), stage light (attribute 1), air quality (attribute 2), stage space (attribute 3), backstage facilities (attribute 4), and performance equipment (attribute 5). Each condition attribute is evaluated on a three-point scale: 3 (satisfying), 2 (neutral), and 1 (unsatisfying), while the decision attribute is evaluated on a three-level scale: A (good), B (general), and C (poor). The aforementioned form the basis of the information system. In the second section, the
subjects fill out their basic information: gender, type of performance, age, seniority in performance, level of education, and career. This study aims at obtaining the cores and reducts of the condition attributes in this model, so as to establish effective decision rules for constructing buildings for the performing arts. A total of 256 questionnaires were received, among which 205 were valid (please refer to Table 2).

Table 2 Valid questionnaire data

Type of performance   Number of objects   Percentage
Music                 42                  20%
Drama                 63                  31%
Dance                 100                 49%
Total                 205                 100%
4.2 Analytic Results Hong and Chen (1999) used 90% of 150 data points for training and 10% for testing in a fuzzy system. In a study using the RST to analyze the insurance market, about 85% of the data was used for training and 15% for testing (Shyng et al. 2007). Chang and Hung (2010) employed 90% of the data for training and 10% for testing in the RST for supplier selection. In this study, out of the 205 sample data, 184 model-training samples and 21 model-testing samples were randomly selected to increase the accuracy of the category models. After the calculation of the core and reduct of attributes, important variables are determined to list the minimum attribute set and establish the decision rules. MATLAB R2009b is used as the platform for the RST programming. Using the RST algorithm, we developed input, output, and RST program modules to investigate the building facilities for the performing arts. The condition attributes of the attribute set in the information system must be determined to obtain the decision attribute values of the building facilities. The combinations of the condition attribute values have a great impact on the outcome. The information obtained from the respondents is presented in Table 3. The accuracy of approximation of the classes in the classification can then be determined. The classes, Class 1, Class 2, and Class 3, represent the three decision attribute values A (Good), B (General), and C (Poor), respectively. Accuracy is also shown in this table. After confirming the accuracy of approximation of the data, we can examine whether the set of attributes is independent or not.

Table 3 Classification accuracy of the decision attribute

Class number   Number of objects   Lower approx.   Upper approx.   Accuracy of approximation
Class 1        84                  82              85              0.964
Class 2        82                  79              85              0.924
Class 3        39                  38              41              0.926
Total          205                 N.A.            N.A.            0.940
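Assuming, as is standard in rough set analysis, that the accuracy of approximation is the ratio of the lower- to the upper-approximation cardinality, the figures in Table 3 can be approximately reproduced:

```python
# Accuracy of approximation = |lower approximation| / |upper approximation|
def accuracy(lower_size, upper_size):
    return lower_size / upper_size

# Class 1 from Table 3: lower = 82, upper = 85
print(round(accuracy(82, 85), 3))   # 0.965 (Table 3 reports 0.964, apparently truncated)
# Class 3: lower = 38, upper = 41
print(round(accuracy(38, 41), 3))   # 0.927 (Table 3 reports 0.926)
```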
In order to examine whether the set of attributes is independent or not, we test whether the removal of each attribute influences the number of basic sets in the information system. The total number of decision rules and the number of decision rules after removing each condition attribute are shown in Table 4. Compared with the original attribute set, variances of a0 (8%), a1 (6%), a2 (11%), a3 (10%), a4 (11%), and a5 (7%) can be observed. These are also the indispensable sets of attributes (core). As the results show, after eliminating any of the condition attributes, the number of rules no longer amounts to 140. Therefore, a0, a1, a2, a3, a4, and a5 are indispensable attributes, and they are also the core variables determining the quality of building facilities for the performing arts. After confirming the independence of the attributes and the core and reduct of attributes, the decision rules are generated.

Table 4 Summary of simplified decision rules (total number of decision rules: 140)

Removed attribute     a0 Clear voice   a1 Lighting   a2 Air quality   a3 Stage space   a4 Backstage facilities   a5 Performance equipment
Rules after removal   129              131           125              126              125                       130
Percentage gap        8%               6%            11%              10%              11%                       7%
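The percentage gaps in Table 4 are consistent with reading the gap as the share of the 140 rules lost when an attribute is removed, as the following check shows:

```python
# Percentage gap = (total rules - rules after removal) / total rules
total = 140
after_removal = {"a0": 129, "a1": 131, "a2": 125, "a3": 126, "a4": 125, "a5": 130}
gaps = {a: round(100 * (total - n) / total) for a, n in after_removal.items()}
print(gaps)   # {'a0': 8, 'a1': 6, 'a2': 11, 'a3': 10, 'a4': 11, 'a5': 7}
```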
To sum up, we can infer the results from Table 5. Decision rules: 1. Buildings for the performing arts should possess "satisfying" clear voice, lighting, air quality, stage space, backstage facilities, and performance equipment to win an A (Good) from performers. The attribute percentages of lighting and stage space are higher, at 90.48% and 88.10%, respectively. 2. With "neutral" clear voice, lighting, air quality, stage space, backstage facilities, and performance equipment, a building for the performing arts gets a B (General) from performers. Performance equipment and air quality have higher attribute percentages, at 57.32% and 53.66%, respectively. 3. According to Table 5, each of the following situations gets a C (Poor) from performers: (1) neutral or unsatisfying clear voice; (2) any of neutral lighting, air quality, or stage space; (3) unsatisfying backstage facilities or performance equipment. Among the circumstances in Class C, unsatisfying backstage facilities and performance equipment have higher attribute percentages, at 82.05% and 58.97%, respectively.
Table 5 Decision rules and strengths derived for each attribute

Condition attributes set              Strength of condition attributes   Attributes percentage
Rule 1.  (a0 = 3) => (class = A)      71                                 84.52%
Rule 2.  (a1 = 3) => (class = A)      76                                 90.48%
Rule 3.  (a2 = 3) => (class = A)      63                                 75.00%
Rule 4.  (a3 = 3) => (class = A)      74                                 88.10%
Rule 5.  (a4 = 3) => (class = A)      69                                 82.14%
Rule 6.  (a5 = 3) => (class = A)      65                                 77.38%
Rule 7.  (a0 = 2) => (class = B)      39                                 47.56%
Rule 8.  (a1 = 2) => (class = B)      40                                 48.78%
Rule 9.  (a2 = 2) => (class = B)      44                                 53.66%
Rule 10. (a3 = 2) => (class = B)      40                                 48.78%
Rule 11. (a4 = 2) => (class = B)      37                                 45.12%
Rule 12. (a5 = 2) => (class = B)      47                                 57.32%
Rule 13. (a0 = 1) => (class = C)      15                                 38.46%
Rule 14. (a0 = 2) => (class = C)      15                                 38.46%
Rule 15. (a1 = 2) => (class = C)      21                                 53.85%
Rule 16. (a2 = 2) => (class = C)      21                                 53.85%
Rule 17. (a3 = 2) => (class = C)      19                                 48.72%
Rule 18. (a4 = 1) => (class = C)      32                                 82.05%
Rule 19. (a5 = 1) => (class = C)      23                                 58.97%
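The attribute percentages in Table 5 are consistent with dividing each rule's strength by the size of its decision class (84, 82, and 39 objects for classes A, B, and C in Table 3). A small check (illustrative code, not the authors' MATLAB implementation):

```python
# Attribute percentage = rule strength / number of objects in the decision class
class_sizes = {"A": 84, "B": 82, "C": 39}

def attribute_percentage(strength, decision_class):
    return 100 * strength / class_sizes[decision_class]

# Rule 1: (a0 = 3) => class A, strength 71
print(round(attribute_percentage(71, "A"), 2))   # 84.52, matching Table 5
# Rule 18: (a4 = 1) => class C, strength 32
print(round(attribute_percentage(32, "C"), 2))   # 82.05, matching Table 5
```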
After analyzing the influence of each condition attribute on the decision attribute, we further analyze the decision rules of the attribute set for different types of performance. Table 6 indicates that, for most music performers, a good building for the performing arts should possess satisfying clear voice, lighting, stage space, backstage facilities, and performance equipment, and neutral air quality. For drama and dance performances, satisfying clear voice, lighting, air quality, stage space, backstage facilities, and performance equipment are the preconditions for a good building for the performing arts. For most music and drama performances, buildings possessing neutral clear voice, lighting, air quality, backstage facilities, and performance equipment are classified as general. For dance performances, buildings with neutral clear voice, air quality, stage space, and performance equipment and satisfying lighting are classified as general. Overall, performers think that good buildings for the performing arts should have satisfying clear voice, lighting, air quality, stage space, backstage facilities, and performance equipment. For general buildings, music performers have a lower demand on stage space, drama performers a lower demand on clear voice, and dance performers a lower demand on backstage facilities.
Table 6 Decision rules of the attribute set

Type of performance   Condition attribute set                         Decision attribute
Music                 (a0=3)&(a1=3)&(a2=2)&(a3=3)&(a4=3)&(a5=3) =>    (Class=A)
Music                 (a0=2)&(a1=2)&(a2=2)&(a4=2)&(a5=2) =>           (Class=B)
Drama                 (a0=3)&(a1=3)&(a2=3)&(a3=3)&(a4=3)&(a5=3) =>    (Class=A)
Drama                 (a1=2)&(a2=2)&(a3=2)&(a4=2)&(a5=2) =>           (Class=B)
Dance                 (a0=3)&(a1=3)&(a2=3)&(a3=3)&(a4=3)&(a5=3) =>    (Class=A)
Dance                 (a0=2)&(a1=3)&(a2=2)&(a3=3)&(a5=2) =>           (Class=B)
5 Discussion and Conclusions This study primarily focuses on processing the information system of the database and finding the relations among attributes using the Rough Set Theory. In this study, objects with the same attribute values are put into the same category, D-superfluous attributes are screened out, and the combinations of decision rules are then generated. The research finds that an increase in attribute values results in an increase in the number of decision rules. Should there be excessive decision rules, the accuracy of approximation of each category will decrease. When the knowledge in the database is limited, the accuracy of the lower and upper approximations will influence the distinctness of the decision rules. The responses to the questionnaire were obtained by the respondents' subjective evaluation. However, since evaluations differ among individuals, more attribute sets will be generated; the validity of the data will be lowered and will thus affect the determination of the rules. A total of 140 decision rules were derived in the research results. Based on the analytic results, the attributes can be prioritized as: stage light (90.48%), stage space (88.10%), stage audio (84.52%), backstage facilities (82.14%), performance equipment (77.38%), and air quality (75.00%). In the final attribute sets of decision rules, most performers agreed that good buildings for the performing arts should satisfy the requirements of stage audio, stage light, air quality, stage space, backstage facilities, and performance equipment. For general buildings, music performers have lower demands on stage space, drama performers on stage audio, and dance performers on backstage facilities.
According to the decision rules derived (Table 6), music and drama performers perceived buildings possessing "neutral" lighting, air quality, backstage facilities, and performance equipment as "general" buildings for the performing arts. On the other hand, dance performers perceived buildings with neutral clear voice, air quality, stage space, and performance equipment and satisfying lighting as "general". The results demonstrate that different types of performers demand different facilities. The results can serve as a basis for future designs and constructions of building facilities for the performing arts.
References Ancarani, A., Capaldo, G.: Supporting decision-making process in facilities management services procurement: A methodological approach. Journal of Purchasing & Supply Management 11, 232–241 (2005) Appleton, I.: Buildings for the Performing Arts. Elsevier Limited, Italy (2008) Barron, M.: Objective Assessment of Concert Hall Acoustics. In: Proceedings of the Institute of Acoustics, vol. 28, pp. 70–78 (2006) Beranek, L.: Concert Halls and Opera Houses. Technology Books, Taipei (2007) Chang, B., Hung, H.F.: A study of using RST to create the supplier selection model and decision-making rules. Expert Systems with Applications 37, 8284–8295 (2010) Dimitras, A.I., Slowinski, R., Susmaga, R., Zopounidis, C.: Business failure prediction using rough sets. European Journal of Operational Research 114, 263–280 (1999) Greco, S., Matarazzo, B., Slowinski, R.: Rough sets theory for multicriteria decision analysis. European Journal of Operational Research 129, 1–47 (2001) Hardy, H.: Building type basics for performing arts facilities. John Wiley & Sons, Canada (2006) Hong, T.P., Chen, J.B.: Finding relevant attributes and membership functions. Fuzzy Sets and Systems 103, 389–404 (1999) Hsieh, Y.L., Lai, R.P.: Cultural Indicators of Performance Facilities. In: 2005 Sustainability and International Conference on Cultural and Creative Design, Taipei, Taiwan (2005) Hsieh, H.R., Hsieh, Y.L., Lai, R.P.: An analysis of citizen attendance and required facilities of arts exhibitions: performances in Kaohsiung and Pingtung areas. Research in Arts Education Journal 11, 77–111 (2006) Hsieh, Y.L., Lai, R.P., Hsieh, Y.Y.: Performance indicators of cultural needs of the construction of facilities. Journal of Architecture 58, 113–119 (2006) Hume, M., Winzar, H.: Repurchase in a Performing Arts Context: The Perspective of Value. Dissertation, The Australian National University (2004) Julie, S.: Building the Best Auditorium.
American School Board Journal 44, 30–36 (2008) Kaple, D., Rivkin-Fish, Z., Louch, H., Morris, L., DiMaggio, P.: Comparing Sample Frames for Research on Arts Organizations: Results of a Study in Three Metropolitan Areas. Journal of Arts Management 28, 41–67 (1998) Kavgic, M., Mumovic, D., Stevanovic, Z., Young, A.: Analysis of thermal comfort and indoor air quality in a mechanically ventilated theatre. Energy and Buildings 40, 1334–1343 (2008) Pawlak, Z.: Rough Sets. International Journal of Computer and Information Sciences 11(5), 341–356 (1982) Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Netherlands (1991) Pawlak, Z., Skowron, A.: Rudiments of rough sets. Information Sciences 177, 3–27 (2007) Penne, L., Shanahan, J.L.: The Role of the Arts in State and Local Economic Development. In: National Conference of State Legislatures, Denver (1987) Shyng, J.Y., Wang, F.K., Tzeng, G.H., Wu, K.S.: Rough Set Theory in analyzing the attributes of combination values for the insurance market. Expert Systems with Applications 32, 56–64 (2007)
Silvestro, R., Johnston, R.: The Determinants of Service Quality-Enhancing Hygiene Factors. In: The proceedings of the QUISII Symposium, St Johns University, New York (1990) Sintas, J.L., Álvarez, E.G.: Four characters on the stage playing three games: performing arts consumption in Spain. Journal of Business Research 58, 1446–1455 (2005) Tanabe, T.: New Series of Architecture33-Theatre Design. Jang Guo Community, Tokyo (1981) Walczak, B., Massart, D.L.: Tutorial Rough sets theory. Chemometrics and Intelligent Laboratory Systems 47, 1–16 (1999)
Part III Data Analysis and Data Navigation
A Novel Collaborative Filtering Model for Personalized Recommendation Wang Qian
Abstract. Recommender systems are a class of personalized systems that aim at predicting a user’s interest in available services. Traditional collaborative filtering (CF) has proven to be one of the most successful techniques used in recommendation systems. However, the methods do not consider how the attribute features are related to user preferences, impacting the CF system’s prediction quality. To resolve the problem, this work proposes a novel collaborative filtering model derived from converting the user’s rating of an item to a distribution of attributes to the item. The proposed model is developed using the traditional similarity measure method. Finally, a series of experiments is performed on a typical data set, and the results indicate that the proposed model offers significant advantages in terms of improving the recommendation quality.
1 Introduction
With the development of Web 2.0 technologies and services, an enormous number of products are sold via the web because of the convenience of the Internet. It is becoming more difficult to automatically recommend to users what they would prefer among those items. Without a better filtering method, users face information overload. Recommender systems, a specific type of information filtering, have demonstrated a way to achieve mass customization by suggesting additional products for users to purchase and identifying products according to their needs [1]. One of the most successful technologies among recommender systems is Collaborative Filtering (CF). CF techniques have developed quickly, not only in the research arena but also in the commercial field. Recently, more and more commercial online companies (e.g., Amazon.com, Netflix.com, and Cdnow.com) employ this technology to provide recommendations to their users. Traditional Collaborative Filtering approaches [2] are based mainly on the assumption that users who have similar rating behaviors can be grouped to help each other make a Wang Qian School of Business, Sun Yat-Sen University, Guangzhou, China J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 661–669. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
W. Qian
choice among the potential items [3]. This allows the relation between users' preferences and items to be discovered from a list of transactions, and the several items with the best predicted ratings can then be output as the recommendation list. However, the existing CF approaches suffer from the problems of cold-start, sparsity, and scalability. To alleviate these problems, our work proposes an alternative CF model for expressing users' preferences, derived from converting a user's ratings of items into a rating distribution over the items' attribute values. A user's preference for an item is closely related to its attribute values, and the attribute characteristics of an item have a significant impact on the quality of the results provided by traditional CF; however, the traditional CF model does not consider this important factor. In this paper, the vital attributes close to a user's preferred items can be identified by employing the user's preference profile. Meanwhile, the traditional similarity measure method is extended, and a corrective prediction model based on mathematical expectation is also developed within the proposed model. Furthermore, a series of experiments is performed on a typical data set, and the results indicate that the proposed model offers significant advantages, both in improving the recommendation quality and in dealing with cold-start items. The remainder of this paper is organized as follows. Related work is reviewed in Section 2. In Section 3, we describe the proposed model for users' preferences. Empirical evaluations of our model and a discussion of related issues are presented in Section 4. Finally, conclusions are stated in Section 5.
2
Related Works
CF has been acknowledged as the most successful and most widely implemented recommendation technique to date [1]. The basic idea of a CF system is to generate recommendations based on the experiences of similar past users. CF does not restrict the spectrum of recommendations to items similar to those the user has previously evaluated. According to how recommendations are made, CF can generally be classified into two major categories: model-based and memory-based. Model-based systems rely on a compact model inferred from the data, which is applied to the target user's ratings to make predictions for unobserved items. This is typically faster in terms of recommendation time, though the method may have an expensive learning or model-building process. A number of studies motivated by model-based CF have been devoted to modeling behaviors effectively over the past few years. Some model-based CF algorithms treat personalized recommendation as behavior classification. A new class of model-based CF, called item-based CF, has been proposed [4][8] and applied to commercial recommender systems. Instead of computing similarities between users, item-based CF reviews the set of items rated by the target user and selects the most similar items based on the similarities between the items. Nikovski and Kulev [9] discovered frequent item-set lattices and presented the induction of compact decision trees as the recommendation policy.
A Novel Collaborative Filtering Model for Personalized Recommendation
Although model-based CF algorithms have been shown to be very effective in some recommender systems, they may not work well with sparse data because robust associations among the items may be hard to derive. In comparison to model-based approaches, a considerable number of studies have been conducted on the memory-based learning paradigm. Following the proposal of GroupLens, the first system to generate automated recommendations, memory-based approaches have seen the widest use in recommendation systems. The most common memory-based model uses a similarity measurement between neighbors and the target user to learn and predict preferences toward new items or unrated products. The fundamental algorithm of the memory-based class is the nearest neighbor, which is considered one of the most effective CF approaches. Social filtering is a classic memory-based approach that exploits the group's potential to share preference information among members of the same group. Although memory-based CF algorithms have been shown to be promising in prediction accuracy, the user-item matrix may not provide sufficient information to achieve a good prediction, and these algorithms also have serious problems relating to the complexity of computing each recommendation as the numbers of users and items grow. Memory-based CF algorithms may not perform well when a new user or new item is added to the recommender system. Additionally, sparsity problems due to insufficient user history information must be considered seriously.
3
Proposed Methodology
3.1
Building User’s Profile
A user's evaluation of an item's attribute values reflects the user's preference more accurately than the user's evaluation of the item itself. When computing similarities between items in a traditional CF model, an item-based CF reviews the set of items the target user has rated and selects the most similar items based on the similarities between items. In a traditional CF, the n × m matrix of raw ratings from n users on m items is R. Instead of the traditional user profile constructed directly from ratings, we define a user by the user's rating distribution over the attribute values inherent to the items. We start by calculating the proportion of items with each attribute value that the user has rated. Next, for each attribute value, we calculate the proportion of each rating scale, thus converting the user's ratings on items into a rating distribution over attribute values, which is substituted as the user profile.

To accomplish this, we need to define representations of the user profiles. Assume that there are k attributes that impact users' preferences. Let A = {A_1, A_2, ..., A_k} denote the set of all attributes of the items. Here, A_i (i = 1, 2, ..., k) represents the i-th attribute, and A_i = {a_{i,1}, a_{i,2}, ..., a_{i,m_i}}, where a_{i,l} (l = 1, 2, ..., m_i) is the l-th value of the i-th attribute.

Let I_u be the set of items rated by user u, and |I_u| the total number of items the user has rated. Let N_u(a_{i,l}) denote the number of items in I_u taking the value a_{i,l} of the i-th attribute, and N_u(a_{i,l}, r) the number of items in I_u taking the value a_{i,l} that user u rated with value r, where r ∈ {1, 2, ..., s} and s is the top of the rating scale. Using the above definitions, the probability of attribute value a_{i,l} in user u's profile is calculated as follows:

p_u(a_{i,l}) = N_u(a_{i,l}) / |I_u|    (1)

The conditional probability that the rating is r on the set of items with the l-th value of the i-th attribute is calculated using formula (2):

p_u(r | a_{i,l}) = N_u(a_{i,l}, r) / N_u(a_{i,l})    (2)

From formulas (1) and (2) we can see that if N_u(a_{i,l}) = 0, which means there is no item with the l-th value of the i-th attribute that user u has rated, then p_u(r | a_{i,l}) = 0; otherwise, Σ_{r=1}^{s} p_u(r | a_{i,l}) = 1. All the user profiles constitute a user profile matrix, which is the basis for computing the similarity between users. The number of attribute values is much smaller than the total number of items, so converting users' ratings on items into a rating distribution over attribute values effectively reduces the dimension of a user profile and improves the efficiency of a recommendation system.
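Formulas (1) and (2) can be sketched in Python as follows. This is a minimal illustration; the data layout (dicts keyed by item id and attribute value) is an assumption for the example, not from the paper:

```python
from collections import defaultdict

def build_user_profile(ratings, item_attrs):
    """Convert one user's item ratings into a rating distribution over
    attribute values.

    ratings    : dict item_id -> rating r (e.g. 1..5)
    item_attrs : dict item_id -> list of attribute values of that item
    Returns (p_attr, p_cond):
      p_attr[a]       = N_u(a) / |I_u|            -- formula (1)
      p_cond[(a, r)]  = N_u(a, r) / N_u(a)        -- formula (2)
    """
    n_items = len(ratings)
    n_attr = defaultdict(int)          # N_u(a)
    n_attr_rating = defaultdict(int)   # N_u(a, r)
    for item, r in ratings.items():
        for a in item_attrs[item]:
            n_attr[a] += 1
            n_attr_rating[(a, r)] += 1
    p_attr = {a: n / n_items for a, n in n_attr.items()}
    p_cond = {(a, r): n / n_attr[a] for (a, r), n in n_attr_rating.items()}
    return p_attr, p_cond

# Tiny example: three rated movies, attribute = genre
ratings = {"m1": 5, "m2": 4, "m3": 5}
attrs = {"m1": ["action"], "m2": ["action"], "m3": ["drama"]}
p_attr, p_cond = build_user_profile(ratings, attrs)
# p_attr["action"] == 2/3; p_cond[("action", 5)] == 1/2
```

Note that, as stated after formula (2), the conditional probabilities for each observed attribute value sum to 1.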
3.2
Similarity Computation and Rating Prediction Model
Most CF-based recommender systems build a neighborhood of like-minded users. The neighborhood formation scheme usually uses Pearson correlation or cosine similarity as a measure of proximity, the most common being the Pearson correlation algorithm. Pearson correlation measures the degree to which a linear relationship exists between two variables. The Pearson correlation coefficient is derived from a linear regression model, which relies on a set of assumptions regarding the data, namely: the relationship must be linear, and the errors must be independent and follow a probability distribution with zero mean and constant variance for every setting of the independent variable. However, the weight of each vector component is not considered in Pearson correlation-based or adjusted cosine similarity calculations, which may make the prediction of users' preferences inaccurate. Hence, we use an improved method to calculate the similarity.

In this paper, all user profiles have the same dimensions, but different components of each user profile have different weights. The weight of p_u(a_{i,l}) is determined by the weight of attribute A_i and the weight of the l-th value of attribute A_i, l = 1, 2, ..., m_i. For example, suppose p_u(a_{1,1}) = 0.9, which means that 90% of the items user u rated take the first value of the first attribute, and p_u(a_{1,2}) = 0.05, which means that 5% of the items user u rated take the second value of the first attribute; obviously, p_u(a_{1,1}) is more representative in depicting the user's interest. If we calculate similarity between users by a traditional measurement, the result will show a significant deviation.

To solve the problem of measuring similarity between vectors whose components carry different weights, we propose a weighted similarity measurement based on cosine similarity. Let X = (x_1, ..., x_d) and Y = (y_1, ..., y_d) be the profile vectors of users u and v, with component weight vectors w^X and w^Y, each normalized so that its components sum to 1. The combined weight of the j-th component is taken as

w_j = min(w^X_j, w^Y_j), j = 1, 2, ..., d    (3)

According to formula (3), the improved similarity between vector X and vector Y is calculated as formula (4):

sim(u, v) = ( Σ_{j=1}^{d} w_j x_j y_j ) / ( sqrt( Σ_{j=1}^{d} w_j x_j² ) · sqrt( Σ_{j=1}^{d} w_j y_j² ) )    (4)
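The weighted cosine similarity can be sketched in Python as follows (a minimal sketch; the componentwise-minimum combination of the two users' weights is assumed from the surrounding text):

```python
import math

def weighted_cosine(x, y, wx, wy):
    """Weighted cosine similarity: the combined weight of each component
    is the minimum of the two users' component weights, and each product
    in the cosine formula is scaled by that weight."""
    w = [min(a, b) for a, b in zip(wx, wy)]
    num = sum(wj * xj * yj for wj, xj, yj in zip(w, x, y))
    den = math.sqrt(sum(wj * xj * xj for wj, xj in zip(w, x))) * \
          math.sqrt(sum(wj * yj * yj for wj, yj in zip(w, y)))
    return num / den if den else 0.0
```

When all weights are equal, the common factor cancels and the measure reduces to the ordinary cosine similarity.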
To further enhance the accuracy of the recommendation for the unrated item, we first calculate its mathematical expectation of the rating based on the user profile and its attribute values. When forecasting the final rating, the collaborative filtering operation should be carried out in addition to the mathematical expectation calculation.
Assume that the unrated item has n attributes, and the attributes have weights w_1, ..., w_i, ..., w_n, respectively, in influencing users' preference, where 0 ≤ w_i ≤ 1 and Σ_{i=1}^{n} w_i = 1. The values of w_1, ..., w_n can be obtained from knowledge of the item, but also by continually adjusting them to minimize the MAE. Assume that a_1 is the attribute value of the item for the first attribute, ..., and a_n is the attribute value for the n-th attribute. According to the rating distribution of each attribute value in a user profile, the mathematical expectation of user u's rating is calculated as formula (5):

E_u = Σ_{i=1}^{n} w_i · Σ_{r=1}^{s} r · p_u(r | a_i)    (5)
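Formula (5) can be sketched in Python as follows, reusing the conditional distribution p_u(r | a) of formula (2); the rating scale 1..5 is an illustrative assumption:

```python
def expected_rating(weights, p_cond, item_attr_values, scale=(1, 2, 3, 4, 5)):
    """Mathematical expectation of a user's rating for an unrated item:
    E_u = sum_i w_i * sum_r r * p_u(r | a_i), per formula (5).

    weights          : attribute weights w_1..w_n (summing to 1)
    p_cond           : dict (attr_value, rating) -> p_u(r | a)
    item_attr_values : the item's value a_i for each attribute
    """
    e = 0.0
    for w_i, a in zip(weights, item_attr_values):
        e += w_i * sum(r * p_cond.get((a, r), 0.0) for r in scale)
    return e
```

For example, with weights 0.6 and 0.4 for two attributes, an item whose first attribute value the user rated 4 or 5 with equal probability and whose second attribute value the user always rated 3 gets expectation 0.6 · 4.5 + 0.4 · 3 = 3.9.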
3.3
Building User Preference Model
To construct a user preference model, first the similarity between the active user and every other user is calculated and arranged in descending order. We can forecast the rating of an unrated item according to its mathematical expectation and the ratings given by the neighbors, and then take the top N items with the highest predicted ratings as the recommendation. Here I = {i_1, i_2, ..., i_m} is the set of items, U = {u_1, u_2, ..., u_n} is the set of users, and k represents the number of attributes in an item attribute profile. The model for user preferences is shown in Model 1.
Model 1. User preference prediction model
Input: item attribute profiles; user-item rating matrix R
Output: top N items with the highest predicted rating as the recommendation
1. For each user u in U:
2.   For i = 1 to k:
3.     Calculate the probability p_u(a_{i,l}) of each attribute value  // build the user profile matrix from the item attribute profiles
4. Calculate the similarity sim(u, v) between the active user u and each user v by formula (4), and sort the users in descending order of similarity
5. For each item the active user has not rated:
6.   Calculate the mathematical expectation E_u of the rating by formula (5)
7.   Predict the rating from E_u and the ratings of the most similar neighbors
8. Return the top N items with the highest predicted ratings
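The extracted text does not preserve the exact formula by which E_u is combined with the neighbors' ratings in the final prediction step. One plausible sketch, an assumption in the style of classic memory-based CF (similarity-weighted deviations, here taken around each user's expectation), is:

```python
def predict_rating(e_u, neighbor_info):
    """Combine the expectation E_u for the active user with neighbors'
    ratings of the same item. This combination rule is an illustrative
    assumption, not the paper's exact formula.

    e_u           : expectation of the active user's rating (formula (5))
    neighbor_info : list of (similarity, neighbor_rating, neighbor_expectation)
    """
    num = sum(s * (r - e_v) for s, r, e_v in neighbor_info)
    den = sum(abs(s) for s, _, _ in neighbor_info)
    # With no usable neighbors, fall back to the expectation alone
    return e_u + num / den if den else e_u
```

A single neighbor with similarity 1.0 who rated the item 0.5 above its own expectation shifts the prediction up by 0.5.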
4 Experimental Results and Analysis

4.1 Experiment Data Set and Evaluation Metrics
We used experimental data from MovieLens, the web-based research recommender system, to evaluate the different variants of the recommendation algorithms. We randomly selected ratings from 6,040 users on 3,883 movies from the database, and divided the data into a training set (80%) and a test set (20%). For this purpose, we selected 26,108 ratings on 2,736 movies by 163 users as our experimental data set, where each user had rated at least 30 movies.

Recommender systems research has used several types of measures for evaluating the quality of a recommender system. The Mean Absolute Error (MAE) between ratings and predictions is a widely used metric: it measures the deviation of recommendations from their true user-specified values. The MAE is computed by summing the absolute errors of the N corresponding prediction-rating pairs and then averaging. We used MAE as our evaluation metric because it is the most commonly used and the easiest to interpret directly. Formally,

MAE = ( Σ_{i=1}^{N} |p_i − q_i| ) / N

where p_i is the predicted rating and q_i the actual rating of the i-th pair.

4.2 The Results and Analysis
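The MAE metric defined in Section 4.1 is straightforward to compute:

```python
def mae(predictions, actuals):
    """Mean Absolute Error over N corresponding prediction-rating pairs:
    MAE = (sum_i |p_i - q_i|) / N."""
    assert len(predictions) == len(actuals) and predictions
    return sum(abs(p - q) for p, q in zip(predictions, actuals)) / len(predictions)
```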
• Determination of attribute weight

Items have many attributes, which influence users' preferences differently. In the experiments of this paper, we take "genre" and "year" as the item attributes. To find the right weight for each attribute, we conducted a series of experiments varying the combination of parameters. Table 1 shows the resulting accuracy comparison. From Table 1, it can be observed that when the weight of "genre" is 1, the MAE is lower than when it is 0. In other words, constructing a user profile according to the "genre" attribute is more accurate than according to the "year" attribute: compared with the release time, users care more about a movie's genre. The highest accuracy is reached with weights 0.6 and 0.4; therefore, the weights of "genre" and "year" are set to 0.6 and 0.4, respectively.
Table 1 Attribute weight comparison of varieties of CF (MAE)

Number of users   Traditional CF   Rating attributes CF   Rating prediction CF
2                 0.8409           0.8397                 0.8421
4                 0.8298           0.8168                 0.8170
6                 0.8249           0.8075                 0.8059
8                 0.8221           0.8032                 0.8022
10                0.8218           0.8027                 0.8017
12                0.8199           0.8024                 0.8019
14                0.8187           0.8021                 0.8013

• Comparison of our approach and the traditional one
To test the effectiveness of the proposed approach, we compared its recommendation accuracy with that of traditional collaborative filtering; the results are shown in Table 2. Table 2 demonstrates that the accuracy of collaborative filtering based on a rating distribution over attribute values is significantly improved over traditional collaborative filtering. Converting the user-rating matrix into a user profile matrix reduces the dimensions of the matrix and increases data density. Weighted cosine-based similarity computation solves the problem of measuring similarity between users when each component of a user profile has a different weight. Compared with traditional collaborative filtering, the recommendation quality is greatly improved.

Table 2 Accuracy comparison of varieties of CF (MAE)

Number of users   Traditional CF   Rating attributes CF
2                 0.8381           0.8397
4                 0.8170           0.8168
6                 0.8141           0.8075
8                 0.8120           0.8032
10                0.8110           0.8027
12                0.8096           0.8024
14                0.8090           0.8021

5 Conclusions
In this paper, we have presented a novel collaborative filtering model that introduces item characteristics to describe users' preferences. The improved CF model has higher accuracy than the traditional approach. This study converted users' ratings on items into a rating distribution over items' attribute values and adopted a weighted cosine-based similarity. This method overcomes the shortcoming of the traditional approach, which cannot precisely describe users' preferences. It not only inherits the merits of traditional collaborative filtering but
also makes the recommendations better meet users' demands. Moreover, as the number of items in the system increases, the number of columns in the user profile matrix does not grow, which improves the scalability of the recommendation system.
Acknowledgements. This research was supported by the National Natural Science Foundation of China under grant 70971141. This work was also supported by the Natural Science Foundation of Guangdong Province under grant 9151027501000049, by the Ministry of Education Humanities and Social Sciences Planned Project under grant 09YGA630156, and by the Fundamental Research Funds for the Central Universities. We thank the anonymous reviewers for their valuable comments.
References
[1] Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Analysis of recommendation algorithms for e-commerce. In: Proceedings of the Second ACM Conference on Electronic Commerce, pp. 158–167 (2000)
[2] Mobasher, B., Burke, R.R., Bhaumik, C.: Effective attack models for shilling item-based collaborative filtering systems. In: Proc. Int. Conf. KDD Workshop on Web Mining and Web Usage Analysis, Chicago, USA, pp. 23–33 (2005)
[3] Ja-Hwung, S., Bo-Wen, W.: Personalized rough-set-based recommendation by integrating multiple contents and collaborative information. Information Sciences 18, 113–131 (2010)
[4] Massa, P., Avesani, P.: Trust-aware collaborative filtering for recommender systems. In: Proceedings of International Conference on Cooperative Information Systems, pp. 492–508 (2004)
[5] Cho, Y.H., Kim, J.K.: Application of Web usage mining and product taxonomy to collaborative recommendations in e-commerce. Expert Systems with Applications 26, 233–246 (2004)
[6] Leung, C.W.-K., Chan, S.C.-F., Chung, F.-L.: A collaborative filtering framework based on fuzzy association rules and multiple-level similarity. Knowledge and Information Systems 10(3), 357–381 (2006)
[7] Shih, Y.Y., Liu, D.R.: Product recommendation approaches: Collaborative filtering via customer lifetime value and customer demands. Expert Systems with Applications 35(1–2), 350–360 (2008)
[8] Miller, B.N., Konstan, J.A., Riedl, J.P.: Towards a personal recommender system. ACM Transactions on Information Systems, 437–476 (2004)
[9] Nikovski, D., Kulev, V.: Induction of compact decision trees for personalized recommendation. In: Proc. Int. Conf. on the ACM Symposium on Applied Computing, Dijon, France, pp. 575–581 (2006)
A RSSI-Based Localization Algorithm in Smart Space Liu Jian-hui and Han Chang-jun
Abstract. Location information of mobile nodes is a key technology for localization services in smart space. RSSI (Received Signal Strength Indicator) ranging is a localization technology that turns propagation loss into distance by virtue of theoretical or empirical propagation models. During the two-dimensional localization process in smart space, ranging errors arise for unknown nodes due to factors such as multipath and obstructions. By analyzing the areas where distance errors occur during the localization process, a new RSSI-based localization algorithm (ARSS) is proposed in this paper. The simulation experiment shows that the ARSS algorithm can improve localization accuracy and reduce the communication overhead, thus better meeting the needs of real-time localization for smart terminals.
1
Introduction
The localization algorithms can be divided into the following categories according to whether the actual distance or angle between nodes is measured in the localization process: range-based localization algorithms and range-free localization algorithms. The typical ranging techniques include RSSI-based ranging [3], TOA- or TDOA-based ranging, and AOA-based ranging.
Liu Jian-hui · Han Chang-jun Information Technology College, Eastern Liaoning University, Dandong, China J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 671–681. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
L. Jian-hui and H. Chang-jun
RSSI takes advantage of the strength of known transmitted signals: the receiving node turns propagation loss into distance by virtue of theoretical or empirical propagation models, according to the loss in strength of the signal received during the propagation process. In smart space, in theory, the location of an unknown node can be determined by trilateration using the RSSI information of three anchor nodes. In real application environments, since weak wireless signals adapt poorly to complex environments, the accuracy of this method is not very high [4]. However, only a few hardware devices are needed, and many wireless communication modules can offer RSS values; therefore, the RSSI-based ranging method is still widely used [5]. RSSI-based ranging uses the received signal strength and a theoretical or empirical path-loss propagation model to calculate distance; the statistical model [6] is as follows:

P(d) = P(d_0) − 10 · n_p · log10(d / d_0)

where P(d) is the signal strength (dBm) at distance d, n_p is the path loss factor ranging from 2 to 4, and P(d_0) is the signal strength (dBm) at reference distance d_0. This technology mainly uses RF signals, so ranging errors occur during the signal transmission process due to multi-path attenuation, background noise, and irregular signal propagation characteristics. Generally, RSSI-based localization systems employ various algorithms to reduce the impact of ranging error on localization, including multiple measurements and circular orientation refinement, but these algorithms require substantial computation and communication overheads. Therefore, how to improve the localization accuracy without increasing computation and communication overheads is the focus of this paper.
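Inverting the log-distance path-loss model above gives the RSSI-to-distance conversion. A minimal Python sketch follows; the reference strength, reference distance, and path loss factor are illustrative values, not from the paper:

```python
def rssi_to_distance(p_d, p_d0=-40.0, d0=1.0, n_p=2.5):
    """Invert the log-distance path-loss model
        P(d) = P(d0) - 10 * n_p * log10(d / d0)
    to estimate distance from a received signal strength (dBm).
    p_d0, d0, and n_p are illustrative assumptions."""
    return d0 * 10 ** ((p_d0 - p_d) / (10 * n_p))

# A 25 dB drop with n_p = 2.5 corresponds to a 10x distance increase
```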
2
Analysis on Localization Accuracy of RSSI-Based Algorithm
During the spatial localization of unknown nodes in smart space, in the case of no errors, three anchor nodes provide a perfect localization solution; there is no need for additional anchor nodes. Assuming that the known coordinates of the three anchor nodes are (x_i, y_i), i = 1, 2, 3, and that the distances to the unknown node P(x, y) are d_i, i = 1, 2, 3, then P satisfies

(x − x_i)² + (y − y_i)² = d_i²,  i = 1, 2, 3

Subtracting the third equation from the first two yields a linear system from which the coordinates of the unknown point P are obtained:

2(x_1 − x_3)x + 2(y_1 − y_3)y = x_1² − x_3² + y_1² − y_3² + d_3² − d_1²
2(x_2 − x_3)x + 2(y_2 − y_3)y = x_2² − x_3² + y_2² − y_3² + d_3² − d_2²

However, when the distances d_i, i = 1, 2, 3, are inaccurate, there will be some errors in the obtained solution. Assuming that the error of RSSI ranging is ε > 0, when the actual distance between two nodes is d, the ranging value lies within the range [d − ε, d + ε]. Because the distribution area of sensor nodes in smart space is small and environmental factors change little, it can be assumed that the distance errors
are all the same. In case of errors, the three circles constitute a small area (denoted as S) as shown in Fig. 1; the coverage of this area stands for the size of the localization error.
Fig. 1 Area of localization errors
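The trilateration equations above reduce to a 2×2 linear system, which can be sketched in Python as follows:

```python
def trilaterate(anchors, dists):
    """Solve for (x, y) by subtracting the third circle equation from
    the first two, as in the linear system above.

    anchors : [(x1, y1), (x2, y2), (x3, y3)]
    dists   : [d1, d2, d3]
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = dists
    a11, a12 = 2 * (x1 - x3), 2 * (y1 - y3)
    a21, a22 = 2 * (x2 - x3), 2 * (y2 - y3)
    b1 = x1**2 - x3**2 + y1**2 - y3**2 + d3**2 - d1**2
    b2 = x2**2 - x3**2 + y2**2 - y3**2 + d3**2 - d2**2
    det = a11 * a22 - a12 * a21   # zero if the anchors are collinear
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det)
```

With exact distances the solution is unique; with ranging error ε the result falls inside the error area S discussed below.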
Assume that the straight line passing through P and an anchor node intersects the circle of radius ε (as shown in Fig. 1) at two points, and form the two tangents to the circle at those points for each anchor node; the tangents intersect and form a hexagon. When the ranging error is very small, the boundary of the error area can be linearized, and its coverage is estimated by the hexagon area S. Thus, the issue turns into determining under which circumstance the localization error S will be smallest.

Definition 1. For a subset S ⊆ R^n, if for any two different points x_1, x_2 ∈ S and for any real number λ ∈ [0, 1], λx_1 + (1 − λ)x_2 ∈ S is tenable, then S is a convex set.

Definition 2. For any two different points x_1, x_2 of a convex set S, and for any real number λ ∈ [0, 1], if f(λx_1 + (1 − λ)x_2) ≤ λf(x_1) + (1 − λ)f(x_2) is tenable, then f is a convex function defined on the convex set S.

Lemma 1. For a smooth function f defined on a convex set S, where f is second-order continuously differentiable, if the Hesse matrix of f is positive definite everywhere on S, then f is a strictly convex function on S.

Lemma 2. For a convex function f defined on a convex set S, if there are m points x_1, x_2, ..., x_m ∈ S, then f((x_1 + ... + x_m)/m) ≤ (f(x_1) + ... + f(x_m))/m is tenable; if f is a strictly convex function, equality holds only when x_1 = x_2 = ... = x_m.
Theorem 1. When the acute angles of an unknown node relative to each pair of the anchor nodes i = 1, 2, 3 are all equal to π/3, the localization error of the unknown node is smallest.

Proof: Since the area constituted by the error region is a circumscribed hexagon about the circle of radius ε (as shown in Fig. 1), the following expression is tenable:

S(β_1, β_2, β_3) = 2ε² [ tan(β_1/2) + tan(β_2/2) + tan(β_3/2) ]    (1)

For β_1, β_2, β_3, the relation β_1 + β_2 + β_3 = π holds. Since tan''(x) = 2 sec²(x) tan(x) > 0 when 0 < x < π/2, the following expression can be derived by applying Lemmas 1 and 2:

S = 2ε² Σ_{i=1}^{3} tan(β_i/2) ≥ 6ε² tan( (β_1 + β_2 + β_3)/6 ) = 6ε² tan(π/6) = 2√3 ε²    (2)
When β_1 = β_2 = β_3 = π/3, equality holds. That is, when the acute angles formed by an unknown node and the three anchor nodes are all equal to π/3, the localization error of the unknown node is smallest. Since formula (1) is a strictly convex function, the extreme value obtained is the unique minimum; it can be inferred that as the angles formed by the unknown node and the three anchor nodes approach π/3 simultaneously, the value of the function gets smaller and smaller, as shown in Fig. 2. During the actual localization process, a threshold λ can be selected, and the anchor nodes whose three angles satisfy π/3 − λ ≤ β_i ≤ π/3 + λ, i = 1, 2, 3, can be chosen to perform the localization.
[Figure: error area (cm²) vs. angles β_12, β_23 (rad), for ε = 1 cm]
Fig. 2 Error area vs. angle
Theorem 2. When there are n (n ≥ 1) groups of anchor nodes meeting Theorem 1 and participating in the localization computation of an unknown node, the localization error of the unknown node gets smaller as n increases; when n increases infinitely and tends to infinity, the localization error tends to a constant value determined by ε (ε is the ranging error), that is, the localization error is convergent.

Proof: (1) When n = 1, from Theorem 1 we know the localization error area is S_1 = 2√3 ε².
(2) When n = 2, a new error area (denoted S'_1) forms around the unknown node, and the possible error area of the unknown node becomes S_2. From Fig. 3 we can see that circle O is the common part of the two areas; outside circle O, the six small areas of S_1 cut off by S'_1 intersect once again. Assuming that the common part obtained through the intersection of each small area outside the circle is 1/c (where c > 1, c a constant) of the corresponding angular area of the original S_1, the coverage of the new area, denoted S_2, is

S_2 = πε² + (2√3 − π)ε² · (1/c)

Similarly, the coverage S_3 can also be obtained:

S_3 = πε² + (2√3 − π)ε² · (1/c²)

Then we can obtain the following iterative relation:

S_{n+1} = πε² + (S_n − πε²)/c

By solving this difference equation, we obtain:

S_n = πε² + (2√3 − π)ε² / c^{n−1}    (3)

Fig. 3 Error area after several times' localization

When n tends to infinity, taking the limit of Eq. (3), we obtain:

lim_{n→∞} S_n = πε²    (4)

The physical meaning of Eq. (4) is that when numerous groups of anchor nodes meeting Theorem 1 participate in the localization computation of the unknown node, the localization error tends to the constant value πε², that is, the localization error is convergent.
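Under the assumptions of the proof (a shrink factor c > 1), the recurrence and its limit can be checked numerically; the value c = 2 here is an illustrative assumption:

```python
import math

def error_area_seq(eps=1.0, c=2.0, n=50):
    """Iterate S_{n+1} = pi*eps^2 + (S_n - pi*eps^2)/c starting from
    S_1 = 2*sqrt(3)*eps^2 (Theorem 1); c > 1 is the assumed shrink
    factor of Theorem 2."""
    s = 2 * math.sqrt(3) * eps**2
    seq = [s]
    for _ in range(n - 1):
        s = math.pi * eps**2 + (s - math.pi * eps**2) / c
        seq.append(s)
    return seq

seq = error_area_seq()
# The sequence decreases monotonically toward pi * eps^2
```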
3
ARSS Algorithm
Since the accuracy of the RSSI-based localization algorithm in smart space is not very high under energy-constrained conditions, the ARSS algorithm is presented in this paper, based on Theorems 1 and 2, to overcome the shortcomings of the RSSI-based algorithm. The localization process of the RSSI-based localization algorithm is as follows:
(1) The anchor nodes periodically broadcast their own information: node IDs and their location information.
(2) The unknown nodes receive the RSSI values from several anchor nodes and compute the distances between the nodes according to the channel models. First, the three previously received anchor nodes are used to preliminarily compute the location of the unknown node.
(3) All the received information is grouped (three messages per group), and the location of the unknown node is computed for each group.
(4) Finally, the average of the computed locations is taken as the estimated location of the unknown node.
It can be seen that the algorithm first roughly computes the location of the unknown node from the distances and then gradually improves the localization accuracy through several similar passes. As the number of nodes increases, the computation volume of the algorithm grows in geometric progression, which cannot meet the localization requirements of smart space. The ARSS localization algorithm is described as follows:
(1) The anchor nodes periodically broadcast their own information: node IDs and their location information.
(2) The unknown nodes receive this information, record the RSSI values of the anchor nodes, and use the channel models to compute the distances between the nodes.
(3) Arbitrarily select three anchor nodes and compute the angles of the unknown node relative to each pair of the three anchor nodes.
(4) Use the groups of three anchor nodes whose angles all lie within the range [π/3 − λ, π/3 + λ] to compute the location of the unknown node, computing a location for each group of anchor nodes meeting this requirement.
(5) Finally, take the average of the computed locations as the estimated location of the unknown node.
This algorithm ensures very small localization errors and improves the localization accuracy of the nodes without increasing the network traffic.
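Steps (3) and (4) of ARSS can be sketched in Python as follows. The acute angle between the two lines through the unknown node and each pair of anchors is compared against the π/3 ± λ band of Theorem 1; the preliminary position estimate used for the angle computation is assumed given:

```python
import math
from itertools import combinations

def select_triples(unknown, anchors, lam=0.2):
    """Keep only anchor triples whose three acute angles at the unknown
    node all lie in [pi/3 - lam, pi/3 + lam] (lambda of Theorem 1)."""
    def acute_angle(a, b):
        # acute angle between the lines from `unknown` to anchors a and b
        ax, ay = a[0] - unknown[0], a[1] - unknown[1]
        bx, by = b[0] - unknown[0], b[1] - unknown[1]
        cosv = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
        t = math.acos(max(-1.0, min(1.0, cosv)))
        return min(t, math.pi - t)
    good = []
    for tri in combinations(anchors, 3):
        angs = [acute_angle(p, q) for p, q in combinations(tri, 2)]
        if all(abs(t - math.pi / 3) <= lam for t in angs):
            good.append(tri)
    return good
```

Three anchors placed 120° apart around the node give pairwise acute angles of exactly π/3, so that triple passes even with a tight threshold.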
4
Location Error Analyses
First, compute the localization error according to Theorem 1; the minimum location error area is

S_min = 2√3 ε²

According to Theorem 2, the location error can be further reduced, tending to πε².

Secondly, compute the mathematical expectation of the error area E(S), namely the average error area. For convenience of calculation and deduction, replace the angles in Eq. (1) by x, y, z, with x + y + z = π. Eliminating the unknown variable z, we have

S(x, y) = 2ε² [ tan(x/2) + tan(y/2) + cot((x + y)/2) ]    (5)

Since the variables x, y follow the uniform distribution on the region D = {(x, y) | x > 0, y > 0, x + y < π}, the joint probability density function of (x, y) is f(x, y) = 2/π². Substituting it into Eq. (5), we get

E(S) = ∫∫_D S(x, y) f(x, y) dx dy = (4ε²/π²) ∫∫_D [ tan(x/2) + tan(y/2) + cot((x + y)/2) ] dx dy    (6)

Through deduction, each of the three integrals in Eq. (6) evaluates to the same value:

∫∫_D tan(x/2) dx dy = ∫∫_D tan(y/2) dx dy = ∫∫_D cot((x + y)/2) dx dy = 2π ln 2    (7)

Substituting the result of Eq. (7) into Eq. (6) and carrying out the computation, we obtain

E(S) = (4ε²/π²) · 6π ln 2 = (24 ln 2 / π) ε²    (8)

Finally, comparing Eq. (2) with Eq. (8), the result indicates that placing anchor nodes according to Theorem 1 reduces the localization error and improves the location precision. The improvement in location precision, termed η, is computed as

η = (E(S) − S_min) / E(S) = ( (24 ln 2)/π − 2√3 ) / ( (24 ln 2)/π ) ≈ 34.9%

After computing the location error in the improved situation, we note that the location accuracy can be improved by about 34.9% according to the two theorems. The above analysis assumes that the distance measurement error is exactly ε; in actual localization there are many uncertain factors, so the ideal location accuracy is unattainable. Nevertheless, placing anchor nodes purposefully can effectively improve the location accuracy of the unknown node.
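The one-dimensional reductions underlying Eqs. (6)-(8) can be checked numerically: reducing the double integrals over D gives ∫_0^π (π − x) tan(x/2) dx for the tangent terms and ∫_0^π s cot(s/2) ds for the cotangent term, both equal to 2π ln 2. A simple midpoint rule suffices because both integrands extend continuously (with limit 2) to their endpoints:

```python
import math

def avg_error_integrals(n=20000):
    """Numerically evaluate the two 1-D integrals used in Eqs. (6)-(8);
    both should equal 2*pi*ln(2) ~= 4.3552."""
    h = math.pi / n
    def g(x):   # (pi - x) * tan(x/2), with limit 2 as x -> pi
        return 2.0 if x >= math.pi - 1e-12 else (math.pi - x) * math.tan(x / 2)
    def k(s):   # s * cot(s/2), with limit 2 as s -> 0
        return 2.0 if s <= 1e-12 else s / math.tan(s / 2)
    i1 = h * sum(g(i * h + h / 2) for i in range(n))   # midpoint rule
    i2 = h * sum(k(i * h + h / 2) for i in range(n))
    return i1, i2

i1, i2 = avg_error_integrals()
```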
5
Simulation Experiment
In order to compare the performance of the ARSS algorithm proposed in this paper with the RSSI-based algorithm, Matlab is used to set up a unified simulation environment over an area of 21 m × 15 m; the anchor nodes are distributed uniformly within this range, and Gaussian noise is added as interference. As shown in Fig. 4, when the number of anchor nodes is less than 12, the localization accuracy of the ARSS algorithm is significantly higher than that of the RSSI algorithm; as the number of anchor nodes increases, the localization accuracy of both algorithms improves continuously. When the number of anchor nodes approaches 16, the localization accuracy tends to a constant value. It can be seen from the figure that, to reach the same accuracy, RSSI requires several iterations, which correspondingly extends the localization time and energy consumption of the system; that is, RSSI improves the localization accuracy of the system by sacrificing time and energy.
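RSSI ranging perturbed by Gaussian noise, as in this simulation setup, is conventionally modeled with a log-distance path-loss law. A minimal sketch, where the reference power `p0`, the path-loss exponent `n`, and the noise level `sigma` are illustrative values and not the paper's calibration:

```python
import math
import random

def rssi_from_distance(d, p0=-40.0, n=2.5, sigma=2.0):
    """Log-distance path-loss model with zero-mean Gaussian noise (dB).
    p0: RSSI at 1 m; n: path-loss exponent; sigma: noise std dev.
    All three are assumed illustrative parameters."""
    return p0 - 10.0 * n * math.log10(d) + random.gauss(0.0, sigma)

def distance_from_rssi(rssi, p0=-40.0, n=2.5):
    """Invert the noise-free model to turn a measured RSSI into a range
    estimate, as an RSSI-based ranging step would."""
    return 10.0 ** ((p0 - rssi) / (10.0 * n))
```

Larger `sigma` directly inflates the ranging error, which is why averaging over well-conditioned anchor triples helps.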
A RSSI-Based Localization Algorithm in Smart Space
679
Fig. 4. Comparison of algorithm accuracies (localization error/cm versus the number of anchor nodes, ARSS algorithm vs. RSSI algorithm)
Webit5.0 [7-8] is a pervasive terminal device whose master chip is an 8-bit microcontroller (AVR ATmega128). It runs a priority-based preemptive embedded real-time operating system (WebitOS 5.0) with a built-in lightweight TCP/IP protocol stack, independently developed by the Liaoning Provincial Key Laboratory of Embedded Technology, so it can support hard real-time applications and communication between devices. Fig. 5 compares the real-time localization performance of the two algorithms on the Webit5.0 platform with the same number of localization anchor nodes. Due to the limited processing capacity of Webit5.0, 12 localization anchor nodes are used in the experiment. It can be seen from Fig. 5 that, when the number of anchor nodes is 6, the time overhead of the ARSS localization algorithm is 75.0 ms, whereas that of the RSSI-based localization algorithm is 130.0 ms. As the number of anchor nodes increases, the time overhead of the RSSI-based localization algorithm grows as a power series, while the upward trend of the time-overhead curve of the ARSS algorithm is close to linear. Obviously, to ensure real-time localization and a small localization error, the premise is not to let too many anchor nodes participate in the localization computation. With a suitable computing unit, the localization overhead takes only tens of milliseconds, which effectively meets the requirements for real-time localization.
Fig. 5 The number of anchor nodes and localization time curve (executing time/ms versus the number of anchor nodes, ARSS algorithm vs. RSSI algorithm)
Fig. 6 shows the comparison of the localization trajectories of the two algorithms in a simulation environment. The moving speed of the unknown node is 3 m/s. Obviously, the ARSS algorithm is much more accurate; its trajectory is closer to the actual moving trajectory of the moving node.

Fig. 6 Comparison of tracking trajectories of both localization algorithms (the width of the room/m versus the length of the room/m; actual track, ARSS track, RSSI track)
6
Conclusion
This paper presents an RSSI-based ARSS localization algorithm. The basic idea is to figure out under which distribution relationship between unknown nodes and anchor nodes the distance errors of the unknown nodes are smallest, by analyzing the
distance errors generated in the process of RSSI ranging. This algorithm has low hardware requirements, and the simulation results show that the ranging accuracy of the ARSS algorithm is better than that of the RSSI-based algorithm without increasing any network traffic; hence it meets the ranging requirements for localization in smart space.
Acknowledgements This work is supported by the National Natural Science Foundation of China (69873007), the National Natural Science Foundation of China under grant (70971141), the Hi-Tech Research and Development Program of China (2001AA415320), the Science and Technology Foundation of the Education Department of Liaoning Province (No. 2008212, No. L2010141), and the National Natural Science Foundation of Eastern Liaoning University (2009-Z03).
References [1] Gu, H.L., Shi, Y.C., Xu, G.Y.: A core model supporting location-aware computing in smart classroom. In: Proc. of the 4th International Conference on Web-based Learning, p. 1213 (2005) [2] He, T., Huang, C.D., Blum, B.M.: Range-Free localization schemes in large scale sensor networks. In: Proc. of the 9th Annual Int’l Conf. on Mobile Computing and Networking, San Diego, pp. 81–95 (2003) [3] Bahl, P., Padmanabhan, V.N.: RADAR: an in-building RF-based user location and tracking system. In: Proc. of the IEEE INFOCOM, pp. 775–784 (2000) [4] Kim, S.B., Park, C., Kang, D.Y.: An efficient positioning algorithm using ultrasound and RF. International Journal of Control, Automation and Systems 6(4), 544–550 (2008) [5] Jalal, A.M.: Augmenting smart spaces with intelligent robots. In: Proc. of 2007 2nd International Conference on Pervasive Computing and Application, pp. 505–511 (2007) [6] Suo, Y., Miyata, N., Ishida, T.: Open smart classroom: Extensible and scalable smart space using Web service technology. In: Proc. of The 6th International Conference on Advances in Web Based Learning, pp. 428–439 (2007)
An Improved Bee Algorithm-Genetic Algorithm Huang Ming, Ji Baohui, and Liang Xu
Abstract. To avoid the premature convergence and local convergence of the traditional genetic algorithm, an improved bee algorithm-genetic algorithm (BAGA) is proposed, based on the stronger local searching capability of the bee colony. The proposed algorithm calls the artificial bee colony algorithm to escape local convergence when the evolution of the genetic algorithm stagnates, so as to increase the searching speed and shorten the searching time. At the same time, in order to keep population diversity, an iteration adjustment threshold is introduced. Finally, simulation examples of the typical job-shop scheduling problem (JSP) show that the proposed algorithm can avoid premature convergence and accelerate convergence.
1 Introduction

The bee colony algorithm is a non-numerical optimization method based on a self-organization model and the swarm intelligence of bees [1]. The bee colony algorithm was named after the colony's self-organization simulation model [2] when it was first proposed by Seeley in 1995. In this model, although each sector of bee society performs a single task, bees exchange information through waggling, smell, or other means to collaborate fully with each other on nest building, pollen harvesting, and other work. Reference [3] then proposed the swarm optimization algorithm in 2003. In 2009, Hu Zhonghua et al. applied the artificial bee colony algorithm to job-shop scheduling problems in reference [4] and achieved satisfactory test results. Many studies so far indicate that the genetic algorithm is effective in solving the job-shop scheduling problem, but it also has the defects of a slow evolution pace and premature convergence [5]; studies of the bee colony algorithm are still at a preliminary stage because it is a new algorithm, and many problems, such as its long searching time and other shortcomings, remain to be solved [6]. To avoid these defects, an improved bee algorithm-genetic algorithm is proposed in this paper, based on the strategy of calling the artificial bee colony algorithm to escape local convergence when the evolution of the genetic algorithm stagnates. This can

Huang Ming · Ji Baohui · Liang Xu
School of Software Technology Institute, Dalian Jiao Tong University, Dalian, Liaoning
e-mail: [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 683–689. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
684
H. Ming, J. Baohui, and L. Xu
effectively improve the convergence speed and the quality of convergence. At the same time, an iterative similarity threshold is introduced to adjust the mutation time; that is, in the late stage of the population's evolution, when the population diversity has decreased and there is high similarity between individuals, mutation is performed first and then the crossover operation, in order to increase the diversity of the population. The new algorithm is applied to the JSP problem, and simulation shows that the convergence speed is greatly improved.
2 The Idea of the Improved Bee Algorithm-Genetic Algorithm

2.1 Combining the Genetic Algorithm with the Bee Algorithm

According to a large number of references, the principles of the bee colony algorithm can be divided into two types: the first is based on colony foraging theory, and the other is based on bee breeding principles. At present, most references on the bee colony algorithm introduce the bee breeding principle to improve the performance of the genetic algorithm, while references combining bee foraging theory with genetic algorithms are few. This paper combines the colony algorithm with the genetic algorithm (BAGA) to improve the performance of the genetic algorithm and make BAGA more suitable for JSP problems. In reference [4], the bee colony algorithm is used to solve the JSP problem; although it achieves preferable results, it also has the defects of complex computation and slow convergence speed. Based on reference [4], this paper combines the colony algorithm and the genetic algorithm to improve efficiency and performance. The main idea of the new algorithm (BAGA) is: first call the colony algorithm to search for initial solutions of the JSP problem, then use these initial solutions as the initial population of the genetic algorithm, and call the bee colony again to update the solutions when the population's evolution stagnates or degrades. The process of BAGA:

Step 1: Set the minimum iteration number G_min and the maximum iteration number G_max of the genetic operation. The colony algorithm is executed first; the colony algorithm is taken from reference [4].

Step 2: Set the sign of population stagnation or degradation ψ. Then
ψ = 0, if f_max^n ≤ f_max^(n−1) ≤ f_max^(n−2) ≤ … ≤ f_max^1;
ψ = 1, if f_max^n > f_max^(n−1) > f_max^(n−2) > … > f_max^1.  (1.1)

In (1.1), f_max^n represents the optimal fitness value of the chromosomes in the nth population.
Step 3: If ψ = 0, this indicates that the best chromosome's fitness value has not changed, or has even decreased, over successive generations; that is, the population's evolution has stagnated or degraded. Then update the global optimal solution, re-call the bee colony algorithm, and return to Step 1. Otherwise, if ψ = 1, the best chromosome's fitness value is still improving over successive generations and the population keeps evolving, so continue the operation of the genetic algorithm.
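The stagnation sign ψ of Steps 2-3 can be sketched directly from Eq. (1.1). Here the fitness history is ordered oldest to newest, and any sequence that is not strictly improving is treated as stagnation (ψ = 0), which is one reading of the equation:

```python
def stagnation_flag(fmax_history):
    """psi from Eq. (1.1): 1 when the best fitness is strictly improving
    generation by generation, 0 otherwise (stagnation or degradation).
    fmax_history is ordered oldest -> newest."""
    improving = all(a < b for a, b in zip(fmax_history, fmax_history[1:]))
    return 1 if improving else 0
```

With ψ = 0 the driver would re-call the bee colony algorithm as described in Step 3.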
2.2 The Adjustment of the Mutation Time

In the BAGA algorithm, there will be high similarity between individuals after several consecutive generations of the genetic algorithm, and in this situation it is difficult to produce excellent offspring; this paper adjusts the mutation time to avoid this defect. According to the schema theorem [7], the main role of the genetic algorithm's mutation operator is to maintain a certain diversity of the population, that is, to keep the similarity between individuals within a certain range. Individual similarity is the ratio of the number of identical alleles at corresponding positions on the chromosomes of two individuals to the total number of genes. The more identical alleles two chromosomes share, the greater the similarity of the individuals; and the higher the similarity of two individuals, the more difficult it is to produce a better individual through the crossover operation, so it becomes very difficult to search new regions of the solution space. In that case this paper performs the mutation operation first, which reduces the similarity between the two individuals.

Step 1: Calculate the similarity between the optimal individual x of the population and each other individual chromosome:

s_i = 1 / [ Σ_{k=1}^{n} (d_ik − d_k)² ]  (1.2)

where d_ik is the kth gene on chromosome i, d_k is the kth gene on the best chromosome, and n is the population size.

Step 2: If the similarity value calculated from (1.2) is less than the threshold S, then do the crossover operation first and then the mutation operation. Otherwise, do the mutation operation first and then the crossover operation.
3 The Process of Improved Bee Algorithm-Genetic Algorithm The process of BAGA is listed as follows:
Step 1: Initialize the total number of bee colonies g_m, the leading constant Q, the transfer intensity σ, the importance factor of transfer α, the importance factor of heuristics β, the minimum iteration number G_min, the maximum iteration number G_max, the termination iteration number K_max of BAGA, and so on.

Step 2: Put the bees at the initial node and update the leading factor of the generation according to (1.3) in order to complete the selection of all nodes:

p_ij^k = ρ_ij(NC)^α η_ij(NC)^β / Σ_{s∈tabu_k} ρ_is(NC)^α η_is(NC)^β, if j ∈ tabu_k; p_ij^k = 1, otherwise.  (1.3)

In (1.3), η_ij = 1/T(j), where T(j) (j = 1, 2, …, ma) represents the processing time of node j.

Step 3: Use the initial solutions found by the bees as the initial population of the genetic algorithm. If the current iteration number K of the BAGA algorithm is less than K_max, then first calculate every individual's fitness, select the fittest individual, and begin the genetic operations.

Step 4: Use (1.2) to calculate the similarity between the fittest individual and the other individuals in the population; if the similarity is high, mutate first and then cross over. Otherwise, cross over first and then mutate.

Step 5: Update the leading factor according to (1.3) using the fittest individual in the population. If the iteration number of the genetic algorithm satisfies G ≤ G_min, continue the genetic operations. Otherwise, if G > G_min and (1.1) is met, turn to Step 2 to continue the execution of the colony algorithm.

Step 6: Check whether the current iteration number of the genetic algorithm has reached the maximum number G_max; if not, continue the genetic operations. If it has, perform the following steps.

Step 7: Update the global optimal solution, and update the leading factor according to (1.3).

Step 8: If the current iteration number K of BAGA satisfies K < K_max, turn to Step 2. Otherwise, if K = K_max, output the calculation results and end the algorithm.
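The ordering of Steps 1-8 can be sketched as a control loop. The search operators themselves (bee colony seeding, one GA generation, the leading-factor update, and the stagnation test from Eq. (1.1)) are injected as callables, so this is a skeleton of the flow, not the paper's implementation:

```python
def baga(init_colony, genetic_step, update_leaders, g_min, g_max, k_max, stagnated):
    """Control-flow skeleton of BAGA: the colony seeds the population,
    the GA evolves it, and the colony is re-called when evolution
    stagnates after G_min generations."""
    best = None
    for _ in range(k_max):                      # Step 8: outer BAGA iterations
        population = init_colony()              # Steps 1-2: colony seeds the population
        g = 0
        while g < g_max:                        # Step 6: GA iteration budget
            population, best = genetic_step(population, best)  # Steps 3-4
            g += 1
            if g > g_min and stagnated(best):   # Step 5: stagnation -> re-call colony
                break
        update_leaders(best)                    # Step 7: update the leading factor
    return best
```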
4 Simulation Results

This paper selects the FT06 and FT10 problems to verify the performance of the new algorithm BAGA and to illustrate its superiority and effectiveness. We choose the simple genetic algorithm (SGA) and the artificial bee colony algorithm (ABC) mentioned in reference [4] for simulation comparison. The crossover probability of the simple genetic algorithm is set to 0.8 and the mutation probability to 0.2; for the artificial bee colony algorithm, α = 1, β = 5, C = 10, Q = 10, σ = 0.1. The termination condition is 60 generations for the FT06 problem and 300 generations for the FT10 problem, with a total of 20 bees. Table 1 shows that in solving problem FT06 all three algorithms find the optimal solution 55, but SGA obtains the optimal solution with low probability, the randomness of its solving process is larger, and it is prone to losing the optimal solution. The performance of the ABC algorithm improves in all areas, but its global convergence is slower: the evolution generation number is at least 54. Compared with SGA and ABC, the searching ability of BAGA increases in all aspects, with a higher probability of finding the optimal solution and a shorter time spent searching for it. Figure 1 shows a randomly selected iteration curve of the ABC and BAGA algorithms in solving the FT06 problem. It can be seen from Figure 1 that BAGA converges faster than ABC, does not enter premature stagnation, and finally searches out the optimal solution successfully.

Table 1 The results of simulation
                  FT06                     FT10
         SGA     ABC     BAGA     SGA     ABC     BAGA
J_best   55      55      55       942     930     930
J_avg    62      60.5    58       989     981     957
N_min    47      54      30       279     210     142
P_best   0.1     0.6     0.9      0.1     0.1     0.3
Var      0.109   0.081   0.054    0.063   0.049   0.030

Note: J_best is the optimal solution, J_avg is the average solution, N_min is the minimum evolution generation at which the optimal solution is obtained, P_best is the probability of obtaining the optimal solution, and Var is the average deviation.
Fig. 1 The comparison of the convergence curves of the ABC and BAGA algorithms in solving FT06 (fitness value versus time)
5 Concluding Remarks

This paper analyzes the advantages and disadvantages of the bee algorithm and the genetic algorithm, and combines the colony algorithm with the genetic algorithm to overcome their respective shortcomings: when evolution stagnates, the genetic algorithm calls the colony algorithm to jump out of local optima, thereby improving the algorithm's performance. An iteration similarity threshold is introduced to control the adjustment of the mutation time and maintain population diversity in the late stage of evolution. Finally, the simulation results prove the effectiveness and superiority of the new algorithm.
Acknowledgements I would like to express my sincere gratitude to the National 863 Plan Project (2008AA040201), the Liaoning Province Science and Technology Plan Project (20102017), the Liaoning Provincial Office of Education Plan Project (L2010085), and the Dalian Science and Technology Plan Project (2010J21DW009), which provided financial aid and help for this paper.
References [1] Karaboga, D.: An Idea Based On Honey Bee Swarm For Numerical Optimization. Technical Report-TR06, Erciyes University (2005) [2] Seeley, T.D.: The Wisdom of the Hive: The Social Physiology of Honey Bee Colonies. Harvard University Press, Cambridge (1995) [3] Teodorovic, D., Dell’Orco, M.: Bee colony optimization a cooperative learning approach to complex transportation problems. In: Proceedings of the 10th EWGT Meeting and 16th Mini EURO Conference, Poznan, 13-16 September (2005)
[4] Wei, Q., Bing, W., Jie, S.: Genetic algorithm for solving a class of uncertain job shop scheduling problem. Computer Integrated Manufacturing System 13(12), 2452–2455 (2007) [5] Jun, D., Fenglei, L.: Colony algorithm for TSP problem in the application and the parameters of improvement. China Science and Technology Information 3 (2008) [6] Karaboga, D., Basturk, B.: Artificial Bee Colony(ABC) Optimization Algorithm for Solving Constrained Optimization. Foundations of Fuzzy Logic and Soft Computing (2007)
Application of Bayesian Network in Failure Diagnosis of Hydro-electrical Simulation System Zhou Yan and Li Peng
Abstract. Considering the application background of hydro-electrical simulation, a new algorithm for learning Bayesian network structure is proposed based on a rule base provided by many experts. The algorithm adopts a statistical strategy when extracting valid rules from the rule base, discarding rules with weak causality and retaining only those with strong causality. The Bayesian network topology model is formed by structure learning with these valid rules. A loop-breaking technique is proposed for causality loops existing in the topological structure. The model is applied in the hydro-electrical simulation system of the Fengman Hydraulic Power Plant, exploiting the advantage of Bayesian networks in solving uncertainty problems. The model has proved effective during actual operation.
1
Introduction
Since the 1990s, the Bayesian network has developed into an effective tool for uncertainty reasoning and modeling in the field of artificial intelligence [1,2], organically integrating Bayesian probability methods with directed acyclic graphs. However, how to learn its network topology from data remains a difficult problem worth researching. This paper proposes a new algorithm for learning Bayesian network structure from a rule base provided by many experts; it is applied to failure diagnosis of the hydraulic machine governor in the Jilin Fengman hydro-electrical simulation system, and the reasoning results are more accurate and reliable.
2
Learning of Bayesian Network Topology from the Expert Rule Base
Bayesian network topology is the foundation of Bayesian failure diagnosis model reasoning [3,4]. In order to achieve more accurate and precise diagnosis, it is crucial to learn the Bayesian network topology. For the various learning algorithms of

Zhou Yan · Li Peng
Information Technology College, Eastern Liaoning University, Dandong, China

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 691–698. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
692
Z. Yan and L. Peng
Bayesian network topology, the computational load increases exponentially with the number of nodes; this is an NP-hard problem [5,6]. Therefore, combining research on Bayesian networks with the hydro-electrical simulation system, this paper proposes an expert rule base-oriented learning algorithm for Bayesian network topology.
2.1
Structure Learning Algorithm
The algorithm is given below:

(1) Input the number of people in the expert group.
(2) Load the rules from the database.
(3) Screen the causality rules.
(4) If a loop (return circuit) exists, call the loop-breaking algorithm, then go to the next step.
(5) Construct the Bayesian network topology.
Loop-breaking strategy: In case of a causality loop (return circuit) in the rule base, conduct frequency statistics for all causality rules in the loop; when a rule's frequency is less than the threshold value (that is, the impact of the rule's cause on its effect is very weak), the rule is discarded. Repeat in this way until no loop exists. The structure learning algorithm first classifies the rule base to find the nodes with out-degree 0, then deletes these nodes, and makes recursive calls until all nodes in the rule base are deleted or the remaining rules cannot be deleted; finally it judges whether the remaining rules form loops according to these characteristics. The expert knowledge base is stored in a database and may contain both rules provided by a single expert and rules jointly provided by many experts. The algorithm first scans all rules for statistics and calculates the appearance probability of each rule according to the number of people in the expert group, in order to screen out a certain quantity of rules. In this way, the suggestions of many experts can be integrated to construct the Bayesian network topology and reduce the undesirable impact of human factors during construction, thus making the constructed topological structure more reasonable.
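The loop detection and loop breaking described above can be sketched as follows. Nodes with out-degree 0 are deleted repeatedly; rules that survive this elimination lie on a causality loop. Where the paper discards loop rules whose frequency falls below a threshold, this sketch drops the weakest loop rule each round so that it always terminates:

```python
def break_loops(rules):
    """Rules are (cause, effect, frequency) triples.  Returns an
    acyclic rule set by repeatedly finding loop rules and discarding
    the one with the lowest frequency (weakest causality)."""
    def loop_rules(rs):
        rs = list(rs)
        while True:
            causes = {c for c, _, _ in rs}
            # a node that never appears as a cause has out-degree 0;
            # drop the rules pointing into such nodes
            kept = [r for r in rs if r[1] in causes]
            if len(kept) == len(rs):
                return rs          # fixed point: remaining rules form loops
            rs = kept
    rules = list(rules)
    while True:
        on_loop = loop_rules(rules)
        if not on_loop:
            return rules
        rules.remove(min(on_loop, key=lambda r: r[2]))
```

For the cycle A → B → C → A plus an acyclic rule C → D, only the weakest rule on the cycle is removed.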
2.2

Learning of the CPT (Conditional Probability Table)
After the Bayesian network topology has been constructed, the prior information and the Conditional Probability Table (CPT) used when a fault is diagnosed with the Bayesian network can be obtained in the following two ways: (1) The prior probability required by the Bayesian network can be obtained from experts in the field, from statistical analysis, or from a combination of both; for equipment of the same type, the conditional probabilities reflect intrinsic
Application of Bayesian Network in Failure Diagnosis
693
causal information. This means that the probability correlation between different levels of the network is relatively stable rather than varying significantly across different equipment. (2) The evidence information required by the Bayesian network inference algorithms can be obtained by processing the features extracted from the equipment's running-state data through real-time online monitoring of the equipment or a portable test system, or can be obtained from on-site technicians.
3

Failure Diagnosis Model of the Hydraulic Machine Speed Governor Based on Bayesian Network

3.1

Discretization of Numerical Value Properties
Like many machine learning, classification, and uncertain reasoning algorithms, Bayesian networks require discrete properties, so this paper discretizes the continuous values in the hydro-electrical simulation database. Discretization here means dividing the range of a numerical property into several subintervals, each corresponding to a discrete value, and then replacing the original data with these discrete values. A value discretization algorithm must automatically determine the mapping from the continuous property to the discrete property. Discretization algorithms can be divided into unsupervised algorithms, such as the equal-width interval algorithm, the equal-frequency interval algorithm, and the K-means algorithm, and supervised algorithms, such as decision-tree discretization, the ChiMerge algorithm, and the D-2 algorithm. For discretizing continuous value properties in the hydro-electrical simulation database, this paper adopts the simplest equal-width interval algorithm and the WILD algorithm [8]. The equal-width interval algorithm is the simplest unsupervised discretization algorithm: it divides the range of a value property into several intervals according to the interval number designated by the user and makes the width of each interval the same. The WILD discretization algorithm (Weighted Information-Loss Discretization) is a supervised discretization algorithm based on information theory. Some of the hydro-electrical simulation data variables can be divided directly according to national standards and need no discretization treatment. The following is a brief introduction to the basic method of WILD.
Suppose different observed values from small to big as X1, X2, …, Xn. Then N initial intervals I1, I2, …, In. The structure is as follows:
[x_min, x_1], (x_1, x_2), (x_2, x_3), …, (x_n, x_max)

Then, considering m adjacent intervals as one group, WILD compares the interval groups [I_1, I_2, …, I_m], [I_2, I_3, …, I_{m+1}], …, [I_{n−m+1}, I_{n−m+2}, …, I_n], selects the most suitable group, and merges it into one large interval; this is repeated until the stopping condition is met. Because WILD incurs information loss during discretization, the information-loss quantity is computed so that the interval group with the minimum weighted information loss is merged. The weighted information loss is:

WILD = (|I| / N) × Information-loss

where |I| is the number of samples whose property value x lies in interval I, and Information-loss is the information loss before and after the combination of the adjacent intervals:

Information-loss = Ent(I) − Ent(I_1, I_2, …, I_m)

where Ent(I_t) is the category entropy, defined as:

Ent(I_t) = −Σ_{i=1}^{k} p(C_i, I_t) × log p(C_i, I_t)

In the formula, p(C_i, I_t) is the probability that the category property is C_i when the property value x of a sample lies in interval I_t. Ent(I_1, I_2, …, I_m) is the category entropy before combining the group of adjacent intervals I_1, I_2, …, I_m into one interval:

Ent(I_1, I_2, …, I_m) = Σ_{t=1}^{m} (|I_t| / |I|) × Ent(I_t)

In the hydro-electrical simulation failure diagnosis system, the WILD algorithm is well suited to discretizing flow, water level, and other such properties. In order to simplify calculation, this paper selects only a small number of properties; in practical application, diagnosis precision can be improved by increasing the number of properties. The sample properties are obtained after discretization.
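The entropy and weighted-information-loss formulas above can be sketched as follows; the logarithm base is assumed to be 2, which the text leaves open:

```python
import math
from collections import Counter

def ent(labels):
    """Category entropy Ent(I_t) of the class labels falling in one
    interval (base-2 logarithm assumed)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def weighted_info_loss(groups, total_n):
    """WILD score for merging one group of adjacent intervals:
    (|I|/N) * (Ent(I) - sum_t (|I_t|/|I|) * Ent(I_t)),
    following the formulas above.  `groups` is a list of label lists,
    one per interval; total_n is N, the total sample count."""
    merged = [lab for g in groups for lab in g]
    size = len(merged)
    before = sum((len(g) / size) * ent(g) for g in groups)
    return (size / total_n) * (ent(merged) - before)
```

Merging two pure intervals of different classes costs a full bit per sample, whereas merging intervals of the same class costs nothing, so WILD prefers the latter.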
3.2
Bayesian Network Model for Failure Diagnosis of Hydraulic Machine Governor
Failures of the hydraulic machine governor in the hydro-electrical simulation system can be divided into 3 major categories: load shedding failure, servomotor failure, and idling frequency failure. (1) Factors affecting load shedding failure: the governor speed dead band value and the PID regulation parameter value. (2) Factors affecting servomotor failure: the on time and off time of the servomotor. (3) Factors affecting idling frequency failure: the idling frequency swing value.
According to the above relations among hydraulic machine governor failures, the Bayesian network model for failure diagnosis of the hydraulic machine governor can be constructed, as shown in Fig. 1, and can be abstracted into the Bayesian network model in Fig. 2. The meaning of each variable is given below: A: governor failure. B: servomotor failure. C: load shedding failure. D: idling frequency failure. E: on time of the servomotor. F: off time of the servomotor. G: governor speed dead band value. H: PID regulation parameter value.
Fig. 1 Failure diagnosis model of hydraulic machine governor
All the nodes mentioned above are state nodes with 3 states: slightly high, normal, and slightly low. Each state has one value expressed as a probability, so each node carries one multi-valued variable. A directed edge between two nodes represents the correlation between them. Because each node has 3 states, there are in total 9 conditional probability values between two nodes; after summarization they form a conditional probability table, expressed as a matrix. A directed arc is used to indicate any influence between nodes. For a causality X → Y with X = (x_1, x_2, x_3) and Y = (y_1, y_2, y_3), the Bayesian network method uses a conditional probability matrix to determine the causality between X and Y. Fig. 2 shows the abstract Bayesian network failure diagnosis model of the hydraulic machine governor; each directed edge in the figure stands for a causality and is expressed by one conditional probability matrix, which in the Bayesian network model we call the conditional probability table.
Fig. 2 Failure diagnosis model of hydraulic machine governor
M_{y|x} = P(y|x) =
⎛ P(y_1|x_1)  P(y_2|x_1)  P(y_3|x_1) ⎞
⎜ P(y_1|x_2)  P(y_2|x_2)  P(y_3|x_2) ⎟
⎝ P(y_1|x_3)  P(y_2|x_3)  P(y_3|x_3) ⎠

in which P(y_j | x_i) is the probability that Y takes value y_j when X is x_i.
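Reasoning along one edge with this matrix amounts to a vector-matrix product: a distribution over the three states of X multiplied by M_{y|x} gives the distribution over Y. The numbers below are illustrative only, not taken from the plant data:

```python
import numpy as np

# Conditional probability matrix M_{y|x}: row i holds P(y_j | x_i),
# so each row sums to 1.  Values are illustrative assumptions.
M = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

def propagate(p_x, cpm):
    """Push a distribution over X's states (slightly high, normal,
    slightly low) through the directed edge X -> Y:
    P(Y) = P(X) @ M_{y|x}."""
    return np.asarray(p_x) @ cpm
```

Because each row of M sums to 1, the output is again a valid probability distribution.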
4
Application and Analysis
In the actual application of this simulation system, under normal operating states, fault setting and elimination can be performed automatically via the coach table according to the scheduled strategy. For the experimental analysis in this paper, in order to compare experimental data with site data, two subsystems are first constructed: a sensor threshold value method module and the Bayesian network method module adopted in this paper; during data analysis the former is replaced with the latter so that an effective performance comparison can be made in the same unit environment. Then, taking as examples the unit no-load running condition and the parallel-network working condition with active load at 98% of the maximum value, the evidences of the various failure nodes are set up randomly and automatically with the coach table, and all results are collected for experimental verification. Tables 1-2 show part of the result data of the experiment. The following conclusions can be drawn from the experimental analysis: (1) Both methods give correct judgements for data in the normal state with regular patterns. (2) Both methods can find the danger and failure of parameters out of standard. For the danger and failure information hidden in the data with parameters below the security line, the CPT used in the Bayesian network reasoning method of this paper is obtained by means of repeated learning; therefore, both the accuracy and the effect are satisfactory.
Application of Bayesian Network in Failure Diagnosis
Table 1 Experimental results with traditional sensor

Working condition | Failure node | Examples | Right judgements | Wrong judgements | Accuracy rate
No-load           | E | 147 | 108 | 39  | 73.5%
No-load           | F | 236 | 172 | 64  | 72.8%
No-load           | G | 217 | 148 | 69  | 68.2%
No-load           | H | 170 | 112 | 58  | 65.8%
No-load           | I | 195 | 148 | 47  | 75.9%
Active load 98%   | E | 263 | 188 | 75  | 71.5%
Active load 98%   | F | 431 | 302 | 129 | 70.0%
Active load 98%   | G | 506 | 335 | 171 | 66.2%
Active load 98%   | H | 360 | 235 | 125 | 65.3%
Active load 98%   | I | 276 | 204 | 72  | 73.9%
Table 2 Experimental results with Bayesian network

Working condition | Failure node | Examples | Right judgements | Wrong judgements | Accuracy rate
No-load           | E | 147 | 122 | 25  | 82.9%
No-load           | F | 236 | 196 | 40  | 83.0%
No-load           | G | 217 | 166 | 51  | 76.5%
No-load           | H | 170 | 127 | 43  | 74.7%
No-load           | I | 195 | 169 | 26  | 86.7%
Active load 98%   | E | 263 | 206 | 57  | 78.3%
Active load 98%   | F | 431 | 345 | 86  | 80.0%
Active load 98%   | G | 506 | 360 | 146 | 71.1%
Active load 98%   | H | 360 | 264 | 96  | 73.3%
Active load 98%   | I | 276 | 231 | 45  | 83.7%
By learning the Bayesian network many times on the basis of probabilistic reasoning, the model obtains reasoning results with strong persuasive power.
5 Conclusions
According to the comparative analysis of the above experiments, we have found that in practical problem diagnosis the precision of the Bayesian network failure diagnosis method is superior to both the sensor threshold-value method and a Bayesian network model without learning.
Acknowledgements. This work is supported by the Science and Technology Foundation of the Education Department of Liaoning Province (No. 2008212, No. L2010141) and the National Natural Science Foundation of Eastern Liaoning University (2009-Z03).
Application of Evidence Fusion Theory in Water Turbine Model Li Hai-cheng and Qi Zhi
Abstract. In a hydro-electrical running simulation system, the dynamic characteristics of the water turbine unit make it difficult for a conventional PID governor to adapt to all operating conditions, resulting in excessive overshoot during regulation, long regulating times, and fluctuation of the revolving speed and frequency of the system. A water turbine PID model based on evidence fusion theory is therefore proposed. The model takes the local decision results of multiple sensors distributed over the unit as evidence and fuses this multi-source evidence in a PID fusion center by means of evidence reasoning, improving the overall regulating performance of the water turbine. Operation of the Fengman hydro-electrical PID simulation system has demonstrated the effectiveness and practical value of this model.
1 Introduction
As the automation and intelligence of hydro-electrical running systems constantly increase, higher requirements are being placed on hydro-electrical running simulation systems used for scientific research and training [1, 2]. The speed governing system of the water turbine is the core device for regulating revolving speed and grid frequency, and its simulation modeling has become one of the core problems of whole-system simulation. Because the PID control method features simple calculation and good stability, most existing speed governing systems of water turbines adopt it. However, practical development and operation of hydro-electrical running simulation systems have shown that conventional PID control has difficulty achieving the ideal control effect: with the traditional PID linear combination algorithm, the governing system of the water turbine is a non-linear, time-variant, non-minimum-phase system [3], so a conventional PID controller cannot adapt to all operating conditions, causing excessive overshoot during regulation, long regulating times, and consequent fluctuation of the revolving speed and frequency of the system.

Li Hai-cheng · Qi Zhi
Information Technology College, Eastern Liaoning University, Dandong, China

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 699–706.
springerlink.com © Springer-Verlag Berlin Heidelberg 2011
Considering the above problems in the PID regulation process, this paper proposes a water turbine PID model based on evidence fusion theory. The model divides the original PID control algorithm into different control algorithms, takes the local decision information of multiple sensors as fusion evidence during operation, and fuses the different pieces of evidence by evidence reasoning, thereby enhancing the overall regulating performance of the water turbine. Operating experiments on the Fengman hydro-electrical running simulation system indicate that [4], compared with the traditional PID model, this model increases response speed, reduces overshoot during regulation, and effectively improves the control performance of the speed governing system of the water turbine.
2 PID Control System of Water Turbine
In a hydro-electrical running simulation system, the speed governing system of the water turbine regulates the revolving speed and frequency of the hydroelectric generator set and forms a closed-loop regulating system together with it. The various dynamic variation processes of this system are mainly controlled by the regulating model of the speed governing system, and the common regulating model adopts the traditional PID control algorithm, so the performance of the PID controller greatly affects the stability and efficiency of operation of the whole unit. The PID regulating model forms the controlled variable from a linear combination of the proportional, integral and differential of the frequency difference in order to control frequency. The proportional element reflects the frequency difference signal of the control system in proportion and without delay: as soon as a frequency difference arises, the controller immediately acts to reduce it. The integral element is mainly used to eliminate offset and to raise the system's level of zero steady-state error. The differential element reflects the variation trend (rate of change) of the frequency difference signal and introduces a valid correction signal at an early stage, before the frequency difference becomes too large, thereby accelerating the system's action and shortening the regulating time. The dominant PID regulating mathematical model is shown in Formula (1).
y(t) = K [ E(t) + (1/Ti) ∫0..t E(s) ds + Td · dE(t)/dt ]        (1)
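A discrete-time sketch of Formula (1) may help; this is an illustration of ours, with hypothetical gains and sampling step, not code from the paper.

```python
# Sketch: discrete form of the PID law y = K [E + (1/Ti)*integral(E) + Td*dE/dt].
# Gain K, integral time Ti, derivative time Td and step dt are illustrative.

def make_pid(k, ti, td, dt):
    state = {"integral": 0.0, "prev_e": 0.0}
    def step(e):
        state["integral"] += e * dt
        de = (e - state["prev_e"]) / dt
        state["prev_e"] = e
        i_term = state["integral"] / ti if ti else 0.0  # Ti = 0 disables the I element
        return k * (e + i_term + td * de)
    return step

pid = make_pid(k=2.0, ti=5.0, td=0.1, dt=0.05)
out = pid(1.0)  # response to a unit frequency-difference step
```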
Development and practical operation of the simulation system have revealed that the operating conditions for the water turbine PID controller are extremely complicated, including starting up, the corresponding period, synchronization, load shedding, shutting down and joint regulation, and each particular operating condition places different performance requirements on the PID controller, so different control algorithms should be adopted. For example, under the starting-up condition, when the revolving speed is less than 95%, the PID controller is required to increase the revolving speed of the unit at maximum speed; at
this time the frequency difference itself can be used as the output of the PID controller. When the revolving speed exceeds 95%, excessive fluctuation of the revolving speed should be avoided while bringing it rapidly to the rated value, so the PD algorithm can be adopted. Under the load-shedding condition, the system must close the guide vane opening while regulating the fluctuation of the revolving speed so as to stabilize it at the rated value as soon as possible; here the full PID algorithm should be used. It can thus be seen that the dynamic characteristics of the water turbine unit are the main factors affecting the performance of the PID controller, and at present it is hard for a single conventional PID control algorithm to adapt to all operating conditions. Considering these problems in the operation of the PID controller, together with its non-linear and dynamic operating conditions, these dynamically varying factors are hard to define precisely or describe with a mathematical model. This paper therefore applies multi-sensor information fusion technology in the regulating process of the PID model to obtain a precise environmental state, increase the confidence of control, and reduce ambiguity to some extent, thereby improving the regulating performance of the PID controller.
3 Evidence Theory of Information Fusion
The evidence theory of information fusion measures the uncertainty of evidence with a confidence interval made up of the belief function and the plausibility function of the evidence, and realizes fusion decisions over multiple pieces of evidence with the evidence rule of combination. The evidence fusion method is applicable to making system decisions when the prior probabilities of the decision information are unknown [5]. The core idea of evidence fusion theory is as follows. Suppose θ is the collection of all possible values of the system decision, and all elements within θ are exhaustive and mutually incompatible; then θ is called the frame of discernment of the system. On the frame of discernment, if a function m : 2^θ → [0,1] meets the following condition:
m(φ) = 0,   Σ_{A ⊆ θ} m(A) = 1        (2)
then m(A) is called the elementary probability assignment function of A; it gives the exact degree of trust in the evidence A. In evidence fusion, the belief function Bel(A) of evidence A is also called the lower limit function of the evidence, meaning the total trust in this evidence. The plausibility function Pl(A) is also called the upper limit function, showing the degree to which the evidence does not refute A. Then [Bel(A), Pl(A)] defines the confidence interval of A.
Bel(A) = Σ_{B ⊆ A} m(B)        (3)

Pl(A) = Σ_{B ∩ A ≠ φ} m(B)        (4)
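Formulas (3) and (4) can be illustrated with a small sketch; the frame of discernment and the masses below are invented for illustration and are not data from the paper.

```python
# Sketch: belief (sum of masses of subsets of A) and plausibility (sum of
# masses of sets intersecting A), with subsets encoded as frozensets.

def bel(a, masses):
    return sum(m for b, m in masses.items() if b <= a)   # b subset of a

def pl(a, masses):
    return sum(m for b, m in masses.items() if b & a)    # b intersects a

theta = frozenset({"PID", "PI", "PD"})
masses = {  # illustrative basic probability assignment; sums to 1
    frozenset({"PID"}): 0.5,
    frozenset({"PID", "PI"}): 0.3,
    theta: 0.2,
}
a = frozenset({"PID"})
interval = (bel(a, masses), pl(a, masses))  # Bel(A) <= Pl(A) always holds
```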
In research on evidence fusion theory, the rule of combination of evidence is the core of the whole algorithm. Suppose m1 and m2 are the elementary probability assignment functions of two mutually independent pieces of evidence on 2^θ; then the elementary probability assignment function of the conclusion reached after fusing the two pieces of evidence is:

m(C) = (1/K) Σ_{A ∩ B = C} m1(A) m2(B)   if C ≠ φ,   m(C) = 0 if C = φ        (5)

In the formula, K is the normalization factor:

K = 1 − Σ_{A ∩ B = φ} m1(A) m2(B)
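Dempster's rule above can be sketched as follows; the two basic probability assignments are illustrative values, not data from the paper.

```python
# Sketch of Formula (5): combine two independent basic probability assignments
# m1, m2 over subsets of the frame, renormalizing by K = 1 - conflict.

def combine(m1, m2):
    conflict = sum(v1 * v2 for a, v1 in m1.items()
                   for b, v2 in m2.items() if not (a & b))
    k = 1.0 - conflict  # the normalization factor K
    fused = {}
    for a, v1 in m1.items():
        for b, v2 in m2.items():
            c = a & b
            if c:
                fused[c] = fused.get(c, 0.0) + v1 * v2 / k
    return fused

m1 = {frozenset({"PID"}): 0.6, frozenset({"PID", "PI"}): 0.4}
m2 = {frozenset({"PID"}): 0.7, frozenset({"PI"}): 0.3}
fused = combine(m1, m2)  # fused masses again sum to 1
```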
4 Water Turbine PID Model Based on Evidence Fusion

4.1 Structure of Water Turbine PID Model
In the regulating system consisting of the PID regulator and the water turbine unit, the non-linear and dynamic characteristics of the unit are the main factors affecting the PID control effect, and these factors are very hard to define precisely or describe with a mathematical model. Therefore, in this research a group of sensors is installed in the water turbine unit to periodically collect dynamic information on its operation; each sensor makes an independent decision on the PID control algorithm to adopt based on its own observations and sends this decision information to the PID fusion center, where the fusion rules perform fusion treatment on all the local
Fig. 1 PID Model Structure based on Evidence Fusion
decisions; the overall decision thus acquired acts directly on the PID control module to control the regulating quantity for that period. The structure of the whole PID fusion module is shown in Fig. 1.
4.2 Fusion Algorithm of Water Turbine PID Model
In the water turbine PID model based on evidence fusion, different regulating algorithms are adopted in consideration of the different performance requirements that different operating conditions place on the PID regulating model. The PID algorithms that may be adopted by the water turbine system, obtained by analysing practical data and hydroelectric theory, are shown in Table 1, in which Null means that, when the unit is starting up with a revolving speed of less than 95%, the system directly takes the frequency-measurement difference as the output value of the regulating module without PID regulation. In Table 1, each algorithm type corresponds to different algorithm parameters, so the different algorithms can all be realised through the original PID Formula (1) simply by changing the parameters, which simplifies the design and implementation of the water turbine PID model.

Table 1 PID algorithms of water turbine system

Type | Algorithm | Algorithm Parameters
PID  | P+I+D     | Kp=kp, Ti=ti, Td=td
PI   | P+I       | Kp=kp, Ti=ti, Td=0
PD   | P+D       | Kp=kp, Ti=0, Td=td
Null | Null      | Kp=0, Ti=ti, Td=0
In the algorithm research for this model, the PID algorithms that can be adopted by the system are first divided into the four objective situations above, and each sensor in the system model independently decides which PID algorithm the system should adopt based on its own observations. The core of the algorithm then lies in fusing each sensor's local decision result and finally deciding which PID algorithm the system should adopt. The fusion algorithm therefore divides into rules of combination and rules of decision. Suppose the water turbine PID fusion center has N sensors monitoring the operating environment information of the water turbine unit, the decision made by sensor i is <m^i_pid, m^i_pi, m^i_pd, m^i_null>, and the final decision made by the system after fusion is d ∈ D = {PID, PI, PD, Null}; then the algorithm of the water turbine PID fusion center is as follows.
Step 1: Collect the local decision results of the N sensors into the fusion matrix R of the fusion center:

R = [ m^1_pid  m^1_pi  m^1_pd  m^1_null
      ...
      m^N_pid  m^N_pi  m^N_pd  m^N_null ]        (6)

Step 2: Initialize the sensor index i = 1;
Step 3: Take the evidence of sensors s_i and s_{i+1}:

r = [ m^i_pid      m^i_pi      m^i_pd      m^i_null
      m^{i+1}_pid  m^{i+1}_pi  m^{i+1}_pd  m^{i+1}_null ]        (7)

Step 4: Calculate the normalization factor K for this combination according to the formula in Section 3;
Step 5: Calculate the evidence combination after fusing the two sensors according to Formula (5): s' = <m_pid, m_pi, m_pd, m_null>;
Step 6: Replace the evidence of sensor s_{i+1} with s';
Step 7: If i + 1 < N, set i = i + 1 and return to Step 3; otherwise s' constitutes the judgement vector;
Step 8: Take the decision d = arg max_{d∈D} m_{s'}(d); then d is the final decision result of the PID fusion center.

In the above fusion algorithm, Steps 2–7 realize the combining process of the fusion system, and Step 8 realizes its decision process. The decision process adopts the principle of selecting the maximum probability assignment.
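The fusion-center loop might be sketched as below, assuming singleton decisions over D = {PID, PI, PD, Null}; the sensor masses are invented for illustration, and ties or total conflict are not handled.

```python
# Sketch of the fusion-center algorithm: sequentially fuse the N sensors'
# decision vectors <m_pid, m_pi, m_pd, m_null> and pick the arg-max decision.

D = ("PID", "PI", "PD", "Null")

def fuse_pair(s1, s2):
    # Dempster combination for singleton hypotheses: conflict where labels differ.
    conflict = sum(v1 * v2 for d1, v1 in zip(D, s1)
                   for d2, v2 in zip(D, s2) if d1 != d2)
    k = 1.0 - conflict
    return tuple(v1 * v2 / k for v1, v2 in zip(s1, s2))

def fusion_center(R):
    s = R[0]
    for i in range(1, len(R)):          # the combining process (Steps 2-7)
        s = fuse_pair(s, R[i])
    return D[max(range(len(D)), key=lambda j: s[j])]  # the decision (Step 8)

R = [  # illustrative local decisions of three sensors
    (0.6, 0.2, 0.1, 0.1),
    (0.5, 0.3, 0.1, 0.1),
    (0.4, 0.4, 0.1, 0.1),
]
decision = fusion_center(R)
```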
5 Applied Cases
In the practical application of this paper, the model is applied in the hydroelectric simulation system of the Jilin Fengman Hydropower Station to simulate the dynamic operating characteristics of the whole unit system, including the operating conditions of starting up, shutting down, power generation, phase modulation and load shedding. Figs. 2 and 3 show the variation curves of revolving speed and guide vane opening during load shedding with the traditional PID control module and with the improved PID control module, respectively, under the same initial unit conditions.
(Figure: variation curves of revolving speed of unit and opening of guide vane, percentage/% versus time/s)

Fig. 2 PID model structure based on evidence fusion
Comparing the two figures, when the guide vane opening is closed at full speed and then reopened, the overshoot of the original PID model is too large, affecting stable operation of the unit under this working condition, whereas the improved model makes the revolving speed and guide vane opening transition rapidly into a stable state.

(Figure: variation curves of revolving speed of unit and opening of guide vane, percentage/% versus time/s)

Fig. 3 PID model structure based on evidence fusion
6 Conclusion
The PID model based on evidence fusion can fully meet the requirements of the hydro-electrical simulation system and solve the problems caused by the disadvantages of the conventional PID model. In particular, during dynamic regulation while the simulation system is in operation, the PID water turbine system based on evidence fusion has a fast response time and high regulating precision, ensuring the operating effect of the system simulation.
Acknowledgements. This work is supported by the Science and Technology Foundation of the Education Department of Liaoning Province (No. 2008212, No. L2010141) and the National Natural Science Foundation of Eastern Liaoning University (2009-Z03).
References
[1] Brezovec, M., Kuzle, I.: Nonlinear digital simulation model of hydroelectric power unit with Kaplan turbine. IEEE Transactions on Energy Conversion 21(1), 235–241 (2006)
[2] Munoz-Hernandez, G.A., Jones, D.I., Fuentes-Goiz, S.I.: Modelling and simulation of a hydroelectric power station using MLD. In: Proceedings of the 15th International Conference on Electronics, Communications and Computers, CONIELECOMP, pp. 83–88 (2005)
[3] Doan, R.E., Natarajan, K.: Modeling and control design for governing hydroelectric turbines with leaky Wicket gates. IEEE Transactions on Energy Conversion 19(2), 449–455 (2004)
[4] Zhang, X., Zhao, H., Wei, S.: Application of Fuzzy Pattern Recognition in Hydroelectricity Simulation Examining System. Computer Engineering 31(6), 22–24 (2005)
[5] Iang, W., Wu, S.: Multi-data fusion fault diagnosis method based on SVM and evidence theory. Chinese Journal of Scientific Instrument 31(8), 1738–1743 (2010)
Calculating Query Likelihoods Based on Web Data Analysis Koya Tamura, Kenji Hatano, and Hiroshi Yadohisa
Abstract. The language model for information retrieval has a statistical background and can accommodate previous text retrieval models, and it has therefore attracted much attention in recent years. This retrieval model considers only text information. We, however, focus on Web page retrieval, and Web pages have additional features that should be taken into account; in particular, they contain hyperlink information, which is beneficial for Web page retrieval. In this paper, we propose a new retrieval approach that considers the features of a term in neighboring Web pages by using the hyperlink information. Keywords: Web Search, Language Model, Hyperlink Analysis.
1 Introduction

In today's advanced information society, the World Wide Web (WWW) holds a massive number of Web pages, so Web search engines are necessary to find useful information. Nevertheless, it is difficult to find useful information exactly

Koya Tamura
Graduate School of Culture and Information Science, Doshisha University, 1-3 Tatara-Miyakodani, Kyotanabe, Kyoto 610-0394, Japan e-mail: [email protected]
Kenji Hatano
Faculty of Culture and Information Science, Doshisha University, 1-3 Tatara-Miyakodani, Kyotanabe, Kyoto 610-0394, Japan e-mail: [email protected]
Hiroshi Yadohisa
Faculty of Culture and Information Science, Doshisha University, 1-3 Tatara-Miyakodani, Kyotanabe, Kyoto 610-0394, Japan e-mail: [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 707–717.
springerlink.com © Springer-Verlag Berlin Heidelberg 2011
because the data on the Web has grown enormously over the last ten years. Improving the retrieval accuracy of Web search engines is thus one of the important tasks in this research field.

In the field of information retrieval, the probabilistic language model has attracted much attention in recent years. It was mainly used in machine translation and speech recognition [1]; however, Ponte and Croft adapted it to document search [8]. This model, usually called the query likelihood model, is statistically based and represents term occurrence stochastically, so it is said to have a sound statistical background and to accommodate previous information retrieval models easily.

The query likelihood model considers only the text information in one document. In the case of searching Web pages, however, a Web page consists of text as well as hyperlinks. The hyperlinks among Web pages are useful information for Web information retrieval, because previous research has shown that hyperlinks help to improve retrieval accuracy; for example, the PageRank and HITS algorithms are well-known Web retrieval techniques [5, 6] that use the hyperlinks to evaluate whether a Web page is valuable or not.

In this paper, we extend the query likelihood model to make use of the hyperlinks among Web pages, and propose a new retrieval model for searching Web pages. In our model, we calculate the term likelihoods of a page's neighboring Web pages using the hyperlink structure among Web pages, whereas the previous model considers only the query likelihood of the document itself. This is the point that differentiates our work from existing research.

The remainder of this paper is organized as follows. In Section 2, we describe related work and the basic issues of the query likelihood model. In Section 3, we introduce our proposal, which considers the term likelihoods of neighboring Web pages. In Section 4, we report our experimental results.
Finally, in Section 5, we conclude our paper and mention directions for future work.
2 Basic Issues and Related Work

In this section, we describe the basic issues of the query likelihood model and related work.
2.1 Query Likelihood Model

The query likelihood model is a retrieval model based on calculating query likelihoods, which express the suitability between a query and documents [8, 9]. In the context of the query likelihood model, a document is regarded as a sample from an underlying language model [1]. The language model of a document is called the document model, and we have to estimate it in order to calculate the query likelihood. To do so, the unigram model is generally used in the field of information retrieval. The unigram model assumes that words occur independently in each document. Thus, we can calculate the suitabilities as follows:

P̂(Q | M_{d_i}) = ∏_{t_{ij} ∈ Q} P̂(t_{ij} | M_{d_i})        (1)
Calculating Query Likelihoods Based on Web Data Analysis
709
Here, d_i (i = 1, 2, ..., l) is a document, and a query Q consisting of a set of query keywords is issued by a user. The query keywords are usually contained in several Web pages, and we denote a word in document d_i as t_{ij} (j = 1, 2, ..., m), where m is the number of unique words contained in all Web pages. In Equation (1), P̂(Q | M_{d_i}) is called a query likelihood, and the documents are ranked in order of their query likelihoods. In the query likelihood model, we have to estimate the probability of occurrence of each individual query keyword. These probabilities depend not on the context of a document but on the document model, so we can calculate the query likelihoods using the maximum likelihood estimate of each individual word t_{ij} as follows:

P_mle(t_{ij} | M_{d_i}) = tf^{d_i}_{t_{ij}} / N_{d_i}        (2)
where M_{d_i} is the document model, i.e. the language model of document d_i, tf^{d_i}_{t_{ij}} is the number of occurrences of word t_{ij} in document d_i, and N_{d_i} is the length of document d_i. However, the query likelihood model has a problem called the "zero-probability problem": the probability of a word is zero when the word does not appear in a document. As a result, the query likelihood of the document is also zero, even if many of the query keywords do occur. To cope with this problem, the probability of the word is kept non-zero by a smoothing technique that mixes in corpus-level statistics. One of the most famous smoothing techniques, called Jelinek-Mercer smoothing [4], is defined as follows:

P̂(t_{ij} | M_{d_i}) = ω P_mle(t_{ij} | M_{d_i}) + (1 − ω) P_mle(t_{ij} | M_c)        (3)
where ω is a weighting parameter with 0 < ω < 1, and M_c is the corpus model, which is based on the probabilities of word occurrence over all documents. Using the corpus model helps to avoid the zero-probability problem even if P_mle(t_{ij} | M_{d_i}) = 0. The likelihood based on the corpus model M_c is defined by the following equation:

P_mle(t_{ij} | M_c) = ∑_{d_i ∈ c} tf^{d_i}_{t_{ij}} / ∑_{d_i ∈ c} N_{d_i}        (4)
In the remainder of this paper, we regard the query likelihood model as the baseline method.
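Equations (1)-(4) can be put together in a short sketch; the toy corpus, query and value of ω below are our own illustration, not the authors' experimental setup.

```python
# Sketch: query likelihood with Jelinek-Mercer smoothing. Documents are token
# lists; omega mixes the document model with the corpus model (Equation (3)).

def jm_likelihood(query, doc, corpus, omega=0.5):
    n_d = len(doc)
    n_c = sum(len(d) for d in corpus)
    score = 1.0
    for t in query:
        p_doc = doc.count(t) / n_d                         # Equation (2)
        p_corpus = sum(d.count(t) for d in corpus) / n_c   # Equation (4)
        score *= omega * p_doc + (1 - omega) * p_corpus    # Equations (1), (3)
    return score

corpus = [["obama", "family", "tree"], ["family", "tree", "tree"]]
q = ["family", "tree"]
scores = [jm_likelihood(q, d, corpus) for d in corpus]
```

Because of the corpus term, a document scores non-zero even when it misses a query keyword, which is exactly what the smoothing is for.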
2.2 Related Work

Retrieval models that consider neighboring pages have been proposed in previous research. One is a method for characterizing Web pages based on the query likelihoods of neighboring pages, called method ST [10]. It assumes that a hyperlink between Web pages indicates some kind of relationship, so the query likelihood of a target Web page should also reflect the query likelihoods
710
K. Tamura, K. Hatano, and H. Yadohisa
of its neighboring Web pages. In the experimental results of that study, the retrieval accuracy of the method is better than the baseline method. However, method ST does not consider the query keyword likelihoods of neighboring pages. The query likelihood is composed of many query keyword likelihoods, so we should consider not the whole query likelihood but the individual query keyword likelihoods. The other is an expansion of the query likelihood model called the cluster language model [7]. It divides all documents into K clusters, and the query keyword likelihoods of the cluster and the corpus are used to estimate the query likelihood of a document. In that study, a clustering algorithm is used to obtain similar documents; however, it takes a lot of time to obtain the K clusters.
3 Link-Based Language Model

As described in Section 2.2, our past proposal considers the whole query likelihood of neighboring Web pages. However, it cannot consider the features of each query keyword, because the query that the user submits is made up of many query keywords. Fig. 1 shows an example. In this situation, query Q comprises three query keywords, 'obama', 'family', and 'tree'; that is, the user wants information about president Obama's family tree. In order to consider the neighboring pages' content, we should reflect the neighboring Web pages' query likelihoods P(Q|M_{d_2}) and P(Q|M_{d_3}) in the target Web page's likelihood P(Q|M_{d_1}). In Fig. 1, the query keyword likelihoods of document d_2 are P('obama'|M_{d_2}) = 0.01, P('family'|M_{d_2}) = 0.1, and P('tree'|M_{d_2}) = 0.1, so the whole query likelihood is P(Q|M_{d_2}) = 0.0001 by Equation (1). Hence, 'family' and 'tree' have high probabilities of occurring in document d_2; that is, document d_2 contains content related to 'family' and 'tree'. In contrast, the query keyword likelihoods of document d_3 are P('obama'|M_{d_3}) = 0.1, P('family'|M_{d_3}) = 0.01, and P('tree'|M_{d_3}) = 0.1, so its whole query likelihood is also P(Q|M_{d_3}) = 0.0001; document d_3 contains content about 'obama' and 'tree'. The problem is that the whole query likelihoods of documents d_2 and d_3 have the same value, 0.0001. Consequently, if we apply our past approach in this situation, we treat documents of different types as documents with the same features. For this reason, as shown in Fig. 2, we must reflect the query keyword likelihoods of the neighboring pages in the query keyword likelihoods of the target page. Following this concept, we propose the Link-Based Language Model (LBLM), which considers the query keyword likelihoods of the neighboring pages connected by hyperlinks.

In this proposal, in addition to the query likelihood model described in Section 2.1, we calculate the query keyword likelihood over the set of neighboring pages (Fig. 3), called the Link Model. Our proposal is represented as follows:

P̂(t_{ij} | M_{d_i}) = λ1 P(t_{ij} | M_{d_i}) + λ2 P(t_{ij} | L_{d_i}) + λ3 P(t_{ij} | M_C)        (5)
Fig. 1 Considering whole query likelihood of neighboring pages

Fig. 2 Considering each query keyword likelihood of neighboring pages

Fig. 3 Link-Based Language Model (a target page d_i and its set of neighboring pages d_i^k)
Here, the λ are weighting parameters with λ1 + λ2 + λ3 = 1 and 0 ≤ λ1, λ2, λ3 ≤ 1.0; therefore, there are 66 combinations of parameters. Moreover, L is the Link Model, and P(t_{ij} | L_{d_i}) is the likelihood of term t_{ij} under the Link Model L, calculated as follows:

P(t_{ij} | L_{d_i}) = ∑_{d_i^k ∈ L_{d_i}} tf^{d_i^k}_{t_{ij}} / ∑_{d_i^k ∈ L_{d_i}} N_{d_i^k}        (6)

Here, d_i^k is a neighboring page of d_i connected by hyperlinks, and L_{d_i} is the set of such pages. Finally, the query likelihood is calculated by Equation (1) using the query keyword likelihoods calculated by Equation (5).
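A sketch of how Equations (5) and (6) combine follows; the toy documents and neighbour sets are invented for illustration, and the λ values follow the (0.4, 0.1, 0.5) setting used in Section 4.3.

```python
# Sketch of the LBLM term likelihood: mix the document model, the link model
# built from the page's neighbours, and the corpus model (Equations (5)-(6)).

def term_likelihood(t, doc, neighbours, corpus, lam=(0.4, 0.1, 0.5)):
    l1, l2, l3 = lam  # must sum to 1
    p_doc = doc.count(t) / len(doc)
    n_link = sum(len(d) for d in neighbours)
    p_link = sum(d.count(t) for d in neighbours) / n_link  # Equation (6)
    n_c = sum(len(d) for d in corpus)
    p_corpus = sum(d.count(t) for d in corpus) / n_c
    return l1 * p_doc + l2 * p_link + l3 * p_corpus        # Equation (5)

d1 = ["obama", "speech"]
neighbours = [["obama", "family"], ["family", "tree"]]  # pages linked from d1
corpus = [d1] + neighbours
p = term_likelihood("family", d1, neighbours, corpus)
```

Note that 'family' never appears in d1 itself, yet its likelihood is non-zero because the link and corpus models contribute.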
4 Experimental Evaluations

To evaluate the effectiveness of our proposal, we used a TREC test collection (ClueWeb09 Dataset Category B). As a preliminary experiment, we compare the retrieval accuracies obtained with different hyperlink types and parameter settings. We compare our proposal with both the baseline, which does not consider the query likelihoods of neighboring Web pages (Section 2), and method ST, our past proposal that considers the whole query likelihoods of neighboring Web pages [10].
4.1 Test Collection

Our experiment uses the ClueWeb09 Dataset Category B [2] provided by TREC1 (Text REtrieval Conference). TREC is a workshop focusing on information retrieval (IR) research areas, cosponsored by the US Department of Defense and the National Institute of Standards and Technology (NIST). This dataset consists of 50 million English pages (unique URLs: 428,136,613; total outlinks: 454,075,638) collected in 2009, 50 topics, and their sets of answers. To make our ranked list of the top 1,000 documents, we eliminate stop words from all documents using Salton's stop word list2 and apply stemming based on the Porter Stemmer3. To evaluate the effectiveness of our proposal, we use precision (Prec.) and the number of retrieved relevant documents (Rel.Retr.). We calculate precision as follows:

Precision = (number of retrieved relevant Web pages) / (number of retrieved Web pages)        (7)

In particular for precision, we use the top 10 results (P@10), the 11-point precision over the proportion of retrieved documents (0.0 - 1.0), and mean average precision (MAP).
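Applying Equation (7) to a top-k cutoff can be sketched as follows; the ranked list and relevance labels are invented for illustration.

```python
# Sketch: precision at a cutoff k (Equation (7) restricted to the top-k list).

def precision_at_k(ranked, relevant, k):
    top = ranked[:k]
    return sum(1 for d in top if d in relevant) / k

ranked = ["d3", "d7", "d1", "d9", "d2"]   # hypothetical ranked result list
relevant = {"d3", "d1", "d5"}             # hypothetical relevance judgements
p_at_5 = precision_at_k(ranked, relevant, 5)  # 2 relevant in top 5 -> 0.4
```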
4.2 Considering Different Types of Hyperlink

There are two types of hyperlinks in Web pages: the inlink and the outlink. An outlink, which is put in place by the author, points to other Web pages. In contrast, an inlink, which is created independently of the author, comes from other Web pages. We must therefore decide which type of hyperlink to use to achieve the best result in our experiments. Thus, in this experiment, we compare the results retrieved using each of the following: the outlink only, the inlink only, and the combination of both the inlink and the outlink. Table 1 shows MAP and Rel.Retr. for each experiment, where the results are averaged over the 66 combinations of parameters.
1 http://trec.nist.gov/
2 ftp://ftp.cs.cornell.edu/pub/smart/english.stop
3 http://www.tartrus.org/%7Emartin/PorterStemmer/
Calculating Query Likelihoods Based on Web Data Analysis
Table 1 Comparing results by hyperlink type

         Inlink               Outlink              Inlink and Outlink
         MAP      Rel.Retr.   MAP      Rel.Retr.   MAP      Rel.Retr.
Ave.     0.2174   4,727       0.2518   5,469       0.2176   4,735
SD       0.03404  754.8       0.02054  152.6       0.03399  751.9
Min.     0.1540   3,213       0.1825   4,980       0.1541   3,228
Max.     0.2593   5,557       0.2638   5,580       0.2594   5,562
The averages of MAP and Rel.Retr. using the outlink only are the best among the three experiments, and their standard deviations (SD) are also the smallest. For these reasons, we believe that using the outlink only is the most stable choice in our experiment.
4.3 Parameter Setting

As described in Section 4.2, we obtain a set of neighboring pages using the outlink only to perform our experiment. LBLM requires the settings of λ1, λ2, and λ3 in Equation (5); in this section, we discuss how to set them. λ1, λ2, and λ3 are related to the likelihoods in the language model Mdi, the likelihood Ldi, and the corpus model MC, respectively, and fulfill the conditions λ1 + λ2 + λ3 = 1 and 0.0 ≤ λl ≤ 1.0 (l = 1, 2, 3); on a grid with step size 0.1, there are therefore 66 combinations of parameters. Accordingly, we calculate the 66 pairs of MAP and Rel.Retr. shown in Table 2. As shown in Table 2, (λ1, λ2, λ3) = (0.4, 0.1, 0.5) gives the best result in both MAP and Rel.Retr. Moreover, when we set λ2 ≤ 0.5, the retrieval accuracy is relatively good. Hence, retrieval accuracy can be improved by keeping the parameter related to the link model small.
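As a rough illustration (not the authors' code), the 0.1-step parameter grid and a three-way linear interpolation in the style of Equation (5) can be sketched in Python; the function and probability names below are hypothetical stand-ins:

```python
# Enumerate (λ1, λ2, λ3) on a 0.1 grid with λ1 + λ2 + λ3 = 1;
# this yields exactly the 66 combinations evaluated in Table 2.
combos = [(a / 10, b / 10, (10 - a - b) / 10)
          for a in range(11) for b in range(11 - a)]
print(len(combos))  # 66

def interpolate(p_doc, p_link, p_corpus, lambdas):
    """Three-way linear interpolation of per-term likelihoods from the
    document model, the link (neighboring-page) model, and the corpus
    model -- stand-ins for Mdi, Ldi, and MC."""
    l1, l2, l3 = lambdas
    return l1 * p_doc + l2 * p_link + l3 * p_corpus
```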
4.4 Experimental Results

As described in Section 4.3, we set (λ1, λ2, λ3) = (0.4, 0.1, 0.5) as the parameters in Equation (5). Using these parameters, we obtain the results shown in Tables 3 and 4, given by the above-mentioned criteria. The difference between LBLM and the baseline (BL) is whether the contents of neighboring pages are considered when searching Web pages; the difference between LBLM and ST lies in the procedure for calculating the query likelihood of a Web page. As shown in Table 3, MAP and Rel.Retr. of LBLM improve by 3.72% and 5.78%, respectively, compared with those of BL. As shown in Table 4, MAP and Rel.Retr. of LBLM also improve by 1.10% and 1.51%, respectively, compared with those of ST. To confirm the significance of the differences between LBLM and BL/ST, we conduct hypothesis testing using the Wilcoxon signed-rank test [3]. "*" in Tables 3 and 4 represents a significant difference. In particular, we can confirm the significant difference of both MAP and
K. Tamura, K. Hatano, and H. Yadohisa
Table 2 Comparing results for each combination of parameters

λ1, λ2, λ3     MAP       Rel.Retr. | λ1, λ2, λ3     MAP      Rel.Retr. | λ1, λ2, λ3     MAP      Rel.Retr.
0.0, 0.0, 1.0  0.002955  119       | 0.2, 0.1, 0.7  0.2627   5,526     | 0.4, 0.6, 0.0  0.2629   5,540
0.0, 0.1, 0.9  0.01751   192       | 0.2, 0.2, 0.6  0.2622   5,525     | 0.5, 0.0, 0.5  0.2410   5,003
0.0, 0.2, 0.8  0.01817   194       | 0.2, 0.3, 0.5  0.2594   5,514     | 0.5, 0.1, 0.4  0.2628   5,571
0.0, 0.3, 0.7  0.01815   192       | 0.2, 0.4, 0.4  0.2534   5,512     | 0.5, 0.2, 0.3  0.2631   5,572
0.0, 0.4, 0.6  0.01823   192       | 0.2, 0.5, 0.3  0.2506   5,494     | 0.5, 0.3, 0.2  0.2630   5,563
0.0, 0.5, 0.5  0.01812   192       | 0.2, 0.6, 0.2  0.2465   5,446     | 0.5, 0.4, 0.1  0.2622   5,544
0.0, 0.6, 0.4  0.01789   192       | 0.2, 0.7, 0.1  0.2423   5,400     | 0.5, 0.5, 0.0  0.2592   5,381
0.0, 0.7, 0.3  0.01772   192       | 0.2, 0.8, 0.0  0.2360   5,217     | 0.6, 0.0, 0.4  0.2401   5,012
0.0, 0.8, 0.2  0.01798   193       | 0.3, 0.0, 0.7  0.2530   5,264     | 0.6, 0.1, 0.3  0.2627   5,565
0.0, 0.9, 0.1  0.01795   193       | 0.3, 0.1, 0.6  0.2633   5,561     | 0.6, 0.2, 0.2  0.2630   5,569
0.0, 1.0, 0.0  0.01753   187       | 0.3, 0.2, 0.5  0.2635   5,563     | 0.6, 0.3, 0.1  0.2622   5,548
0.1, 0.0, 0.9  0.2418    4,999     | 0.3, 0.3, 0.4  0.2630   5,559     | 0.6, 0.4, 0.0  0.2598   5,395
0.1, 0.1, 0.8  0.2587    5,393     | 0.3, 0.4, 0.3  0.2615   5,537     | 0.7, 0.0, 0.3  0.2467   5,099
0.1, 0.2, 0.7  0.2502    5,376     | 0.3, 0.5, 0.2  0.2574   5,533     | 0.7, 0.1, 0.2  0.2618   5,571
0.1, 0.3, 0.6  0.2435    5,346     | 0.3, 0.6, 0.1  0.2543   5,508     | 0.7, 0.2, 0.1  0.2610   5,544
0.1, 0.4, 0.5  0.2341    5,292     | 0.3, 0.7, 0.0  0.2491   5,334     | 0.7, 0.3, 0.0  0.2589   5,399
0.1, 0.5, 0.4  0.2182    5,212     | 0.4, 0.0, 0.6  0.2432   5,050     | 0.8, 0.0, 0.2  0.2541   5,275
0.1, 0.6, 0.3  0.2041    5,142     | 0.4, 0.1, 0.5  0.2636   5,580     | 0.8, 0.1, 0.1  0.2600   5,559
0.1, 0.7, 0.2  0.1931    5,067     | 0.4, 0.2, 0.4  0.2634   5,573     | 0.8, 0.2, 0.0  0.2580   5,415
0.1, 0.8, 0.1  0.1825    4,980     | 0.4, 0.3, 0.3  0.2636   5,565     | 0.9, 0.0, 0.1  0.2458   5,068
0.1, 0.9, 0.0  0.1739    4,785     | 0.4, 0.4, 0.2  0.2634   5,547     | 0.9, 0.1, 0.0  0.2548   5,406
0.2, 0.0, 0.8  0.2314    5,021     | 0.4, 0.5, 0.1  0.2629   5,540     | 1.0, 0.0, 0.0  0.2430   5,123
Rel.Retr. As we described above, our approach is more suitable for searching Web pages based on the language model.
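The Wilcoxon signed-rank test used for the significance marks can be illustrated with a small pure-Python sketch of the W statistic on paired per-topic scores; the numbers below are invented, and a real evaluation would use a library routine (e.g., SciPy's `scipy.stats.wilcoxon`) to obtain p-values:

```python
def wilcoxon_w(xs, ys):
    """Wilcoxon signed-rank statistic W for paired samples: rank the
    absolute per-pair differences (average ranks for ties, drop zero
    differences), sum the ranks of positive and of negative
    differences, and return the smaller sum."""
    diffs = [y - x for x, y in zip(xs, ys) if y != x]
    ranked = sorted(diffs, key=abs)
    ranks = {}
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and abs(ranked[j]) == abs(ranked[i]):
            j += 1
        avg = (i + 1 + j) / 2      # mean of 1-based ranks i+1..j
        for k in range(i, j):
            ranks[k] = avg
        i = j
    w_plus = sum(r for k, r in ranks.items() if ranked[k] > 0)
    w_minus = sum(r for k, r in ranks.items() if ranked[k] < 0)
    return min(w_plus, w_minus)

# Hypothetical per-topic scores for two systems (integers for clarity).
print(wilcoxon_w([1, 2, 3, 4], [2, 1, 5, 6]))  # 1.5
```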
4.5 Discussion of Time Complexity

In this section, we discuss the time complexities of the cluster-based language model (CBLM) [7] and LBLM. In particular, we focus on the process of obtaining neighboring pages. To extract a set of neighboring pages in CBLM, the K-means clustering algorithm is used, which requires calculating the similarity of every pair of Web pages. Generally, the time complexity of CBLM is expressed as O(n²), where n is the number of Web pages. In contrast, LBLM obtains the neighboring pages collected via the hyperlinks in each document. In the test collection used in our experiment, a Web page includes one hyperlink on average. If the cost of computing the similarity of a pair of documents in K-means clustering is the same as that of collecting the hyperlinks in a Web page for LBLM, the time complexity of LBLM is O(n). Consequently, LBLM is more efficient than CBLM from the standpoint of time complexity.
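To make the O(n²) vs. O(n) contrast concrete, here is a toy sketch (the page ids and link lists are made up):

```python
# Hypothetical corpus: page id -> list of outlink targets.
pages = {
    "p1": ["p2", "p3"],
    "p2": ["p3"],
    "p3": [],
}

def neighbors_by_outlink(pages):
    """LBLM-style neighbor gathering: one pass over each page's stored
    outlinks -- O(n) when pages contain a constant number of links."""
    return {pid: list(links) for pid, links in pages.items()}

def all_pairs(pages):
    """CBLM-style clustering needs a similarity for every pair of
    pages: n*(n-1)/2 pairs, hence O(n^2) similarity computations."""
    ids = list(pages)
    return [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]

print(len(all_pairs(pages)))  # 3 pairs for 3 pages
```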
Table 3 Comparing BaseLine and LBLM

                 BL        LBLM      Improvement (%)
Rel.             12,544    12,544
Rel.Retr         5,275     5,580     5.78%
Prec.   P@10     0.7896    0.7979    1.06% *
        0.0      0.9153    0.9143    -0.11%
        0.1      0.3846    0.3990    3.74% *
        0.2      0.2703    0.2838    4.97% *
        0.3      0.2244    0.2383    6.22% *
        0.4      0.1917    0.2056    7.26%
        0.5      0.1691    0.1809    6.97%
        0.6      0.1513    0.1616    6.77%
        0.7      0.1376    0.1455    5.75% *
        0.8      0.1265    0.1331    5.21% *
        0.9      0.1163    0.1228    5.57%
        1.0      0.1082    0.1146    5.87%
MAP.             0.2541    0.2636    3.72% *
Table 4 Comparing ST and LBLM

                 ST        LBLM      Improvement (%)
Rel.             12,544    12,544
Rel.Retr         5,497     5,580     1.51% *
Prec.   P@10     0.7646    0.7979    4.36%
        0.0      0.8932    0.9143    2.36%
        0.1      0.4046    0.3990    -1.39%
        0.2      0.2833    0.2838    0.15% *
        0.3      0.2357    0.2383    1.12%
        0.4      0.2034    0.2056    1.08% *
        0.5      0.1786    0.1809    1.28% *
        0.6      0.1598    0.1616    1.09% *
        0.7      0.1442    0.1455    0.95% *
        0.8      0.1313    0.1331    1.35%
        0.9      0.1209    0.1228    1.55%
        1.0      0.1128    0.1146    1.53% *
MAP.             0.2607    0.2636    1.10%
5 Conclusion

In this paper, we have proposed the Link Based Language Model (LBLM), which considers the likelihoods of query keywords. Experimental results showed that LBLM improves retrieval accuracy significantly compared with the conventional approaches. As future work, we plan to propose a method for extracting effective hyperlinks. Our current approach extracts hyperlinks simply, distinguishing only inlinks from outlinks; therefore, the set of neighboring Web pages is obtained through both effective and ineffective hyperlinks. We should instead select only the effective hyperlinks when obtaining a set of neighboring Web pages. To extract effective hyperlinks, we will consider the similarity between pairs of Web pages connected by a hyperlink.
Acknowledgments. A part of this work was supported by the Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research (C) (#21500284) and for Young Scientists (B) (#22700248).
References

1. Charniak, E.: Statistical Language Learning. The MIT Press, Cambridge (1996)
2. Clarke, C.L.A., Craswell, N., Soboroff, I.: Overview of the TREC 2009 Web Track. In: Text REtrieval Conference (TREC) (2009)
3. Hollander, M., Wolfe, D.A.: Nonparametric Statistical Methods. Wiley Interscience, Hoboken (1999)
4. Jelinek, F., Mercer, R.L.: Interpolated estimation of Markov source parameters from sparse data. In: Proceedings of the Workshop on Pattern Recognition in Practice, pp. 381–397 (1980)
5. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. In: SODA 1998: Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 668–677. Society for Industrial and Applied Mathematics, Philadelphia (1998)
6. Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank Citation Ranking: Bringing Order to the Web. Technical Report 1999-66, Stanford InfoLab (1999), http://ilpubs.stanford.edu:8090/422/
7. Liu, X., Croft, W.B.: Cluster-based retrieval using language models. In: SIGIR 2004: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 186–193. ACM, New York (2004), doi: http://doi.acm.org/10.1145/1008992.1009026
8. Ponte, J.M., Croft, W.B.: A language modeling approach to information retrieval. In: SIGIR 1998: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 275–281. ACM, New York (1998), doi: http://doi.acm.org/10.1145/290941.291008
9. Song, F., Croft, W.: A general language model for information retrieval. In: CIKM 1999: Proceedings of the Eighth International Conference on Information and Knowledge Management, pp. 316–321. ACM, New York (1999), doi: http://doi.acm.org/10.1145/319950.320022
10. Tamura, K., Hatano, K., Yadohisa, H.: Characterizing Web pages based on the query likelihoods of neighboring pages. In: Proceedings of the 5th International Conference on Digital Information Management (ICDIM 2010), pp. 392–397 (2010)
Changed Portion

We changed the paper to satisfy the reviewers' requirements as follows.

Comment (Page 1, title): Please rewrite "based" as "Based".
Change: We rewrote "based" as "Based".

Comment (Page 1, Abstract): You might be requested to write some keywords for your paper. Please prepare them and add them after the abstract if possible, e.g., Keywords: keyword1, keyword2, keyword3, ...
Change: We added keywords.

Comment (Page 2, citation of other papers/books): Please rewrite "...techniques [5,6] and ..." as "...techniques [5][6] and ..." (and rewrite the following citations similarly).
Change: We rewrote "[5,6]" as "[5][6]".

Comment (Page 8, Section 4.4): Please rewrite "... as parameters in equation ??" as "... as parameters in equation (5)" (please write the correct equation number).
Change: We rewrote "... as parameters in equation ??" as "... as parameters in equation (5)".

Comment (Page 9, Section 4.5): What is O(n2)? Is it O(n²)? Please write the correct expression.
Change: We rewrote O(n2) as O(n²).

Comment: Check your table numbers and section numbers (especially Sections 4.3 and 4.4).
Change: We changed a table number in Sections 4.3 and 4.4.

Comment: In Section 4.3, your parameter (0.4, 0.1, 0.5) is the best result. Is this choice based only on Rel.Retr., only on MAP, or both? Please write your reason.
Change: We wrote "As shown in Table 2, (λ1, λ2, λ3) = (0.4, 0.1, 0.5) is the best result in both MAP and Rel.Retr." in Section 4.3.

Comment: If possible, compare the content of neighboring Web pages selected by LBLM and another method.
Change: The neighboring-page content of LBLM and ST is the same. Therefore, we added the differences between LBLM and BL, and between LBLM and ST, in Section 4.4.
Calculating Similarities between Tree Data Based on Structural Analysis Kento Ikeda, Takashi Kobayashi, Kenji Hatano, and Daiji Fukagawa
Abstract. In recent years, a huge amount of data has been generated every day, and people extract useful information from it. To extract such useful information using computers, data increasingly takes complex forms such as tree and graph structures. It is therefore important to calculate similarities between structural data when searching for useful information that satisfies a user's information need. Among such structured data, tree structured data is comparatively simple, so its similarity has been evaluated by a simple algorithm called the tree edit distance. This existing algorithm calculates structural similarities of tree data by analyzing the structural difference through edit operations. However, this method cannot fully consider the characteristics of a tree structure, because it uses only the edit operations. For this reason, we propose a new method for calculating similarities of tree data that considers the edit operations as well as other features, e.g., the depth of a tree, the number of nodes, and so on. Using our method, we can coordinate the tree edit distance and

Kento Ikeda
Graduate School of Culture and Information Science, Doshisha University, 1–3 Tatara-Miyakodani, Kyotanabe, Kyoto 610–0394, Japan
e-mail: [email protected]

Takashi Kobayashi
Graduate School of Information Science, Nagoya University, Furo, Chikusa, Nagoya, Aichi 464–8601, Japan
e-mail: [email protected]

Kenji Hatano
Faculty of Culture and Information Science, Doshisha University, 1–3 Tatara-Miyakodani, Kyotanabe, Kyoto 610–0394, Japan
e-mail: [email protected]

Daiji Fukagawa
Faculty of Culture and Information Science, Doshisha University, 1–3 Tatara-Miyakodani, Kyotanabe, Kyoto 610–0394, Japan
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 719–729. © Springer-Verlag Berlin Heidelberg 2011. springerlink.com
the characteristics. As a result, our method helps to calculate similarities more accurately than the tree edit distance algorithm.

Keywords: Tree structured data, Structural Analysis, Calculating similarity, XML.
1 Introduction

As people become increasingly dependent on the Internet and computer technologies, a huge amount of data is generated every day. Simple data is usually expressed as plain text; however, plain text is hard to handle because of its enormous volume. Using tagged text, we can give meaning to plain text; in this sense, tagged text is more expressive than plain text, which is why it continues to increase. One example of tagged text is an XML document, whose structure is a tree. Tree structured data is used in many research fields such as computational biology [8], information extraction from Web pages [3], image analysis [10], pattern recognition [2], natural language processing [11], and many others. Therefore, as tree structured data increases, techniques for searching similar data will be required. To measure the similarity between tree structured data, the tree edit distance algorithm has usually been used. However, this algorithm is insufficient for detecting similar trees: it uses only the edit operations, so the characteristics of a tree structure cannot be fully considered. Of course, the value calculated from edit operations is one characteristic of a tree structure, but this value alone is insufficient. Other features, such as the depth of a tree or the number of nodes it contains, should also be considered; the tree edit distance by itself cannot completely capture the features of tree structured data. To solve this problem, we expect that it is necessary to consider what types of structure the data contains. The tree edit distance calculates a similarity between tree structured data considering only the following three types of edit operations, i.e.
node deletion, insertion, and replacement; in other words, it cannot consider whether or not keywords appear in the tree structured data. For this reason, we propose a method for calculating similarities of tree structured data that considers characteristics related to both the tree edit operations and the appearance of keywords in the tree structured data.
2 Related Work

In this section, we describe studies related to our method. In information theory, the edit distance between two objects is defined as the number of operations (replace, delete, insert) required to transform one of them into the other. Several algorithms for calculating this distance have already been proposed.
One of the most famous algorithms is the string edit distance [4], a metric for measuring the amount of difference between two sequences. The string edit distance compares texts, so it is used for spell-check features; it is also applied to calculate similarities between RNA sequences in the research field of bioinformatics. Moreover, the tree edit distance [9] was proposed for measuring the difference between two tree structures. The reason the tree edit distance adopts the edit operations is that the idea of the string edit distance is intuitive and simple to understand. However, when the lengths of two sequences are very different, many insertion or deletion operations are needed to transform one into the other, so the edit distance necessarily becomes large. In the case of the tree edit distance, the same problem occurs when the numbers of nodes of the two tree structured data are different, even if the same kinds of nodes are contained in them. By the same token, the tree edit distance becomes large for two tree structured data even if they contain a common part or have a containment relationship.
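The string edit distance mentioned above is typically computed with the classic dynamic-programming algorithm; a minimal sketch (not the authors' code):

```python
def string_edit_distance(s, t):
    """Levenshtein distance: dp[i][j] is the minimum number of
    insert/delete/replace operations turning s[:i] into t[:j]."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i              # delete all of s[:i]
    for j in range(n + 1):
        dp[0][j] = j              # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # delete
                           dp[i][j - 1] + 1,          # insert
                           dp[i - 1][j - 1] + cost)   # replace/match
    return dp[m][n]

print(string_edit_distance("kitten", "sitting"))  # 3
```

Note how the distance grows with the length gap between the two strings, which is exactly the weakness discussed above.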
3 Proposed Method Considering Two Characteristics

In this section, we explain our method for calculating similarities of tree structured data. The tree edit distance focuses only on the connections of nodes in the tree structured data and uses the resulting value as a similarity; in other words, it considers only one characteristic of the tree structured data. We think that, to calculate a similarity, we have to consider characteristics extracted both from the comparison of two tree structured data and from a tree structured data itself. The tree edit distance is one of the former characteristics in our framework, so we have to define measures for the other characteristics. In the following subsections, we describe them in detail.
3.1 Characteristic of Comparison of Trees

As described above, tree structured data has two kinds of characteristics. The first is a comparative result of two tree structured data, and the other is a numeric value extracted from a tree structured data itself. First, we explain the former. The edit operations used in the tree edit distance are one example of a feature acquired by comparing tree structured data. In addition, as features, we use values that can be acquired only after pairing two trees for comparison, i.e., values that, like the edit operations, cannot be acquired from each tree alone; we use them to express the structural difference between the two trees. Among these features, some are redundant because they are in linear relation to each other. Therefore, we remove such redundant features and finally obtain the candidate features necessary for our method. For this purpose, we apply a regression analysis to pairs of feature values. When the coefficient of determination calculated by the regression analysis is larger than 0.9, we judge that the pair has a strong linear relation and exclude one of the two features from the candidates.
As a result, we exclude one feature of each such pair from the candidate features. We may adopt either one of the pair; since the two features are in linear relation, the choice does not affect our method. Following the above procedure, we prepare numerical formulas to evaluate the degree of coincidence for the characteristic of comparison of trees, and we explain them concretely with real data below. To cover a variety of features, we prepare XML data conforming not only to one DTD but also to different DTDs. In total, we use 1,555 XML data: 517 Java tutorials [6] conforming to the DTD of JX-model [5], 920 data conforming to five kinds of DTD for 1999 articles in SIGMOD Record [1], and 48 data conforming to four kinds of DTD for 2002 articles. We calculate the number of nodes, the depth, the number of leaf nodes, the number of node types, and the number of edge types. We define α and β as the features acquired by the comparison of trees. α is a value that attaches greater importance to whether a node appears at all than to how many times it appears in the edit operations. We therefore define α so as to lower the cost of an edit operation on a node that appears very frequently. First, we calculate Ei, the weighted cost of the i-th edit operation, where i ranges over the edit operations required by the tree edit distance. In the case of insertion or deletion, we set Ei to the reciprocal of the number of appearances, in T1 and T2, of the node targeted by the edit operation. As a result, we obtain a value that pays attention to whether a node appears or not rather than to the difference in the number of times it appears.
In the case of a replacement, there are a node before the replacement and a node after it. Therefore, we set Ei to the reciprocal of the average of the appearance counts of the two nodes. We use the following notation in this paper: a node is denoted by n, a tree by T, the number of occurrences of n in T by Nt(T, n), the number of all nodes in T by N(T), an insert operation of n by E{ε → n}, a delete operation of n by E{n → ε}, and a replacement operation from n to m by E{n → m}.
Ei = 1 / (Nt(T1, n) + Nt(T2, n))                                   (if E{ε → n} or E{n → ε})
Ei = 2 / ((Nt(T1, n) + Nt(T2, n)) + (Nt(T1, m) + Nt(T2, m)))       (if E{n → m})              (1)
With Ei calculated by Equation (1), we define α as in Equation (2): we divide the sum of Ei by the average of the numbers of nodes of the two trees being compared, thereby suppressing the influence of the total number of nodes in a tree. α takes a value α ≥ 0 and grows larger as the number of required edit operations increases.

α = (2 / (N(T1) + N(T2))) Σi Ei                                                              (2)
β is a value that measures the similarity of the sequences of sibling nodes at the same depth in the two trees being compared. For each depth i, we calculate the string edit distance between the sequences of sibling nodes and divide it by the larger number of sibling nodes of the two comparison objects. In Equation (3), i ranges over the depths 1 to min{depth(T1), depth(T2)} that the two trees have in common. β takes a value β ≥ 0. The string edit distance of Str1 and Str2 is denoted by SED{Str1, Str2}, the sequence of sibling nodes at depth i in T by Strdepth=i(T), and the number of sibling nodes at depth i in T by Ndepth=i(T).

β = Σi SED{Strdepth=i(T1), Strdepth=i(T2)} / max{Ndepth=i(T1), Ndepth=i(T2)}                 (3)
The coefficient of determination between α and β obtained by a regression analysis is smaller than 0.3, which confirms that there is no linear relation between them.
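A small sketch of Equations (1) and (2), assuming the edit script is already given; the tuple format and the toy trees below are our own invention, not part of the paper:

```python
from collections import Counter

def alpha(edits, t1_nodes, t2_nodes):
    """Sketch of Equations (1)-(2): weight each edit operation by the
    rarity of the node labels involved, then normalize by the average
    tree size. `edits` is a hypothetical edit script of tuples
    ('ins'|'del', n) or ('rep', n, m); trees are label multisets."""
    c1, c2 = Counter(t1_nodes), Counter(t2_nodes)
    total = 0.0
    for op in edits:
        if op[0] in ('ins', 'del'):                 # E{ε→n} or E{n→ε}
            n = op[1]
            total += 1 / (c1[n] + c2[n])
        else:                                        # E{n→m}
            n, m = op[1], op[2]
            total += 2 / ((c1[n] + c2[n]) + (c1[m] + c2[m]))
    return 2 * total / (len(t1_nodes) + len(t2_nodes))

# Toy trees as node-label multisets; one deletion of a rare node "b".
t1 = ["a", "a", "b"]
t2 = ["a", "a"]
print(alpha([("del", "b")], t1, t2))  # 2 * (1/1) / 5 = 0.4
```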
3.2 Characteristic Extracted from Tree

The second characteristic of tree structured data is a set of numeric values extracted from its statistics. The statistics usually used in recent studies are the number of nodes, the depth, the number of leaf nodes, the number of node types, and the number of edge types [7]. Therefore, we define these as the second characteristic, extracted from the tree structured data itself. Here, we explain how we handle these features, with concrete descriptions using the XML document collections of Section 3.1. First, we gather the five kinds of statistics described above and apply a simple linear regression analysis to every conceivable pair of them. Only two pairs — the numbers of nodes and leaf nodes, and the numbers of node types and edge types — have coefficients of determination larger than or equal to 0.9; these two pairs of statistics thus have a positive linear relationship, and one value of each pair is adopted as a feature of the tree structured data. As a result, we utilize the number of nodes and the number of node types as features. In addition, the depth of the tree structured data is highly independent, so it is also utilized as a feature. We thus define the depth, the number of nodes, and the number of node types as the features of tree structured data, and call them γ, δ, and ζ, respectively. These features should be computable using only the statistics extracted from the two tree structured data T1 and T2, and should express the difference between T1 and T2. Fortunately, these features are numeric values, so we can define a measure commonly utilized for each feature as follows:

γ, δ, or ζ = 0                                          (if f(T1) = 0 and f(T2) = 0)
γ, δ, or ζ = 1 − min{f(T1), f(T2)} / max{f(T1), f(T2)}  (otherwise)                          (4)

(0 ≤ γ ≤ 1, 0 ≤ δ ≤ 1, 0 ≤ ζ ≤ 1)
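Equation (4) can be written directly as a small helper; the example values below are arbitrary:

```python
def ratio_diff(f1, f2):
    """Equation (4): normalized difference of a single tree statistic,
    0 when the two statistics are equal, approaching 1 as they
    diverge; returns 0 when both statistics are zero."""
    if f1 == 0 and f2 == 0:
        return 0.0
    return 1 - min(f1, f2) / max(f1, f2)

print(ratio_diff(4, 4))   # 0.0  (e.g., identical depths)
print(ratio_diff(2, 8))   # 0.75
```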
f(Ti) is defined as the depth of Ti when we calculate the feature γ, the number of leaf nodes of Ti in the case of δ, and the number of node types in the case of ζ. If these feature values are small, it can be said that T1 and T2 are similar.
3.3 Method for Calculating Similarities

Next, we unify the features obtained by the procedures above into a method for calculating similarities of tree structured data. To unify the features, we sum each feature multiplied by a parameter w. We thus define the dissimilarity of Equation (5), which assigns a parameter to each of the features α, β, γ, δ, and ζ. When this value is small, we can judge that the trees are similar. In addition, we can change the influence of each feature by adjusting the parameters w.

disSim(T1, T2) = w1 α + w2 β + w3 γ + w4 δ + w5 ζ                                            (5)

(w1 + w2 + w3 + w4 + w5 = 1)
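Equation (5) is a plain weighted sum; the sketch below uses the parameter setting reported for the first experiment, while the feature values are invented:

```python
def dis_sim(features, weights):
    """Equation (5): weighted sum of the five features (α, β, γ, δ, ζ).
    The weights must sum to 1; smaller values mean more similar trees."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * f for w, f in zip(weights, features))

weights = (0.15, 0.15, 0.40, 0.15, 0.15)   # w3 = 0.40 emphasizes depth (γ)
features = (0.2, 0.1, 0.0, 0.3, 0.05)      # hypothetical α, β, γ, δ, ζ
print(dis_sim(features, weights))
```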
4 Experimental Evaluations

In this section, we evaluate whether our method can calculate similarities of tree structured data using the features defined in Section 3. We conduct two experiments using hierarchical clustering algorithms, because the tree structured data should be classified clearly into categories if our characteristics express it exactly. In the first experiment, we evaluate whether our method can classify XML documents according to their DTDs; this is because XML documents, as tree structured data, have features defined by their DTDs. In the second, we categorize source codes according to program structures chosen in an arbitrary manner, and confirm whether our characteristics precisely capture the features of those structures using our method.
4.1 Classification of XML Documents

In the first experiment, we collect XML documents defined by several DTDs and confirm whether our characteristics can classify them according to the DTDs. If we can categorize the documents by DTD, it can be said that our method precisely expresses the features of the tree structured data. The collected XML documents are compiled by ACM SIGMOD, and their DTDs are defined for proceedings (ProceedingsPage.dtd), ordinary issues (OrdinaryIssuePage.dtd), and index terms (IndexTermPage.dtd). The numbers of XML documents following these DTDs are 17, 51, and 920, respectively; we choose ten documents from each category at random and classify the total of 30 documents using the hierarchical clustering algorithm known as Ward's method. For the sake of fairness, we conduct the cluster analysis ten times.
[Dendrogram for Fig. 2 omitted: the 30 leaves, labeled ProceedingsPage_1–10, IndexTermsPage_1–10, and OrdinaryIssuePage_1–10, separate into three clusters by DTD; height scale 0–6.]
In this case, we set the parameters in Equation (5) as w1 = w2 = w4 = w5 = 0.15 and w3 = 0.40.
ProceedingsPage_6
ProceedingsPage_10
ProceedingsPage_7
ProceedingsPage_5
ProceedingsPage_8
ProceedingsPage_4
ProceedingsPage_3
ProceedingsPage_2
ProceedingsPage_1
OrdinaryIssuePage_9
OrdinaryIssuePage_6
OrdinaryIssuePage_5
OrdinaryIssuePage_8
OrdinaryIssuePage_7
OrdinaryIssuePage_10
IndexTermsPage_9
IndexTermsPage_2
IndexTermsPage_5
IndexTermsPage_3
IndexTermsPage_4
IndexTermsPage_10
IndexTermsPage_7
IndexTermsPage_1
IndexTermsPage_8
IndexTermsPage_6
OrdinaryIssuePage_1
OrdinaryIssuePage_4
OrdinaryIssuePage_2
OrdinaryIssuePage_3
0
2000
4000
Height 6000
8000
10000
12000
14000
Calculating Similarities between Tree Data Based on Structural Analysis 725
Fig. 1 Result of the cluster analysis in the first experiment using the tree edit distance
Fig. 2 Result of the cluster analysis in the first experiment using our method
Fig. 1 shows the result of a cluster analysis based on the similarities calculated by the tree edit distance algorithm. As is obvious from Fig. 1, the XML documents cannot be classified into categories following their DTDs. Examining the result in detail, XML documents with a small number of nodes tend to fall into the cluster on the left; this fact indicates that the tree edit distance characterizes the tree structured data mainly by the number of nodes. On the other hand, our method can characterize the tree structured data precisely, because the XML documents are classified into three categories, as indicated by Fig. 2. We applied various values to the parameters and, as a result, introduced values
that produced a meaningful cluster structure, by which we can classify the tree structured data into categories according to the DTDs. If we make w3 larger, a category composed of XML documents with large depth is formed. Therefore, it can be said that our method can precisely express the features of the tree structured data if the parameters are adjusted adequately. However, a cluster analysis using our method may not perform well if we set the parameter w3 = 1. This is because two of the three DTDs define the same depth of tree structured data, so our method cannot divide those XML documents into two categories using γ only. As a result, it can be said that the features other than γ complement γ and help to precisely express the characteristics of the tree structured data.
4.2 Classification of Source Codes

In the second experiment, we evaluate whether our method can identify similar program structures in source codes. The program structure of a source code can be regarded as tree structured data, so we can calculate a similarity between two source codes in the same manner. We therefore obtain a result of a cluster analysis and confirm from it whether our method expresses the characteristics of the tree structured data exactly, as in the classification of XML documents described in Section 4.1. In this experiment, we do not use Java source codes directly, but the tree structured data translated from them using JX-model [5]. JX-model converts a Java source code into its syntax tree formatted in XML. The syntax tree expresses the grammatical rules of a programming language, so we can calculate a structural similarity between source codes using it. In the syntax tree, the depth of the tree corresponds to the depth of nested structures, and the tags and their structure are defined by JX-model. The source codes used in this experiment are shown in Figs. 3–8; these are the Java source codes before conversion with JX-model. The source codes have two kinds of structural regularity: one is statement repetition, as in Figs. 6–8 (Pattern A), and the other is replacing a simple statement with a control statement, as in Figs. 3–5 (Pattern B). The difference between Figs. 7 and 8, or between Figs. 4 and 5, is whether the for control statements are nested or not. We thus use the tree structured data translated from a total of six source codes to compare results of a cluster analysis. If the result of the cluster analysis is significant, it will have two categories, because we gave two kinds of structural regularity to the source codes. Moreover, for comparison, we also obtain a result of a cluster analysis using the tree edit distance algorithm.
In this experiment, we again use Ward's method as the hierarchical clustering algorithm. Using the six tree structured data, we obtain the results of the cluster analysis shown in Fig. 9 - 11. As Fig. 9 illustrates, source codes with a large number of statements, such as Test2_2 and Test3_2, are classified into the same category. Using the tree edit distance algorithm, therefore, we find that the number of nodes in the tree structured data has a direct effect on the result of the cluster analysis.
class Test1 {
  public static void main (String[] args) {
    int a = 0; int b = 0; int c = 0;
    a = a + 10; b = b + 10; c = c + 10;
    System.out.println(a); System.out.println(b); System.out.println(c);
  }
}
Fig. 3 Test1.java

class Test2 {
  public static void main (String[] args) {
    int a = 0; int b = 0; int c = 0;
    for (int i = 0; i < 10; i++) { a = a + i; }
    for (int j = 0; j < 20; j++) { b = b + j; }
    for (int k = 0; k < 30; k++) { c = c + k; }
    System.out.println(a); System.out.println(b); System.out.println(c);
  }
}
Fig. 4 Test2.java

class Test3 {
  public static void main (String[] args) {
    int a = 0; int b = 0; int c = 0;
    for (int i = 0; i < 10; i++) {
      a = a + i;
      for (int j = 0; j < 20; j++) {
        b = b + j;
        for (int k = 0; k < 30; k++) {
          c = c + k;
        }
      }
    }
    System.out.println(a); System.out.println(b); System.out.println(c);
  }
}
Fig. 5 Test3.java

class Test1_2 {
  public static void main (String[] args) {
    int a = 0; int b = 0; int c = 0; int d = 0; int e = 0; int f = 0;
    a = a + 10; b = b + 10; c = c + 10; d = d + 10; e = e + 10; f = f + 10;
    System.out.println(a); System.out.println(b); System.out.println(c);
    System.out.println(d); System.out.println(e); System.out.println(f);
  }
}
Fig. 6 Test1_2.java

class Test2_2 {
  public static void main (String[] args) {
    int a = 0; int b = 0; int c = 0; int d = 0; int e = 0; int f = 0;
    for (int i = 0; i < 10; i++) { a = a + i; }
    for (int j = 0; j < 20; j++) { b = b + j; }
    for (int k = 0; k < 30; k++) { c = c + k; }
    for (int l = 0; l < 40; l++) { d = d + l; }
    for (int m = 0; m < 50; m++) { e = e + m; }
    for (int n = 0; n < 60; n++) { f = f + n; }
    System.out.println(a); System.out.println(b); System.out.println(c);
    System.out.println(d); System.out.println(e); System.out.println(f);
  }
}
Fig. 7 Test2_2.java

class Test3_2 {
  public static void main (String[] args) {
    int a = 0; int b = 0; int c = 0; int d = 0; int e = 0; int f = 0;
    for (int i = 0; i < 10; i++) {
      a = a + i;
      for (int j = 0; j < 20; j++) {
        b = b + j;
        for (int k = 0; k < 30; k++) {
          c = c + k;
          for (int l = 0; l < 40; l++) {
            d = d + l;
            for (int m = 0; m < 50; m++) {
              e = e + m;
              for (int n = 0; n < 60; n++) {
                f = f + n;
              }
            }
          }
        }
      }
    }
    System.out.println(a); System.out.println(b); System.out.println(c);
    System.out.println(d); System.out.println(e); System.out.println(f);
  }
}
Fig. 8 Test3_2.java
[Figures 9 and 10 show dendrograms over the leaves Test1, Test1_2, Test2, Test2_2, Test3, Test3_2.]
Fig. 9 Result of cluster analysis of the second experiment using the tree edit distance
Fig. 10 Result of cluster analysis of the second experiment using our method (depending on pattern)
[Figure 11 shows a dendrogram over the leaves Test1, Test1_2, Test2, Test2_2, Test3, Test3_2.]
Fig. 11 Result of cluster analysis of the second experiment using our method (depending on the number of nodes)
On the other hand, we can obtain meaningful results of a cluster analysis using our method, as in Fig. 10 and 11, if the parameters in equation 5 are handled properly². As described in Section 4.1, we introduce values of the parameters that yield a meaningful cluster structure, classifying the tree structured data into categories from each point of view. Investigating all combinations of the parameters, we find the following:
• We can distinguish differences in the number of statements at each depth of the tree structured data if w2 is large. In short, β is a feature related to the number of nodes at each depth.
• We can distinguish differences in the depth of the tree structured data if w3 is small. That is, γ is a feature associated with the depth of the tree structured data.
• We can distinguish differences in the total number of statements in the tree structured data if w4 is small. This means that δ is a feature concerned with the number of leaf nodes in the tree structured data. In particular, the feature captured by the tree edit distance algorithm is the same as δ.
As described above, we can obtain many kinds of cluster analysis results by adjusting the parameters associated with the features of the tree structured data. This means that the similarities calculated by our method can be strongly affected by each feature of the tree structured data. In other words, our method can reflect each feature by adjusting the parameter settings. One advantage of our method is that it can reflect a specific feature of the tree structured data in their similarity without any restriction.
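The kind of weighted combination used here can be sketched as follows. This is a minimal illustration, not equation 5 itself: the five feature values and their normalization are placeholders, and only the roles of the weights (e.g. a large w4 emphasizing δ, the leaf-node count, as the tree edit distance does) follow the description above.

```java
public class WeightedTreeSimilarity {
    // Weighted distance between two trees represented by five feature values.
    // w[0]..w[4] correspond to w1..w5; in the paper they are chosen to sum to 1.
    public static double distance(double[] featA, double[] featB, double[] w) {
        double d = 0.0;
        for (int i = 0; i < w.length; i++) {
            // Per-feature absolute difference, scaled by its weight (placeholder form).
            d += w[i] * Math.abs(featA[i] - featB[i]);
        }
        return d;
    }

    public static void main(String[] args) {
        // Hypothetical feature vectors (alpha, beta, gamma, delta, epsilon) for two trees.
        double[] a = {0.2, 0.5, 3.0, 12.0, 0.1};
        double[] b = {0.2, 0.7, 3.0, 18.0, 0.1};
        // Emphasizing only delta (leaf-node count) mimics the tree edit distance behaviour.
        double[] wDelta = {0.0, 0.0, 0.0, 1.0, 0.0};
        System.out.println(distance(a, b, wDelta)); // 6.0
    }
}
```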
5 Conclusion
In this paper, we proposed a method for calculating a similarity between tree structured data and evaluated it in two kinds of experiments. As a result, we could
² We set the parameters as w1 = w3 = w4 = w5 = 0, w2 = 1 to obtain the result shown in Fig. 10, and as w1 = w4 = w5 = 0.125, w3 = 0.625, w2 = 0 to obtain the result shown in Fig. 11.
confirm that we can calculate a similarity between tree structured data precisely by adjusting the parameters related to their features. A remaining issue is how to set the parameters automatically. Using our method, we can classify tree structured data into categories from various user points of view. However, the larger the number of parameters, the more difficult it is to adjust them effectively.
Acknowledgment This study was supported by research grant #2144 from JGC-S SCHOLARSHIP FOUNDATION.
References
1. ACM SIGMOD: SIGMOD Record in XML, http://www.sigmod.org/publications/sigmod-record/xml-edition/
2. Ferraro, P., Godin, C.: A distance measure between plant architectures. Annals of Forest Science 57, 445–461 (2000)
3. Hogue, A., Karger, D.: Thresher: Automating the unwrapping of semantic content from the world wide web. In: Fourteenth International World Wide Web Conference, pp. 86–95. ACM Press, New York (2005)
4. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady 10(8), 707–710 (1966)
5. Maruyama, K., Yamamoto, S.: A CASE tool platform using an XML representation of Java source code. In: 4th IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2004), pp. 158–167 (2004)
6. Oracle: The Java Tutorials, http://java.sun.com/docs/books/tutorial/
7. Park, Y., Cho, J., Cha, G., Scheuermann, P.: Efficient schemes of executing star operators in XPath query expressions. In: Li Lee, M., Tan, K.-L., Wuwongse, V. (eds.) DASFAA 2006. LNCS, vol. 3882, pp. 264–278. Springer, Heidelberg (2006)
8. Sakakibara, Y.: Pair hidden Markov models on tree structures. Bioinformatics 19, 232–240 (2003)
9. Tai, K.-C.: The tree-to-tree correction problem. J. ACM 26, 422–433 (1979)
10. Torsello, A., Hancock, E.R.: Matching and embedding through edit-union of trees. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2352, pp. 822–836. Springer, Heidelberg (2002)
11. Vilares, M., Ribadas, F.J., Darriba, V.M.: Approximate VLDC pattern matching in shared-forest. In: Gelbukh, A. (ed.) CICLing 2001. LNCS, vol. 2004, pp. 483–494. Springer, Heidelberg (2001)
Continuous Auditing for Health Care Decision Support Systems Robert D. Kent, Atif Hasan Zahid, and Anne W. Snowdon
Abstract. Information technology has changed the way organizations conduct business, and it has become imperative to make effective and timely decisions that can be verified. In particular, real-time systems for health care need new techniques of continuous auditing to ensure they are being used and working properly. Traditional auditing methods carried out on long-term schedules are incapable of supporting real-time systems in making timely and informed decisions. We propose a service oriented approach, with associated mechanisms, for designing and implementing continuous auditing of data, access to systems, and transactions in order to assure correct and timely decision support. We survey recent research and practice and discuss work in progress towards embedding continuous auditing within a health care system.

Keywords: Continuous auditing, Health care, Service oriented architecture, Real-time assurance, Decision support.
Robert D. Kent · Atif Hasan Zahid: School of Computer Science, University of Windsor, Windsor, Ontario, Canada N9B 3P4
Anne W. Snowdon: Odette School of Business, University of Windsor, Windsor, Ontario, Canada N9B 3P4
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 731–741. springerlink.com © Springer-Verlag Berlin Heidelberg 2011

1 Introduction
Healthcare is a highly complex and multidimensional system that has a social contract with societies to provide safe and effective healthcare to citizens in a timely manner. This social contract requires effective decision making based on strong empirical evidence in order to achieve the best-practice quality of care that all patients desire and expect from their health care system. Effective decision making in healthcare also requires access to data which, when analyzed empirically, becomes valuable information about best-practice patient care.
The difficulty of collecting and storing data, the distributed and ubiquitous nature of the devices used in the healthcare sector, and the requirements of timely and effective analysis and reporting motivated us to initiate a project to develop a real-time data management and decision support system in the areas of injury prevention and patient safety (Kent et al, 2010; Kobti et al, 2011). A challenge that we face is achieving compliance with the requirements of ethics policy and legislative frameworks such as FIPPA in Canada and HIPAA in the United States. Systems must also account for accessing data in real time to support "on-the-spot" decision making. Adherence to policy to obtain assurance requires monitoring of data, transactions and usage of information systems, which can only be achieved through continuous auditing.
Normally, hospitals conduct "clinical audits" as part of their mandate. A clinical audit involves looking systematically at the procedures used for diagnosis, care and treatment. It also involves examining how associated resources are used and investigating the effect care has on the outcome and quality of life of the patient. Clinical audits are still conducted in the traditional way. If the emerging concept of continuous controls monitoring were applied to clinical audits, it would bring measurable benefits to the health care industry.
A definition of continuous auditing (CA), proposed by the Canadian Institute of Chartered Accountants and the American Institute of Certified Public Accountants, describes "…a methodology that enables independent auditors to provide written assurance on a subject matter using a series of auditors' reports issued simultaneously with, or a short time after, the occurrence of events underlying the subject matter." It is evident from this definition that traditional auditing methods cannot keep pace with electronic systems in identifying anomalies.
Section 2 provides a review of continuous auditing models. Section 3 discusses implementation approaches. Section 4 presents components and techniques of continuous auditing systems. Section 5 states some of the challenges to continuous auditing. In Section 6 we discuss our work in progress with an architectural and workflow model, and examples to clarify the approach.
In Section 7 we conclude with a brief listing of challenges in health care information technologies.
2 Brief Review of Continuous Auditing Models
A generic model of continuous process auditing systems (CPAS) consists of three main entities (Hasan et al, 2005): an auditee that carries out activities based on specifications to achieve a goal; an accountant who observes these activities and records them as facts; and an auditor who conducts the audit by comparing the specification with the facts, thereby detecting violations. Several units are combined to form this generic model: Auditing, which implements the auditing algorithm of the application; Control, which defines the activities to be carried out; Execution, which executes the activities; Accounting, which provides the facts about the activities executed; Report Handling, which generates reports about the activities performed; and Policy Definition, which defines the policies dictating how activities are performed.
The generic model uses policy-based approaches to configure units and control their behaviour. This yields a modular structure in which decision making is separated from execution. It works well when policies and procedures are well defined, and it provides a step towards a generic model applicable to all parts of an auditing system. In health care, it is a challenge to
perform a crisp separation of decision making and execution, although policies and procedures are generally very well defined. This is largely due to the complex interactions between groups of practitioners and patients, which occur in parallel and involve flexible schedules and dynamic reassignment of priorities in the delivery of care. Thus, it is vital to work towards a suitable definition of continuous process audit in the context of health care. Several conceptual models have been suggested, all of which use CPAS as their base model. Below, we review three proposed models.
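The separation of the auditee, accountant and auditor roles described above can be sketched as follows. The interfaces and the encoding of the specification as a predicate over recorded facts are our own illustration, not taken from (Hasan et al, 2005).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class CpasSketch {
    // A recorded fact about an activity the auditee carried out.
    record Fact(String activity, double value) {}

    // The accountant observes activities and records them as facts.
    static class Accountant {
        final List<Fact> ledger = new ArrayList<>();
        void record(Fact f) { ledger.add(f); }
    }

    // The auditor compares the facts against the specification to detect violations.
    static class Auditor {
        List<Fact> violations(List<Fact> facts, Predicate<Fact> spec) {
            List<Fact> bad = new ArrayList<>();
            for (Fact f : facts) if (!spec.test(f)) bad.add(f);
            return bad;
        }
    }

    public static void main(String[] args) {
        Accountant acc = new Accountant();
        acc.record(new Fact("dispense", 10.0));
        acc.record(new Fact("dispense", -3.0)); // violates the specification below
        Auditor auditor = new Auditor();
        // Specification: recorded values must be non-negative.
        System.out.println(auditor.violations(acc.ledger, f -> f.value() >= 0).size()); // 1
    }
}
```

Keeping the specification separate from the execution, as here, is what gives the generic model its modular structure.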
2.1 Continuous Auditing: Building Automated Auditing
This model, due to (Rezaee et al, 2002), has the capacity to run on a distributed client/server network and provides the functionality of transmitting data to audit workstations via the web. In this model, data is collected from transactional systems and, using an ETL (extract, transform and load) process, loaded into a data warehouse where standardized tests are run continuously or periodically.
2.2 Towards a Paradigm for Continuous Auditing
Keystroke-level monitoring was proposed (Onions, 2003) to monitor the integrity of data in this model. The model provides dual protection: each transaction is audited individually, and transactions are audited as a whole over a period of time in search of possible patterns of fraud. Data is mapped to an XCAL (eXtensible Continuous Auditing Language) schema, real-time CAATT (Computer Assisted Audit Tools and Techniques) processing is used to check transactions and keystrokes, and expert systems are used to find patterns indicating fraud.
2.3 Continuous Auditing within a Debt Covenant Domain
This model, due to (Woodroof et al, 2001), is limited in scope as it applies only to a debt covenant system; it introduces the notion of on-demand reporting. In this model, agents within the client system monitor transactions after a request for a report is initiated. A digital agent on the auditor system requests real-time account balances from a digital agent on the client system and trims unnecessary information. An evergreen report is generated based on the results.
These models use different techniques to provide a near-real-time solution. All of them check transactions to make sure they are accurate, and all have mechanisms to report anomalies. A major problem is the variety of data formats that can be present in the systems; this increases the complexity and cost of creating standardized tests, and it also slows down the whole process because data must be converted to standard formats before the tests are performed.
3 Implementation Approaches and Problems
Several approaches have been considered by researchers to implement the notion of continuous auditing. They have proposed different
architectures to create their respective systems with the capability to perform continuous auditing. Three of the most popular implementations are discussed below.
3.1 Service Oriented Architecture (SOA)
Perhaps the most straightforward design platform for implementing continuous auditing is service oriented architecture. SOA revolves around three main entities: the service provider, the service requestor and the service registry (Papazoglou, 2003). The problems of retrieving, converting and translating data from different sources are largely resolved through the granularity of services. One such architecture is the Collaborative Continuous Auditing Model (CCAM), which utilizes a combination of XML web services and SOA to support continuous auditing (Chen, 2007). The use of XML schemas in this model is an advantage, as it becomes easy to transform data from any format. Another advantage is that the data remains within the custody of the client, so security issues are largely resolved.
3.2 Enterprise Service Bus Model
Another model based on the concept of services employs the enterprise service bus (ESB) model (Ye et al, 2008). In this model, third parties and the client register and submit an application to the auditing system before a transaction takes place. During execution, the auditing system extracts information using the ESB, and intelligent agents make sure that transactions take place as per the contract. A report is generated and displayed to the client at the end. The advantages of this model include the separation of the user interface, the use of standard technologies, and the use of a mirror method to save the client system in a shadow subsystem. The disadvantages include processing time, cost, and trust issues between the client and auditing systems.
3.3 Agent-Based Architecture
(Wu et al, 2008) proposed an agent-based architecture for collaborative continuous auditing. This model consists of two main entities: the auditor site and the auditee site. The auditor site is the master site, whose duties include planning and implementing an audit service using different agents. The auditee site consists of various agents that implement the audit service. The respective agents match their results to make sure no errors or exceptions occur at the auditee site and take corrective measures if required. Limitations of this approach derive from the use of the Java language and processing framework, a requirement for using and deploying third-party software on the client system, and implementation of the framework on distributed client systems within different domains.
4 Components and Techniques of Continuous Auditing
Continuous auditing requires performing control and risk assessment on a regular basis, using two main components: continuous data assurance and continuous control monitoring.
Continuous data assurance deals with the auditing of the data itself: the data is what is under investigation. For example, in a financial company, continuous data assurance assures that the financial information is correct. This category of continuous auditing deals with the data part of the system and makes sure that it is correct and free of errors or fraud. This implies inspecting each transaction to make sure it complies with all of the controls that are in place. The process is performed at all times and for every transaction: each transaction the system performs is checked to make sure that it works as specified and that all the data matches the expected results.
Continuous control monitoring assures that the control mechanisms within the system are working correctly by checking the settings in the system. It compares them with a given model and ensures that the system settings are working as prescribed. By measuring specific attribute parameters, failures in auditing logic can trigger auditor-initiated actions. The nature of these actions may vary according to the risk, or anomaly, identified. Hence, the main objective of continuous control monitoring is to focus on the effectiveness of the control itself. In health care, we hope to determine and classify the nature of controls and the kinds of monitors that serve continuous auditing requirements.
Different techniques have been used to achieve the goals of continuous auditing. Common techniques (Vasarhelyi, 1991; 2004; Murthy, 2004; Kuhn, 2010) include: Embedded Audit Modules (EAM); Integrated Test Facility (ITF); General Audit Software (GAS); EAM Ghosting; and Monitoring Control Layer (MCL). EAMs are specialized programming modules inserted into the client system to achieve continuous auditing. ITF involves creating fictitious entities in the database to process test transactions simultaneously with live input.
These test transactions are then incorporated into the production system along with normal transactions. GAS involves using specialized software to assist in auditing the system. Typically, EAMs and ITFs involve modifications to the client system; hence, they are not only expensive but also require trust by clients. EAMs also slow down system response, as the audit module is checking and validating the live system. GAS is typically preferred by audit firms, as it allows them to achieve their goals without interfering with the client system. However, GAS is based on a periodical auditing process model (PAPM), which means that it is not suitable for real-time auditing and reporting. A widely used continuous auditing software package in business is Audit Command Language (ACL), which performs data extraction and analysis for customized solutions for different organizations. Extending ACL to incorporate health care requirements is certainly feasible.
EAM Ghosting provides the benefits of EAM with the advantage that the audit functionality is implemented, operated and maintained outside the production system of the client. This segregation can be achieved by two methods. In the first method, the production system (PRD) is separated from the quality assurance system (QAS) by creating a partition on the server; the QAS is a mirror copy of the production system but houses the EAM module to complete the continuous auditing process. In the second method, virtualization techniques are
used to mirror the PRD system, and the EAM module is embedded in the QAS server, which is hosted on the virtual server. Virtualization is preferred of the two methods because it requires less physical hardware space and less memory to operate. Hence EAM ghosting retains the integrity of the system and also reduces its cost. One concern, however, is the existence of non-native code (the EAM in the ghost system) affecting the transactions in the ghost system to the extent that the system slows to a crawl and possibly fails.
A Monitoring Control Layer (MCL) creates a bridge between the auditing system and the client system. It consists of a middleware layer that binds the auditing system to the client system. The MCL not only captures and filters data; it also stores it and performs analytical actions to raise alarms and create reports accordingly. Advantages of using an MCL include the separation of auditing functionality from the client system, and it is easy to implement even if the client system is distributed across different platforms. Hence, it provides client independence as well as freedom in system design and maintenance.
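A Monitoring Control Layer in the sense described above can be sketched as a middleware component that captures every transaction, stores it for later analysis and reporting, and raises an alarm when an anomaly test fires. All class and method names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class MonitoringControlLayer {
    // A transaction observed on the client system (illustrative shape).
    record Transaction(String id, String type, double amount) {}

    private final List<Transaction> store = new ArrayList<>();
    private final List<String> alarms = new ArrayList<>();
    private final Predicate<Transaction> anomaly;

    MonitoringControlLayer(Predicate<Transaction> anomaly) { this.anomaly = anomaly; }

    // Called for every transaction crossing the middleware layer.
    void capture(Transaction t) {
        store.add(t); // retained for analytical actions and report generation
        if (anomaly.test(t)) {
            alarms.add("ALARM: anomalous transaction " + t.id());
        }
    }

    List<String> alarms() { return alarms; }

    public static void main(String[] args) {
        // Example control: flag any dispensing transaction above a tolerance threshold.
        MonitoringControlLayer mcl = new MonitoringControlLayer(
            t -> t.type().equals("dispense") && t.amount() > 100);
        mcl.capture(new Transaction("t1", "dispense", 20));
        mcl.capture(new Transaction("t2", "dispense", 500));
        System.out.println(mcl.alarms().size()); // 1
    }
}
```

Because the layer sits between the auditing system and the client system, the client code never changes, which is the client independence the text describes.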
5 Challenges of Continuous Auditing
Continuous auditing is expected to provide multiple benefits, which include real-time monitoring and real-time reporting of problems, tracking high-risk areas, and measuring the variability of the system under different constraints. In addition, CA can provide: assurance over the integrity of health care activities; confidence that services are being executed as designed; freedom from doubts over concerns and risks; assurance that internal controls are adequate to mitigate risks; guarantees that governance processes are effective and efficient; assurance that health goals are met properly; automated alarms and triggers for anomalies and errors; help in designing control tests; and fraud and negligence prevention. Finally, CA is useful for identifying: areas, activities or risks for which no controls were designed; the suite of controls explicitly required to mitigate specific risks; and thresholds for tolerance levels.
Some of the challenges to CA implementation are described by (Huanzhuo et al, 2008) and (Kuhn et al, 2010). They include: accuracy; real-time comprehension; flexibility; system design and maintenance; client independence; legal liability; and impediments to people, process or system implementation and usage. Wangler et al (Wangler, 2003) have proposed an architecture for collecting and storing health care process information. By contrast, Chou et al (Chou, 2007) have studied aspects of continuous auditing within a multi-agent system framework. Both works reflect the infancy of this research area as well as the differences in perspective in approaching this problem domain.
6 Work in Progress
Our group is investigating the suitability of a modified SOA model with semantic extensions, using the Resource Description Framework Schema (RDFS) and the Web Ontology Language (OWL) (Allemang et al, 2008). RDFS enables
the inclusion of rich semantics that may provide for definitions of the policies and practices associated with specific services. As patterns of auditing logic are identified and classified, semantics will be necessary to express what is to be monitored and what actions are required and feasible. In particular, audit as a service (AAAS) may be achievable through virtualization.
Figure 1 provides a high-level abstraction of our approach to continuous auditing using SOA. Various actors interact with the health care system by requesting services. The list of services must be identified, and the design of services is vital for effective implementation and maintenance. It is during this design phase that consideration of the auditing requirements should lead to a clearly defined implementation strategy. Thus, we propose that each service have an associated monitoring (MCL) component and an audit action component that is triggered on a failure of an audit test. Monitored services that pass the monitored audit analysis are enabled; the audit action must be specified for each service. Each of the transaction types can be carried out using a bus-based approach with a service request bus (SBUS), monitoring bus (MBUS), and audit action bus (ABUS).
Fig. 1 Continuous Auditing Framework for Distributed Service Provision.
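The per-service pattern proposed above — a service paired with a monitoring component and an audit action that fires only on a failed audit test — can be sketched as follows. All names are illustrative; in the framework of Fig. 1, the failure notification would travel on the ABUS.

```java
import java.util.function.Predicate;

public class MonitoredService {
    interface Service { String invoke(String request); }
    interface AuditAction { void onFailure(String request, String result); }

    // Invoke a service, run its audit test on the result, and either enable the
    // result (audit passed) or trigger the associated audit action (audit failed).
    static String invokeMonitored(Service s, Predicate<String> auditTest,
                                  AuditAction action, String request) {
        String result = s.invoke(request);
        if (!auditTest.test(result)) {
            action.onFailure(request, result); // e.g. publish on the audit action bus
            return null;                       // result is not enabled
        }
        return result;                         // monitored audit analysis passed
    }

    public static void main(String[] args) {
        Service lookup = req -> req.isEmpty() ? "ERROR" : "OK:" + req;
        String ok = invokeMonitored(lookup, r -> r.startsWith("OK"),
            (req, res) -> System.out.println("audit failed for " + req), "patient-42");
        System.out.println(ok); // OK:patient-42
    }
}
```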
The ability to compose services into sets of possibly interdependent services is important in health care practice. It is rare that a single monolithic service will be requested; rather, a set of services is combined and must be orchestrated carefully, taking into account the dependencies and liabilities of each separate service. In such cases, the audit of the full set of services consists of
both the audit trail associated with each service and an accumulative audit on the set itself.
Security and independence of the auditing, or quality assurance, system (QAS) from the client production system (PRD) may be obtained by combining the SOA approach with audit modules embedded within a mirror system, hence EAM Ghosting. This approach is shown in Figure 2. The results of audit tests are stored in a secure document called a sticky log (SLK), proposed by (Ringelstein, 2010). The benefit of these logs is that each pertains to a separate service audit, and they may be systematically appended to form a growing document that represents the complete audit trail at any moment during the execution of the service set. Each sticky log may contain basic audit parameters as well as rich metadata that provides interpretive information. The use of ghosting provides a virtual environment for collecting logs from possibly independent, and distributed, sub-systems working collaboratively.
Fig. 2 Audit workflow with MCL and a QAS/EAM ghost system using sticky logs.
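The sticky-log mechanism can be sketched as follows: each service audit appends one entry, and the accumulated document supports an accumulative audit over the composed service set. The entry structure is our own illustration of the idea in (Ringelstein, 2010), not that paper's actual format.

```java
import java.util.ArrayList;
import java.util.List;

public class StickyLog {
    // One audit result for one service in the composition; metadata is free-form here.
    record Entry(String service, boolean passed, String metadata) {}

    private final List<Entry> trail = new ArrayList<>();

    // Append the audit result of one service; the log only ever grows.
    void append(Entry e) { trail.add(e); }

    // Accumulative audit over the whole set: pass only if every part passed.
    boolean compositePass() {
        return trail.stream().allMatch(Entry::passed);
    }

    public static void main(String[] args) {
        StickyLog log = new StickyLog();
        log.append(new Entry("admitPatient", true, "ts=2011-03-01T09:00"));
        log.append(new Entry("administerDrug", false, "dose mismatch"));
        System.out.println(log.compositePass()); // false
    }
}
```

The append-only trail is what lets the complete audit state be read at any moment during execution of the service set, as the text describes.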
A use case example is a doctor who prescribes a certain drug to a patient while asking questions about allergies or stating side effects. If the doctor forgets to specify a particular side effect (or the patient forgets to mention a certain allergy), this can have dire consequences. The CA system can catch such problems by examining the metadata and comparing the proposed, or completed, procedure against established practice to determine whether that practice was followed. This can also remove redundancy, since allergies might already have been asked about and documented in a different department, and it can account for new knowledge recently added to the knowledge base. At the same time, it helps find information that was missed in the first step. More generally, an important set of cases involves delayed response in delivering a service due to a supply chain of actors, or processes, that are part of a service composition. The composed service may be time critical, or tolerant of delays (i.e. evaluated in a lazy manner).
As a second example, consider the problem of patient falls in hospitals. While in care, an evolving history of the patient is documented, from admission, to the
administration of care and the recording of fall events. In addition, other data describing background world knowledge also exists. Such data may reflect aspects of the hospital environment, staffing, medication, meals and a host of other variables of a local nature, but may also extend to include remote access to research and product databases, thereby defining world knowledge. From a service oriented perspective, a continuous audit of a specific patient fall event can be modeled with an audit rule set that requires comparison of the event with a set of exemplars. The patient data is, in principle, the complete set of documents that define the patient's history, and the exemplars are obtained from world knowledge through analytical models derived from population research and data mining. At the least, it is feasible to generate a detailed report of all aspects of the patient history, based on the set of all services rendered. Using more sophisticated analysis, the auditing may identify a recurring pattern: every time a particular patient is given a certain medication, the patient falls within 30 minutes. The specific service that is called whenever a patient fall is reported would have an associated audit rule checking for specific conditions, themselves identified through research to provide an evidence-based approach. The report supporting these findings would necessarily state the assumptions and limitations of the audit rules employed. Issuing a report in real time that profiles the patient and their condition against their environment, and presents opinions about possible causative factors, is one goal of our research.
It is intriguing to consider more elaborate problems. To extend the previous example of patient falls, assume that the medication in question was a narcotic. Protocols require that controlled substances be counted by two people who sign off on the correct count. Yet narcotics often get stolen, pocketed, mishandled, and so on.
Thus, using targeted auditing, it may be possible to identify trends in anomalous counts and to associate risk with them, whether due to individual error or to mislabelling of the dosage on the drug bottle. Further, since the audit on the patient fall identified the narcotic, could that itself lead to identifying the need to analyze the anomalous count, probing for deeper insight into the real risk? The generation of audit rules, or policies, and actions with rich semantic expressiveness must be accompanied by well-defined logic. Implementation of higher-order audit logic will be based increasingly on statistical approaches or emerging techniques from artificial intelligence, pattern analysis and other research areas. Although our proposed approach may help to identify points of failure and liability in the delivery of health care, it is considerably more relevant to identify weaknesses in practice and policy that may be fixed, with consequent measurable increases in patient safety (Bates, 2003).
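As a minimal illustrative sketch (not the system's actual implementation), an audit rule of the kind described above can be expressed as a check over a stream of timestamped care events: flag patients for whom falls repeatedly occur within 30 minutes of receiving a given medication. All names and data here are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical event records: (patient_id, event_type, detail, timestamp)
events = [
    ("p1", "medication", "narcotic-X", datetime(2011, 5, 1, 9, 0)),
    ("p1", "fall",       "",           datetime(2011, 5, 1, 9, 20)),
    ("p1", "medication", "narcotic-X", datetime(2011, 5, 2, 14, 0)),
    ("p1", "fall",       "",           datetime(2011, 5, 2, 14, 25)),
    ("p2", "medication", "narcotic-X", datetime(2011, 5, 1, 9, 0)),
]

def falls_after_medication(events, drug, window=timedelta(minutes=30)):
    """Count, per patient, falls occurring within `window` after taking `drug`."""
    counts = {}
    meds = [(p, t) for p, kind, d, t in events if kind == "medication" and d == drug]
    for p, kind, _, t in events:
        if kind != "fall":
            continue
        # A fall is associated with the drug if any dose preceded it within the window.
        if any(mp == p and 0 <= (t - mt).total_seconds() <= window.total_seconds()
               for mp, mt in meds):
            counts[p] = counts.get(p, 0) + 1
    return counts

# An audit rule might flag patients with repeated occurrences of the pattern.
flagged = {p: n for p, n in falls_after_medication(events, "narcotic-X").items() if n >= 2}
print(flagged)  # {'p1': 2}
```

In a production continuous-audit setting this check would run as a service invoked on each reported fall, with the exemplar threshold and time window themselves derived from population research rather than fixed constants.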
R.D. Kent, A.H. Zahid, and A.W. Snowdon

7 Conclusion and Future Research

The health care industry is one area where continuous auditing has not yet taken hold, owing to its reliance on non-financial quantitative and qualitative information. Typically, hospitals that have applied continuous auditing have done so in their payroll and billing administration departments; other areas have yet to adopt this emerging technology. Hospital Corporation of America (HCA Inc.), a major USA-based hospital company, identified seven essential steps required to create a successful CA environment. These include determining the type of tests to be performed; selecting testing methods; identifying testing criteria; automating tests; communicating test results securely; obtaining feedback; and tracking progress. Real-time monitoring and control systems could be placed at various stages of diagnosis and treatment, becoming part of hospital mandates and forming an integral component of internal controls. The set of standards or mandates should focus on control effectiveness and ensure that the required procedures or prerequisites are met before moving to the next stage. This could ensure effectiveness at every stage and raise the confidence level of all stakeholders. If a procedure is missed, or performed in error, then automatic triggers could signal anomalies and errors, further initiating actions to ameliorate the situation. In addition, making such a system more widely applicable would require frequent revision of prerequisites and standards. A team must be assigned the task of revising each department's mandates in terms of adherence. Sets of procedures can be put into the system and adhered to in order to control errors before serious damage is done. In health care, planning, timing and accuracy are critical.
Acknowledgments RDK and AWS acknowledge support from AUTO21 and CIHR.
References [1] ACL. Audit Command Language, http://www.acl.com [2] Allemang, D., Hendler, J.: Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. Morgan Kaufmann, San Francisco (2008) [3] Bates, D.W., Gawande, A.A.: Patient safety: Improving safety with information technology. New England Journal of Medicine 348(25), 2526–2534 (2003) [4] Chou, C.L.-y., Du, T., Lai, V.S.: Continuous auditing with a multi-agent system. Decision Support Systems 42, 2274–2292 (2007) [5] FIPPA: Freedom of Information and Protection of Privacy Act (FIPPA), http://www.gov.mb.ca/chc/fippa/ [6] Flowerday, S., Blundella, A.W., Von Solms, R.: Continuous auditing technologies and models: A discussion. Computers & Security 25(5), 325–331 (2006) [7] Hasan, Stiller, B.: A generic model and architecture for automated auditing. In: Schönwälder, J., Serrat, J. (eds.) DSOM 2005. LNCS, vol. 3775, pp. 121–132. Springer, Heidelberg (2005) [8] HCA Inc., Hospital Corporation of America, http://www.hcahealthcare.com [9] HIPAA: Health Insurance Portability and Accountability Act of 1996 (HIPAA) Privacy and Security Rules, http://www.hhs.gov/ocr/privacy/
Continuous Auditing for Health Care Decision Support Systems
[10] Kent, R.D., Kobti, Z., Snowdon, A., Aggarwal, A.: Towards a Unified Data Management and Decision Support System for Health Care. In: The 3rd International Symposium on Intelligent and Interactive Multimedia: Systems and Services (KESIIMSS 2010), Baltimore, USA, July 28-30. Smart Innovation, Systems and Technologies, vol. 6, pp. 205–220 (2010) [11] Kobti, Z., Snowdon, A.W., Kent, R.D., Bhandari, G., Rahaman, S.F., Preney, P.D., Kolga, C.A., Tiessen, B., Zhu, L.: Towards a “Just-in-Time” Distributed Decision Support System in Health Care Research. In: Annals of Information Systems: Supporting Real Time Decision-Making, vol. 13, Part 3, pp. 253–285 (2011) [12] Kuhn Jr., J.R., Sutton, S.G.: Continuous Auditing in ERP System Environments: The Current State and Future Directions. Journal of Information Systems 24(1), 91–112 (2010) [13] Murthy, U.S., Groomer, S.M.: A Continuous Auditing Web Services Model for XML-based Accounting Systems. International Journal of Accounting Information Systems 5(2), 139–163 (2004) [14] Onions, R.L.: Towards a paradigm for continuous auditing (2003), http://www.auditsoftware.net/community/how/run/tools/ Towards%20a%20Paradigm%20for%20continuous%20Auditin1.doc [15] Rezaee, Z., Sharbatoghlie, A., Elam, R., McMickle, P.: Continuous auditing: building automated auditing capability. Auditing: A Journal of Practice & Theory 21(1), 147– 163 (2002) [16] Papazoglou, M.P.: Service Oriented Computing: Concepts, Characteristics and Directions. In: Proceedings of the Fourth International Conference on Web Information Systems Engineering, WISE 2003 (2003) [17] Ringelstein, C., Staab, S.: DIALOG: A Distributed Model for Capturing Provenance and Auditing Information. Intern. Journal of Web Services Research 7(2) (2010) [18] Chen, R.-S., Sun, C.-M.: A Collaborative Continuous Auditing Model under ServiceOriented Architecture Environments. 
In: 6th WSEAS International Conference on E-Activities, Tenerife, Spain, December 14-16 (2007) [19] Sarbanes-Oxley Act 2002, http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=107_cong_bills&docid=f:h3763enr.tst.pdf [20] Searcy, D., Woodroof, J., Behn, B.: Continuous Audit: The Motivations, Benefits, Problems, and Challenges Identified by Partners of a BIG 4 Accounting Firm. In: Proceedings of the 36th Hawaii International Conference on System Sciences, January 2003, pp. 10–20 (2003) [21] Vasarhelyi, M.A., Alles, M., Kogan, A.: Principles of analytic monitoring for continuous assurance. Journal of Emerging Technologies in Accounting 1, 1–21 (2004) [22] Woodroof, J., Searcy, D.: Continuous audit: model development and implementation within a debt covenant compliance domain. International Journal of Accounting Information Systems 2, 169–191 (2001) [23] Wangler, B., Åhlfeldt, R.-M., Perjons, E.: Process oriented information systems architectures in healthcare. Health Informatics Journal 9(4), 253–265 (2003) [24] Wu, C.-H., Shao, Y.E., Ho, B.-Y., Chang, T.-Y.: On an Agent-based Architecture for Collaborative Continuous Auditing. In: 12th International Conference on Computer Supported Cooperative Work in Design, April 16-18, pp. 355–360 (2008) [25] Ye, H., Chen, S., Gao, F.: On Application of SOA to Continuous Auditing. WSEAS Transactions on Computers 7(5), 532–541 (2008)
Design and Implementation of a Primary Health Care Services Navigational System Architecture Robert D. Kent, Paul D. Preney, Anne W. Snowdon, Farhan Sajjad, Gokul Bhandari, Jason McCarrell, Tom McDonald, and Ziad Kobti *
Abstract. We report on the development of a primary health care data navigation decision support system. The system was designed around four primary services: data entry, query support, gap analysis and report generation. The context for development required a centralized approach to deal with security issues of access, data integrity and privacy, and the PHC DSS was intended for use by a restricted group of policy analysts and decision makers. We focus primarily on the abstractions for data entry and query support, and on the approach to interface design and functionality. Keywords: Decision support systems, Primary health care, Health informatics, Semantics, Query support.
Robert D. Kent · Paul D. Preney · Farhan Sajjad · Jason McCarrell · Tom McDonald · Ziad Kobti, School of Computer Science
Anne W. Snowdon · Gokul Bhandari, Odette School of Business, University of Windsor, Windsor, Ontario, Canada N9B 3P4
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 743–752. springerlink.com © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

Our research group is focused on the design, development and validation of decision support systems for primary health care services in Canada. Primary care (PC) services play an important role in the overall health of a population. For health systems, primary care services are the first-contact points for accessing health services. Primary care also ensures continuity of health care services so that Canadians receive integrated and coordinated health care that optimizes health and quality of life. Primary care services include health promotion, risk reduction, illness management and injury prevention. Primary health care strives to achieve a major objective of health care systems, which is to maximize the efficient use of health care resources through seamless integration with other services and sectors, such as secondary and tertiary health care services. Therefore, the design of PHC services requires multidisciplinary and collaborative decision making to ensure the right services are available to health consumers when and where such services are needed. The challenge is to ensure decisions are based on accurate and comprehensive evidence of population-level primary health needs and service delivery outcomes.

Primary health care data is currently not electronic in Canada. Thus, the difficulty of collecting and storing data, the distributed and ubiquitous nature of data used in the primary health care sector, and the requirements of timely and effective analyses and reporting provided the motivation to initiate a project for developing a real-time data management and decision support system in this area. The overarching goal of this project is to develop a Unified Data Management and Decision Support System (UDMDSS) (Tiessen et al. 2008; Kent et al. 2010; Kobti et al. 2010) that could achieve a simple and cost-effective method of collecting and analyzing available primary health care data, using advances in information system technology, modelling and simulation. An important question for effective delivery of primary health care (PHC) services is whether adequate resources are provided for client communities to support and improve health and wellness. Such decision making clearly involves consideration of many factors, including budgets, physical proximity of services, health professional capacity, and population health needs, but the analysis leading to a decision is predicated on the availability of fundamental and accurate data sets. Typically, within regionally structured health networks, PHC is represented by a diverse and geographically distributed collaborative group of autonomous agencies. Electronic files created and stored by each agency may be publicly available or restricted by privacy regulations, and may reflect different data organization and formatting. A challenge in designing data models and query capabilities to support decision making is to provide an abstract basis and methodology to deal with system design and implementation.
In the context of PHC, abstractions must be flexible and able to handle manual and file-based inputs, data cleaning, database implementation, distributed datasets, remote access, security, query, data mining, and many other issues. In our system development work, compliance with open source tools and protocols has been ensured to achieve broad-based applicability of our approach. Recently our group was involved in the design and implementation of a PHC DSS for the Erie St. Clair Local Health Integration Network (ESC LHIN) in Ontario, Canada. More than ten different data providers were involved. In this paper we present an abstract design for handling two-dimensional tabular input in a PHC context, mapping the input to a database format suitable for immediate query, a query support engine, and a user interface for decision makers. In Section 2, we discuss the nature of PHC data and a framework for design. In Section 3 we present our design approach based on metadata and meta-process specifications, selection of tools and identification of services. Section 4 summarizes the overall PHC DSS model with a focus on query support and its relationship to the metadata framework. Section 5 concludes with a short discussion of the role of semantics and reasoning in modeling health care data for decision support as part of our current and future research work.
2 Primary Health Care Data and Abstract Design

Design of a primary health care (PHC) data navigation decision support system (DSS) must account for the scope of the system, data entry and specification, update and maintenance of databases, support for queries, and generation of meaningful query reports. The nature of PHC agency networks is complex, and dealing with the many issues of heterogeneity suggests an approach based on health grids (Healthgrid). Initial discussions with the Erie St. Clair Local Health Integration Network (ESC LHIN) identified various limitations on such a broad-based scope, and a limited core functionality was established, suitable for their needs. The proposed PHC DSS would be a web-based system accessible from anywhere, anytime, in a secure manner. The DSS design would be based on principles that constitute best practices in primary health care and would have four major service modules: Data Entry, Query, Gap Analysis, and Reporting. The anticipated benefits included an integrated system, whereby the DSS would integrate data from various PHC service providers located in the ESC LHIN, thereby providing a complete view of this service domain. Analyses could be completed quickly and easily across a number of datasets available to LHIN decision makers. A second benefit is the ability to analyze and evaluate current services and potential gaps in service. Thus, the DSS would assist decision makers to understand and examine the effectiveness of existing health services and identify specific service gaps in this sector. A third set of benefits derives from having a secured, modular, and scalable system. The DSS is developed using a service-oriented approach and web technologies coupled with secured communication protocols, thereby making it possible to scale up the existing system in the future, if necessary, and allowing the system to be extended and further tailored to the needs of the LHIN.
The main list of high-level requirements included the following.

i. Data Entry: This module of the DSS would enable the user to enter data about primary care services (who, when, what, where, how) on an ongoing basis. The system could also import or export the services data using standard formats (e.g. CSV text files).

ii. Query: The query module of the DSS would allow the user to ask a variety of meaningful questions. There would be two types of queries: standard and ad hoc. Standard queries would be generated in a routine (periodic) manner and would incorporate standard features. Ad hoc queries, on the other hand, could be developed spontaneously to meet specific requirements for a given situation. Some sample queries that could be generated from the proposed DSS are as follows:
   a. Provide a list of all providers and their services grouped by the cities they are located in.
   b. For a given geographical location (identified by postal code, city, or county), list all available PHC services grouped by their types.
   c. Any other list that could be meaningful to the decision makers.

iii. Gap analysis: This module of the DSS would allow the user to dynamically define the meaning of a service gap for a location. For example, if a
geographical region with a current population of 10,000 has fewer than x instances of a particular PHC service, then that could be considered a service gap. It would also be possible to quantify the level of service gap so that meaningful comparisons can be made across various geographical locations. This module could also be equipped with regression, trend analysis and forecasting features, provided that relevant data (such as projected population growth and PHC service needs) are available.

iv. Reporting: This module would present the results generated by the Query and the Gap Analysis modules in user-friendly formats such as PDF files, which could then be saved in the database as archives or emailed to appropriate parties.
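The gap-analysis rule above can be sketched in a few lines. The sketch below is illustrative only: the region names, service counts, and the threshold of one provider per 10,000 residents are hypothetical, not values from the deployed system.

```python
# Hypothetical data: region -> population and counts of PHC service providers.
regions = {
    "Windsor": {"population": 210000, "services": {"dental": 30, "walk-in": 12}},
    "Chatham": {"population": 44000,  "services": {"dental": 2,  "walk-in": 1}},
}

def service_gap(region, service_type, threshold_per_10k=1.0):
    """Quantified gap: shortfall of providers per 10,000 residents (0.0 = no gap)."""
    pop = region["population"]
    count = region["services"].get(service_type, 0)
    rate = count / (pop / 10000.0)            # providers per 10,000 residents
    return max(0.0, threshold_per_10k - rate)  # larger value = worse gap

for name, region in regions.items():
    gap = service_gap(region, "dental")
    print(name, "gap" if gap > 0 else "ok", round(gap, 2))
```

Because the gap is a number rather than a yes/no flag, regions can be ranked for comparison, and the same function extends naturally to projected populations for the trend-analysis features mentioned above.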
Fig. 1 Data and Query system architecture for PHC DSS
The core architecture and services to be provided limited the scope to a centralized data repository and analysis point with secure web connectivity. Data is prepared by each contributing agency based on security and privacy concerns, but follows generally established practices for data typing and labelling. In most cases, data sets were prepared using standard office software tools or paper-based forms. All data was provided to our team securely, and the entire software system and populated database was installed on the secure premises of an ESC LHIN data handling firm. All aspects of secure communication with our system were also handled by this firm, and the number of users was restricted to those with designated responsibilities. Query and reporting functions were required to be supported by a user interface provided through a secure web browser session.
Our data and query system architectural approach is shown in Fig. 1. Agency datasets (DS_k) are provided through a secure interface to a data entry service. Data is appropriately marked up and identified with suitable semantics to populate the database and ontology, thereby providing indexing of the data. Query services are supplied through either a form-based interface or a generalized, simple query-logic interface. Results of queries may be exported to files or to customized reports.
3 PHC Data Navigation System Architecture

We determined at an early stage that all agency datasets (DS_k) provided were in the form of spreadsheets; that is, two-dimensional tables referenced by row or column, whose cell entries were ill-specified in that they might be text or numeric values. The lack of specification within the data sets required cleaning of the data in consultation with agency data providers. To remove the necessity for broad-based file interoperability we opted for straightforward text-based, comma-separated value (CSV) file formats. We need to extract information from within those data sets (that is, data mine them) by combining them and enabling queries over them. Some queries result in text representations, while simple graphics, such as frequency histograms, are rendered for limited kinds of queries. Data cleaning is essential, and the need for cleaning arises in many forms. Most data is used, or exchanged, in spreadsheet form. Field formats are human-created and maintained, and therefore vary in consistency and interpretation of formatting. For example, a text cell might contain a variable number of subfields with no delimiters, relying on general human knowledge of the background of such material to parse the subfields. Agency descriptions involve hours of operation and other kinds of time or date information, all of which must be properly reconciled to a common format. We developed automatic data cleaning at the level of column field types by using regular expression (regex) parsing to determine the consistency and type of data; this is straightforward for strictly numeric data and certain encoded patterns (e.g. dates, postal codes). However, it is still required to confirm data type assignments with data providers to ensure the semantic integrity of data interpretation, and much data must still be cleaned "by hand", especially in cases of missing information.
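The regex-based column type inference described above can be sketched as follows. This is an illustrative sketch, not the system's actual rules: the pattern set and type names are assumptions (e.g. the Canadian postal code pattern is simplified).

```python
import re

# Ordered from most to least specific; the first pattern matching every
# non-empty cell in a column wins. Patterns here are illustrative assumptions.
PATTERNS = [
    ("integer",     re.compile(r"^-?\d+$")),
    ("number",      re.compile(r"^-?\d+\.\d+$")),
    ("date",        re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("postal_code", re.compile(r"^[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d$")),
]

def infer_column_type(values):
    """Return the most specific type matching every non-empty cell, else 'text'."""
    cells = [v.strip() for v in values if v.strip()]
    for name, pattern in PATTERNS:
        if cells and all(pattern.match(c) for c in cells):
            return name
    return "text"

print(infer_column_type(["N9B 3P4", "M5V 2T6"]))  # postal_code
print(infer_column_type(["12", "7", "30"]))       # integer
print(infer_column_type(["12", "open 9-5"]))      # text
```

A column falling back to "text" is exactly the case that, as noted above, must be confirmed with the data provider or cleaned by hand.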
Data sets must all be described by agency, date and time, and by column headings. Often PHC practitioners develop their own standard acronyms so standards for clinical terminology, such as HL7, are not applicable or available. Thus, establishing effective column heading descriptors is vital for supporting query capabilities. This is a critical area where semantic richness is needed; in our current approach we have focused on the column label (which may be cryptic), a short description and a long description describing the field (along with assumptions and constraints) from the perspective of a practitioner or decision maker. The process of obtaining and inserting data into the database requires determining the number of rows and columns in tables, column labels as metadata descriptors, importing files to database tables and identifying which columns are to be used for indexing. Thus, a database structure is created for each file entered.
Other structures are used for data that identifies and describes agencies, and these are inputted using an interface designed for that purpose. Indices can be built on the columns required for queries; all other columns are left alone. While this can be costly, it speeds up initial queries until more information is known to transform the imported data. Queries are constructed and submitted using a web-based interface which allows users to build query expressions in a free-form approach, with simplified operators based on an SQL model of query. The interface utilizes menu choices for quick and accurate selection of fields, including database, table, column, value and metadata, as expressions are built. For added convenience and later analysis, the system stores queries for re-application and simple editing. Finally, the system supports file exporting and various standard reports actuated through a report generation interface. Consistent with the limitation of imports to CSV data files, only CSV files are outputted. It is possible to produce reports in other file formats (e.g. PDF, ODF) using XSLT-based document transformation approaches. We developed the PHC DSS using open source software components to derive the benefits of open standards and self-management for a technical support team already knowledgeable in their use. Web-based development, particularly for generating user interfaces for data entry and forms for query construction and submission, utilizes web standards, including: CSS v2.1, DOM, HTML v4.0.1, HTTP v1.1, PHP v5.3.3, JavaScript, jQuery v1.4.4, XML v1.0, XML XPath Language v1.0, XHTML v1.0, and XSLT v1.0. These are employed using the AJAX approach to data exchange and update with a server, without having to reload web pages. For the database engine and SQL support we used PostgreSQL v8.4.
The critical module providing query expression processing was developed in C++ using the Boost libraries, with critical components from the Spirit, Variant and Fusion libraries. The Spirit component is used for parsing a boolean logic grammar into an abstract syntax tree and generating an equivalent XML document.
4 Query Support for PHC DSS

Implementing a suitable query interface for users required isolating the user from the details of database logic, such as SQL. To this end, we designed and implemented a pure Boolean propositional and first-order logic. The logic is based on a simple, compiler-supported language that permits users to build a query expression; AJAX requests take the inputted form data and map it to a parser, with an option for the user to save the query expression to the DB. The boolean expression (bool-expr) is then transformed to an SQL form suitable for accessing the database. The expression is also rendered using XML to facilitate both transformation of the expression and rendering within a browser context; XML also provides for additional metadata description that we are using to build rich semantics. Thus, our simple approach to ontology-based semantics is more of a data dictionary enriched with XML tag data for added descriptiveness. If the form is loading a previously
saved expression, then the stored grammar string from the DB is passed to an XML formatter, which is then transformed via XSLT into an HTML form used to populate the form with the previously entered data. The database information is represented by db:/tc/tablename/colname, where "tc" denotes "table-column" form, with components separated by '/'. The "tc", while crude, allows us to extend the nature of the data files entered into the system at a later date; such extensions may include multi-page spreadsheets (i.e., 3-dimensional data), binary files, or other file types and organizations. The "db" prefix is a pure literal that references the "default" database; this can also be extended to access multiple databases.

Table 1 Sample queries for LHIN DSS

Simple Query:
  eq(db:/tc/lhin_organization/org_name, "Canadian Medical Laboratories")

Slightly More Complex Query:
  eq(db:/tc/lhin_organization/org_name, "Canadian Medical Laboratories") or eq(db:/tc/lhin_organization/org_name, "Dental Health Center")

OHIP By Month Age:
  lt(db:/tc/ohip_by_month/age_yrs, 20)

NACRS Semiurgent Triage 65+:
  eq(db:/tc/nacrs_06070809/triage_level, "(4) LESS-URGENT/SEMI-URGENT") and eq(db:/tc/nacrs_06070809/age_grp_65, "65+")

NACRS Nonurgent 65+ Female 0 to 4.0 hours:
  eq(db:/tc/nacrs_06070809/triage_level, "(5) NON-URGENT") and eq(db:/tc/nacrs_06070809/age_grp_65, "65+") and eq(db:/tc/nacrs_06070809/sex, "FEMALE") and eq(db:/tc/nacrs_06070809/emg_wait_time_group, "0 to 4.0 hours")
The query expressions are established using the complex query builder form. A feature lets the user choose a database table/column from a drop-down within each form component; that feature simply inserts a database literal (e.g., db:/tc/sometable/somecol) in that location. Boolean expressions are constructed using relations expressed as boolean functions that require arguments; hence, eq, ne, lt, le, gt, ge and contains. In addition, the binary infix boolean operators and and or, and the unary not, are also supported. Samples of the bool-expr queries that are typically generated are presented in Table 1. Because this format makes complex queries straightforward to express and edit, the grammar liberates the user and also permits greater extensibility in software design by raising the level of abstraction above specific APIs. Since the grammar is pure boolean propositional and first-order logic, it has no side effects and is declarative. This is useful for reinterpreting its meaning (i.e., transforming it
to new forms) under different contexts (e.g., re-populating the complex query form to edit a query, or rendering the query). The bool-expr grammar permits arbitrary queries to be built by users. Any table columns mentioned should have indices built at some point for speed. These references should be stored in a most-recently-used DB structure so that indices can be built, or purged, over time.
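To make the bool-expr-to-SQL transformation concrete, here is a toy translator for flat (non-nested) expressions. This is an illustrative sketch only: the actual system parses the grammar with Boost.Spirit and transforms the resulting XML via XSLT, and this sketch omits not, contains, and nesting.

```python
import re

# Map the paper's relational functions onto SQL comparison operators.
OPS = {"eq": "=", "ne": "<>", "lt": "<", "le": "<=", "gt": ">", "ge": ">="}

# One relation: op(db:/tc/table/col, value) where value is a number or "string".
REL = re.compile(r"(eq|ne|lt|le|gt|ge)\(\s*db:/tc/(\w+)/(\w+)\s*,\s*([^)]+?)\s*\)")

def to_sql_where(expr):
    """Translate a flat bool-expr into the body of an SQL WHERE clause."""
    def repl(m):
        op, table, col, value = m.groups()
        # String literals switch to SQL single quotes; numbers pass through.
        if value.startswith('"') and value.endswith('"'):
            value = "'" + value[1:-1] + "'"
        return f"{table}.{col} {OPS[op]} {value}"
    sql = REL.sub(repl, expr)
    return sql.replace(" and ", " AND ").replace(" or ", " OR ")

q = ('lt(db:/tc/ohip_by_month/age_yrs, 20) and '
     'eq(db:/tc/lhin_organization/org_name, "Dental Health Center")')
print("WHERE " + to_sql_where(q))
```

Regex substitution suffices here only because the expressions are flat; handling nested not(...) or parenthesized subexpressions is precisely why the real implementation uses a proper parser and an intermediate XML tree.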
Fig. 2 Query processing workflow
The query construction and processing workflow is shown in Fig. 2. Following the numbered steps in Fig. 2, the details of the workflow consist of:

1. Build the query using the AJAX web form. We used AJAX so the user does not have to reload the web page.
2. Submit the query.
3. Server-side PHP code converts the POST data into a bool-expr grammar form (i.e., a string).
4. Server-side PHP code then pumps the bool-expr grammar string into C++ code.
5. The C++ code converts the bool-expr grammar into an equivalent XML grammar using Boost.Spirit's Qi (parser) and Karma (generator) components.
6. Server-side PHP code captures the output of the C++ code and pumps it through an XSLT script to produce SQL.
7. Server-side PHP code pumps the resulting SQL to PostgreSQL.
8. Server-side PHP code takes the PostgreSQL response and generates the results table.
9. The output is passed back to the AJAX web form to show the results.
5 Conclusion and Future Work towards Semantic Integration

We have reported on the development of a primary health care data navigation decision support system. The system was designed around four primary services: data entry, query support, gap analysis and report generation. The context for development required a centralized approach to deal with security issues of access, data integrity and privacy, and the PHC DSS was intended for use by a restricted group of policy analysts and decision makers. The research and development team worked directly with the PHC team throughout the project. This collaboration was essential in rapidly evolving the initially ambiguous requirements into detailed and focused specifications for IT developers. Open source software components and protocols were utilized throughout the project, enabling ongoing maintenance and low-cost support by ESC LHIN personnel. Within this project, the primary contributions to IT systems development for PHC lie in the abstractions for data entry and for query support. Given a rapid application development cycle, our team had to abstract from a few sample datasets how a broad range of datasets could be incorporated into the database as easily referenced tables and entries using standard SQL. Although our approach restricted imported files to text-based CSV formats, we perform data cleaning, label assignments and a richer descriptive semantics with a view to extending the file types for data entry as part of ongoing work. Our approach to query support was motivated by the user requirement that an interface support query expression building, coupled with direct referencing of tables, columns and value or range selection. We developed a boolean expression grammar, language and workflow that is non-standard in the sense that it is: (i) intended to be general in handling complicated grammars and (ii) used (embedded or partially) in other places within our module and code structure (e.g.
to handle table joins). Although the core system we developed satisfied the stated requirements, our group is also involved in similar applications of greater scope, where the roles of semantics and reasoning are important for distributed query processing and for query and file optimizations. Two directions of our current research program involve extending the semantic basis for PHC information systems. We are investigating the Guideline Expression Language (GELLO) framework (Sordo et al. 2004) as a suitable approach to integrating HL7 codes for clinical practice and data specification with other non-standard codes. Enriching the semantics through ontology development is also part of our approach. Our group has been investigating the application of cultural modeling and simulation using multi-agent based approaches (Kobti et al. 2011), and the query support developed in this paper is being extended towards supporting direct query interaction by AI agents in a real-time analytical context, using the bool-expr grammar inside a larger, more expressive grammar. Acknowledgments. RDK, AWS and ZK acknowledge support from AUTO21 and CIHR and the Primary Health Care Taskforce Group (PHCTG) of the Erie St. Clair Local Health Integration Network (ESC LHIN).
752
R.D. Kent et al.
References

1. Allemang, D., Hendler, J.: Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. Morgan Kaufmann, San Francisco (2008)
2. Berner, E.S. (ed.): Clinical Decision Support Systems: Theory and Practice (Health Informatics). Springer, Heidelberg (2010)
3. CSV Standard File Format. RFC 4180, http://www.ietf.org/rfc/rfc4180.txt
4. Burstein, F., Holsapple, C.W.: Handbook on Decision Support Systems 1: Basic Themes. International Handbooks on Information Systems. Springer, Heidelberg (2008)
5. Burstein, F., Holsapple, C.W.: Handbook on Decision Support Systems 2: Variations. International Handbooks on Information Systems. Springer, Heidelberg (2008)
6. Healthgrid, http://www.healthgrid.org/
7. Holsapple, C.W., Whinston, A.B.: Decision Support Systems: A Knowledge Based Approach, 10th edn. West Group (1996)
8. Kobti, Z., Snowdon, A.W., Kent, R.D., Bhandari, G., Rahaman, S.F., Preney, P.D., Kolga, C.A., Tiessen, B., Zhu, L.: Towards a "Just-in-Time" Distributed Decision Support System in Health Care Research. Annals of Information Systems: Supporting Real Time Decision-Making 13, Part 3, 253–285 (2011)
9. Kent, R.D., Kobti, Z., Snowdon, A.W., Aggarwal, A.: Towards a Unified Data Management and Decision Support System for Health Care. In: Tsihrintzis, G.A., Damiani, E., Virvou, M., Howlett, R.J., Jain, L.C. (eds.) Proceedings of The 3rd International Symposium on Intelligent Interactive Multimedia Systems and Services, IIMSS 2010. KES International, Springer, Heidelberg (2010)
10. Moss, L.T., Atre, S.: Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications. Addison-Wesley, Reading (2003)
11. Sordo, M., Boxwala, A.A., Ogunyemi, O., Greenes, R.A.: Description and Status Update on GELLO: a Proposed Standardized Object-Oriented Expression Language for Clinical Decision Support. Studies in Health Technology and Informatics 107(Pt 1), 164–168 (2004)
12.
Tiessen, B., Snowdon, A., Kent, R., Hussein, A., Preney, P., Woolcock, S.: Innovation in Patient Falls: Development of a Wireless Falls Reporting System to Prevent Falls in the Hospitalized Elderly. In: The International Society of Quality in Health Care (ISQual), Copenhagen, Denmark, October 24 (2008)
Emotion Enabled Model for Hospital Medication Administration

Dreama Jain, Ziad Kobti, and Anne W. Snowdon
Abstract. In this paper we introduce emotions into the reasoning process of nurses administering patient medication in a simulated hospital environment. Of particular interest are the effects of emotional stress due to workplace events and the influence of patients or coworkers on the precise task of medication administration. Each step of the five known rights of medication administration may be influenced by emotional stress, potentially leading to errors. Such errors in turn translate into reduced quality of patient care, causing patients to suffer needlessly. This work introduces a preliminary model of emotional decision making in an effort to better understand the risks and identify the factors that minimize them in a health care setting.
1 Introduction

The medication administration process is commonly carried out by nurses in a hospital. Errors arising from the medication administration process present a challenge for the quality of patient care. Current research shows that some medication errors occur due to underperformance by nurses in their day-to-day tasks. One of the reasons for this underperformance is emotional stress. Because of the complexity of human emotional behaviour and the difficulty of capturing such emotions in professional settings, we seek to recreate these settings in an artificial computer model with the aim of accurately capturing the hidden aspects of emotions and their consequences for decisions and underperformance. Risk here means risk that threatens a patient's life and decreases the quality of care. According to Bechara, decision making is a process influenced by marker signals that take emotions and feelings into account. Somatic markers as described in [1] are the signals which influence decisions; a change in these somatic states can enable a person to do, or to avoid doing, a certain thing. In [2] the authors draw on the somatic-marker hypothesis, according to which

Dreama Jain · Ziad Kobti · Anne W. Snowdon
School of Computer Science and Odette School of Business, University of Windsor, Windsor, ON, Canada, N9B-3P4
e-mail: {jainh,kobti,snowdon}@uwindsor.ca
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 753–762. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
emotions influence decisions as being advantageous or disadvantageous. Emotions act as bodily states in the brain that influence decisions, biasing them towards choices which maximize reward and minimize punishment. The authors discuss a study on patients with focal brain damage who have lost the capability of making decisions emotionally. This suggests that such patients are unable to make quality decisions in life, as they do not learn from their mistakes. From this we can state that emotions do play an important role in decision making and that their influence can lead to good or bad decisions. In this paper we use emotional stress to influence the decision making of nurses in a limited, simulated and controlled setting while they perform the medication administration task. Medication administration is a task with five rights [3]: the right time for the medication, the right drug to be given, the right measure of the dose of the drug, the right route through which the medication will be given, and the right patient for that medication. Corresponding to these rights we generate risks. In this task nurses play the central role, as they spend one-third of their time on medication administration. According to [3], the medication administration task is complex and prone to errors, especially by nurses. In this paper we simulate the medication administration task performed by nurses and examine how emotional stress influences nurses' decision making abilities while they perform this task. In this study we introduce an emotion enabled hospital system computer simulation based on artificial agents which models nurses and patients. The model generates emotions in patients as well as nurses, and records the performance of the nurses in medication administration under the influence of negative emotional stress. Corresponding to the five rights of medication administration we generate risks at each stage.
If under emotional stress the nurse is unable to perform one of the rights correctly, the corresponding risk for that right is updated. The system reveals correlations between elevated patient emotions, nurse emotions, and the resulting increased risk. The simulation is built using RepastJ, an agent-based modeling toolkit. In our previous work [4] three psychological models for emotions are examined and a corresponding algorithm depicting its process is developed for each. A generalized multi-agent model is designed to demonstrate the implementation of each of the three methods. An agent thus represents a human capable of exhibiting an emotional state in response to an arbitrary emotionally charged event of varying impact. The three theories perform similarly, differing mainly in the number of emotions each generates. We chose the Ortony, Clore and Collins theory [5], based on previous work [6], to examine the influence of emotional stress on the task performance of the nurse and patient agents. We first develop a general model to outline generic emotional agent behavior. Agents are then socially connected and surrounded by objects, or other actors, that trigger various emotions. A case study is built using a basic hospital model [7] where nurses servicing patients interact in various static and dynamic emotional scenarios. Exchange of emotions takes place between the nurse and the patient when the nurse serves a patient. When a nurse interacts with other
nurses, all are affected by the emotions. We assume that when nurses are under the influence of negative emotions they tend to make mistakes in their day-to-day tasks. These mistakes are recorded under two variations of task performance: logical and emotional. The difference between the two shows that an increase in emotional stress leads to higher error rates in nurse task performance. In previous work [7] we describe an agent based simulation which models patient falls in a dynamic hospital environment. The nurse agents follow a path represented by a directed weighted graph to move from one room to another and serve patient agents. At the end of the simulation the total number of patients served on time is calculated and the success rate of the nurses is reported. As described in [8], emotional stress increases the complexity of decision-making, leading the unconscious mind to make decisions faster than the conscious mind. Conscious (deliberate) decision-making requires cognitive resources, and because increasing complexity places increasing strain on those resources, the quality of our decision-making declines as complexity increases [8]. Studies suggest that the most common errors that occur in hospitals are in medication administration, leading to reduced quality of patient care and higher fatality rates. These errors may occur due to stress among the nurses. Ortony, Clore and Collins (OCC) [5] developed the cognitive structure of emotions, according to which emotions are generated in reaction to events, agents and objects. Well-being emotions, fortune-of-others emotions and prospect-based emotions are categorized as event-based (reaction-to-events) emotions. These emotions take into account the desirability of an event, its deservingness for the other person, and the likelihood of the event taking place. Well-being emotions are characterized as pleased and displeased.
Fortune-of-others emotions are characterized as pleased or displeased according to the desirability of the event for the other agent. Prospect-based emotions comprise hope and fear; confirmation of a prospect yields satisfaction or fears-confirmed, while disconfirmation yields relief or disappointment. Reactions to agents generate attribution emotions, characterized as approving or disapproving of one's own action or someone else's action. Some compound emotions can also be generated: approving one's own action with pleasure leads to gratification, and disapproving one's own action with displeasure leads to remorse. When approval of another agent's action combines with pleasure the outcome is gratitude, or anger if the action is disapproved. Reactions to objects lead to attraction emotions, characterized as either liking or disliking the object.
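The attribution and compound emotions just described reduce to a small lookup, sketched below in Python as an illustrative reconstruction (not the model's actual RepastJ code); the plain attribution labels for the non-compound cases — pride, shame, admiration, reproach — follow the standard OCC model, which the text does not enumerate:

```python
def occ_attribution(self_action, approving, well_being=None):
    """Map an OCC agent-reaction to an emotion label.

    self_action: whether the agent judges its own action or another's.
    approving:   whether the action is approved of.
    well_being:  co-occurring well-being emotion, "pleased", "displeased", or None.
    Compound emotions arise when an attribution combines with well-being.
    """
    if self_action:
        if approving:
            return "gratification" if well_being == "pleased" else "pride"
        return "remorse" if well_being == "displeased" else "shame"
    if approving:
        return "gratitude" if well_being == "pleased" else "admiration"
    return "anger" if well_being == "displeased" else "reproach"
```

For instance, a nurse disapproving of her own error while displeased would register remorse, matching the compound emotions listed above.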
2 Approach

In the simulated hospital model we have nurse and patient agents. The nurses' task is to serve the patients, specifically to administer medication on time. All nurses are assigned patients and it is assumed that only the assigned nurse will serve the designated patient. The nurses are already aware of the shortest path to follow to reach a patient's room. The hospital floor plan used in this simulation is the actual
floor plan of the Leamington General Hospital. The patient agents have fixed locations, while the nurse agents move from one room to another according to a schedule they generate to provide medication to a given patient on time. To make the model more realistic, when a nurse traversing the path toward a patient's room encounters another nurse, they interact, causing delay. The nature of the interaction reflects typical consultation between nurses, which we model at different parameterized rates and durations. Each patient has a severity level according to which the patient generates emotions. The nurse is also affected by the patient's emotions, which may lead to underperformance while he/she is under the influence of stress. The nurses exchange emotions while they interact with each other. The emotion of each agent is updated according to events that take place throughout the simulation. In the case of patients, when their severity is high they tend to have negative emotions. Emotion generation depends upon changes in several variables: the desirability, deservingness and probability of an event, as described in [4]. The hospital system runs in discrete time steps, where one time step is equivalent to 12.5 seconds of simulated real time. We run the simulation for an 8-hour shift, that is, for 2304 time steps. Figure 1 describes the five rights that are followed by nurses for medication administration. We describe the process of medication administration as a series of steps. The effect of an error made in any of these five rights depends upon the medication risk of the patient. The medication risk is defined as high, medium or low (Figure 2). A patient with high medication risk will be at higher risk of fatality if something goes wrong in his or her medication administration. Knowing the time at which the patient should receive the medication, we measure it against the time at which the nurse actually reaches the patient's room.
If she/he has arrived late then, depending upon the medication risk of the patient, the value of the time risk is calculated. As shown in Figure 2, if a high-risk medication patient is served 15 to 20 minutes late, the time risk crosses its maximum threshold of 20 and can reach up to 100. Similarly, for medium-risk medication the risk exceeds its threshold if a nurse is more than 25 minutes late, while for low-risk medication a patient can be served up to 35 minutes late. Once the nurse has reached a patient's room, in order to prepare the medicine the nurse needs to follow the steps of medication administration. If a nurse is under emotional stress there are higher chances of her making an error, which will correspondingly increase the risk to the patient. Drug risk is calculated while a nurse is under emotional stress or holding a negative emotion; there is then a 50% chance of an error in picking the drug for that patient. Dose risk and route risk also depend upon the emotional stress of the nurse, with a 50% chance that the nurse errs in measuring the dose of the medication or gives the medication through the wrong route. Finally, patient risk is calculated if the nurse gives the medication to the wrong patient, which has only a 10% chance of happening. Each risk has a maximum value of 20, with the exception of higher values in some cases, so the total risk is calculated out of 100 by adding the risks for each of the five rights.
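The risk rules above can be sketched as follows. This is a hedged reconstruction: the linear ramp inside the threshold and the slope beyond it are our assumptions; only the lateness thresholds, the 50%/10% error probabilities, the per-right cap of 20 and the total out of 100 come from the text.

```python
import random

# Minutes late beyond which the time risk exceeds its per-right cap of 20
# (from the text: high ~15-20 min, medium 25 min, low 35 min).
TIME_LIMIT = {"high": 15, "medium": 25, "low": 35}

def time_risk(minutes_late, med_risk):
    """Time risk: ramps to the cap of 20 within the limit, then climbs toward 100."""
    if minutes_late <= 0:
        return 0.0
    limit = TIME_LIMIT[med_risk]
    if minutes_late <= limit:
        return 20.0 * minutes_late / limit               # within threshold: at most 20
    return min(100.0, 20.0 + 4.0 * (minutes_late - limit))  # assumed slope past it

def total_risk(minutes_late, med_risk, stressed, rng=random):
    """Sum of the five per-right risks, nominally out of 100."""
    risk = time_risk(minutes_late, med_risk)
    if stressed:
        # 50% chance of an error in each of drug, dose and route selection
        for _right in ("drug", "dose", "route"):
            if rng.random() < 0.5:
                risk += 20.0
    if rng.random() < 0.1:                               # 10% chance of wrong patient
        risk += 20.0
    return risk
```

Under these assumptions a high-risk patient served 30 minutes late already carries a time risk of 80 before any stress-induced drug, dose, route or patient errors are added.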
Fig. 1 Five rights of medication administration

Fig. 2 Graph showing the rate of risk increase with time for High, Medium and Low risk medications (axes: Risk, 0–120, vs. Time, 0–50 minutes).
3 Experiments

We ran the simulation for the equivalent of an 8-hour shift with 12 nurses and 60 patients, where each nurse is assigned 5 patients. The simulation was run again with 90 patients but the same number of nurses to simulate additional stress on the nurses' workload. For every time step the negative emotions of the patients as well as the nurses are recorded. Each of the risks (five rights) is recorded as well, and the total, minimum, maximum and average risk for every time step is recorded and plotted over all agents. Experiments show that when patients have the highest negative emotions, the corresponding nurses also display the highest negative emotions around the same time. Such patterns can be examined in Figures 3, 4, 5, 6 and 7. When emotions peak we see peaks in the risks, along with a clear correspondence between emotional stress and increased risk, that is, an increased likelihood of medical errors.
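For reference, the experimental setup reduces to a few constants, checked below (the helper names are ours; the figures — 12.5 s per tick, an 8-hour shift, 12 nurses, 60 or 90 patients — are from the text):

```python
SECONDS_PER_STEP = 12.5   # one simulation tick (from the text)
SHIFT_HOURS = 8
NURSES = 12

def shift_steps(hours=SHIFT_HOURS, step=SECONDS_PER_STEP):
    """Number of simulation ticks in a shift: 8 h * 3600 s / 12.5 s = 2304."""
    return int(hours * 3600 / step)

def load_per_nurse(patients, nurses=NURSES):
    """Average patients assigned per nurse."""
    return patients / nurses
```

With 60 patients each nurse carries 5 patients; the 90-patient run raises the average load to 7.5 patients per nurse, which is the added workload stress the second experiment probes.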
Fig. 3 Graph showing negative emotions in patients for every time step (60 patients); series: total, displeased, fear, disappointment, remorse.

Fig. 4 Graph showing negative emotions in nurses for every time step (60 patients); series: total, displeased, fear, anger, disappointment, remorse.
Fig. 5 Graph showing the total number of patients at risk every time step (60 patients); series: total, high risk, medium risk, low risk.
Fig. 6 Graph showing the value for the five risks for every time step (60 patients); series: Time Risk, Drug Risk, Dose Risk, Route Risk, Patient Risk.
Fig. 7 Average, minimum and maximum value of total risk for every time step (60 patients)
Fig. 8 Graph showing the total number of patients at risk every time step (90 patients); series: total, high risk, medium risk, low risk.
Figures 8, 9 and 10 summarize the simulation results with 90 patients but the same number of nurses. Nurses' emotions are influenced by patients' emotions, and emotional stress among nurses can lead to fatal errors in the medication administration task. As the number of patients increases, so does the rate of risk. We observe that in the case of 60 patients the average risk goes beyond 60, while for 90 patients it does not; this is because, as the number of patients increases, more of them become at risk but at low or medium rather than high risk (Figure 8). In the 60-patient case there are more patients at high risk (Figure 5)
compared to the case with 90 patients, which is why the average risk value is not affected as much.

Fig. 9 The value for the five risks for every time step (90 patients); series: Time Risk, Drug Risk, Dose Risk, Route Risk, Patient Risk.
Fig. 10 Average, minimum and maximum value of total risk for every time step (90 patients)
4 Conclusion and Future Work

While this preliminary work neither provides a definitive tool nor has yet been validated, it is intended to introduce the possibility of such a tool through hypothetical scenarios. For instance, it is reasonable to conclude that a simulation can be built to incorporate the
complex effects of emotions between patient and nurse, nurse and nurse, and ultimately on a nurse performing a critical task. In our hypothetical test cases, when the nurses were faced with an increased number of patients we saw a clear increase in negative emotions. In many other runs we observed a "despair" or "disappointment" emotion as patient tempers built up. Aside from quantitative analysis, this tool sheds light on the possibility of evaluating the quality of care in terms of a specific feeling the nurses and patients experience under a specific setup. Much work remains to be done in validating the simulation with real case scenarios and improving its capabilities.
Acknowledgements. This work is partially funded by grants from Auto21 and NSERC Discovery; we also gratefully acknowledge the many conversations with, and the support of, the nurses at the Leamington General Hospital.
References

1. Bechara, A.: The role of emotion in decision-making: Evidence from neurological patients with orbitofrontal damage. Brain and Cognition 55, 30–40 (2004)
2. Naqvi, N., Shiv, B., Bechara, A.: The Role of Emotion in Decision Making: A Cognitive Neuroscience Perspective. Current Directions in Psychological Science 15(5), 260–264 (2006)
3. Wakefield, B.J., Wakefield, D.S., Uden-Holman, T., Blegen, M.: Nurses' perceptions of why medication administration errors occur. Medsurg Nurs. (1), 39–44 (1998)
4. Jain, D., Kobti, Z.: Emotionally Responsive General Artificial Agent Simulation. In: FLAIRS 2011, AAAI Proceedings (to appear, 2011)
5. Ortony, A., Clore, G., Collins, A.: The cognitive structure of emotions. Cambridge University Press, Cambridge (1988)
6. Jain, D., Kobti, Z.: Simulating the effect of emotional stress on task performance using OCC. In: Canadian AI. LNAI (submitted 2011)
7. Bhandari, G., Kobti, Z., Snowdon, A.W., Nakhwal, A., Rahman, S., Kolga, C.A.: Agent-Based Modeling and Simulation as a Tool for Decision Support for Managing Patient Falls in a Dynamic Hospital Setting. In: Schuff, D., Paradice, D., Burstein, F., Power, D.J., Sharda, R. (eds.) Deci. Supp. 2011. Annals of Information Systems, vol. 14, pp. 149–162. Springer, Heidelberg (2011)
8. Dijksterhuis, A.: Think Different: The Merits of Unconscious Thought in Preference Development and Decision Making. J. of Pers. and Soc. Psych. 87(5), 586–598 (2004)
Health Information Technology in Canada's Health Care System: Innovation and Adoption

Anne W. Snowdon, Jeremy Shell, Kellie Leitch, O.Ont, and Jennifer J. Park
Abstract. Health information technology (HIT) offers improved efficiency and effectiveness in health care systems, especially with regard to patient safety, disease screening, communication between clinicians, and overall cost effectiveness through reduced duplication of processes and testing. However, Canada has been slow to adopt and implement these technological advances compared to other developed nations. Barriers to HIT adoption in Canada include cost, value for investment, variability in needs and resources, lack of standardization, and lack of human resources. In addition, there is a lack of reliable data or studies to support the benefits of HIT. To ensure the sustainability of its health care system as well as its competitive economic future, Canada needs to support, innovate, and implement HIT.
1 Introduction

Canada has a longstanding global reputation for having one of the foremost health care systems, offering high quality health care to all citizens based on the principles of universal access. A considerable amount of funding goes into Canadian health care, with Canada's health sector accounting for twelve percent of the national economy. According to the Conference Board of Canada, Federal and Provincial governments put $183.19 billion into the Canadian health systems each year, which is supplemented by an additional $40 billion from private sources. [1] Yet, like many developed nations, the Canadian health care system is facing increasing challenges in delivering high quality health care services that are timely and accessible to an aging population with increasing rates of chronic illness. [1] To make matters worse, Canada is not among the twelve top-performing countries in health care provision. [1] This is reflected in a 2008 report on innovative capacity by McKinsey & Company in which Canada ranks thirteenth out of seventeen industrialized nations measured. [2] Innovation in the health sector has lagged behind other industry sectors. Both the Conference Board of Canada and McKinsey point to Canada's "innovation deficit" in the adoption of new technologies as one of the biggest barriers facing the nation's health care system. By failing to

Anne W. Snowdon · Jeremy Shell · Kellie Leitch, O.Ont · Jennifer J. Park
Odette School of Business, University of Windsor, Canada
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 763–768. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
more quickly adopt new technologies as well as innovative processes and procedures, Canadian health care is becoming less efficient, more expensive, and in danger of losing sustainability while failing to meet the high standards of quality citizens expect. This chapter examines the current position of information technology (specifically, health information technology: HIT) within the Canadian health care system, its future role, and what steps need to be taken to improve the efficiency and effectiveness of Canada's health sector.
2 Health Information Systems and Personalized Medicine

The digital era has rapidly improved the efficiency and effectiveness of access to information, enabling businesses to customize services and products to greater degrees than ever before. [3] Health care is arguably the most information-intensive of industries, with approximately 2,000 health care transactions taking place every minute in Canada. [4] Health information has the potential to create seamless delivery of health care services by digitizing basic health records, using electronic tracking of patient information through the health system, and improving digital communication among health professionals, patients, and their family members. Health information technology (HIT), therefore, can transform the health care system and contribute greatly to its long-term effectiveness and sustainability. However, in the Canadian health care sector these technologies have been slow to be adopted despite substantial investment of resources. An innovative Canadian health care system could leverage personalized health care technologies and products to achieve greater quality of care and enhanced efficiency of health service delivery. [5] Personalized health care refers to health services and technologies that are customized to the unique needs of the patient, resulting in a more patient-centered approach to health care delivery. Such emerging health information technologies are positioned to have a substantial impact on Canada's health system. These technologies can enable patients to communicate directly with their physicians or other health care providers, supported by detailed, accurate records of their personal health experiences and trends over time. [6] Despite evidence that improvements in productivity can be linked to investments in information and technology, the amount spent by Canadian hospitals on information and communication technology in 2005 constituted only 1.5% of their total operating budgets.
[7] This figure stands in contrast to countries such as Italy, Sweden, and the United Kingdom where over 5% of hospital budgets are allocated to information and communication technologies. [8]
3 Barriers to HIT Adoption

There are numerous barriers to the adoption of HIT within the Canadian health care system. Of these, 1) cost, 2) value of investment, 3) variability in needs and resources, 4) lack of standardization, and 5) human resource requirements for health information system management have been identified by researchers.
3.1 Cost

The costs of implementing the latest HIT, such as installing substantial hardware and providing ongoing maintenance, are high. [9,10] Currently, clinicians in the Canadian health system receive inadequate reimbursements and no compensation to support HIT adoption. [5, 9-13] Improvements in quality and patient safety are perceived to accrue to the payers of health care, while providers are required to invest in the implementation of electronic health records (EHR). [14, 15] Fee-for-service reimbursement models give health care providers little incentive to adopt EHR. [14, 16] Clinicians avoid information technologies for fear of potential disruption to their practices in the wake of increased workloads and major investments of their time spent monitoring online patient data. [9, 12, 15, 17]
3.2 Value for Investment

Lack of understanding and underestimation of the value of HIT investment are largely due to the fact that recent studies show no significant HIT impact on patient outcomes. [11, 18] In addition, the value of HIT has not been proven empirically. Attention has been paid to the notion that HIT may actually introduce errors into health data rather than reducing them. [20] To convince professionals across all systems of care in the field of health services, the adoption of HIT and its benefits need to be clearly documented and described in detail at the health systems level.
3.3 Variability in Needs and Resources

Some clinical agencies without existing computer technology expertise face a steeper learning curve in adopting HIT than more technologically advanced practices. [9, 10, 20] Thus, variability in the needs and resources of clinical settings is substantial. So far, strategies that offer rapid, efficient, and effective approaches to HIT adoption have not been developed or tested.
3.4 System Standardization

The lack of HIT standardization poses a significant challenge to the adoption of HIT in the Canadian health care system. Clinical data formats, health information exchange across agencies or settings, and data security in exchanges between clinics, laboratories, pharmacies and radiological imaging labs are seen as barriers to adoption by clinicians. [14, 15, 21, 22] Moreover, health experts worry about the health illiteracy of patients taking greater stakes in the management of their own personal health. [23, 24, 25] There are additional concerns over who owns health data and how to manage discrepancies between EHR and personal health records (PHR). [14,21,26] Privacy concerns have not been examined in detail, as practitioners fear legal liability issues related to data management, which in turn may pose a challenge to patients who rely heavily on the integrity of their own personal health records. [12,13,22,24,27,28]
3.5 Human Resources

Investment in Canada's HIT capacity will require additional human resource skills. There is a serious risk of labour and skill shortages that will constrain the implementation of HIT across Canada. [29] Large amounts of patient-generated data may be overwhelming for clinicians to assess and respond to in a time-sensitive and personalized manner. [28] HIT innovation may pose a substantial challenge for a health professional workforce already faced with mounting pressure to deliver health services to meet growing patient demand. It is estimated that Canada will need to fill 112,000 IT-related health care jobs in the next five years. [30] Beyond new hires, approximately 33,000 health informatics workers will need to significantly upgrade their skills by 2014. This does not take into account the projected workforce needs to accommodate the disruptive nature of the HIT adoption the industry needs in order to build and sustain capacity and productivity. [30] Currently, Canada's federal government operates 124 networks and 144 data centres across the country, yet the federal government's 120,000 Wintel and Unix federal servers use less than 10% of their operating capacity. [30] In addition, 40% of the IT professionals who oversee these networks are eligible for retirement in the next five years. [30]
4 Conclusion

Innovation in and adoption of HIT systems and practices hold great potential for improving safety, disease screening, communication between clinicians, and overall cost effectiveness through reduced duplication of processes and testing. Existing evidence strongly favors the potential of HIT to improve the capacity and sustainability of the Canadian health care system. [5] Yet, despite evidence of its potential and recent technological developments, HIT is poorly adopted in Canadian health care. In global comparisons with other countries, Canada ranks last in all measures of HIT adoption in physician practice. This may be linked to the limitations of current incentives, health legislation and policy, and the inadequate profile of the value of HIT innovation adoption. In addition to ongoing research highlighting the value of HIT in Canadian health care, improving HIT systems and increasing the number of trained IT professionals will increase Canada's capacity for health care provision as well as much needed system-wide sustainability efforts.
References
1. Conference Board of Canada: How Canada Performs: A Report Card on Canada (2008)
2. McKinsey and Company: Breaking Away from the Pack: Enhancing Canada's Global Competitiveness (2008)
3. McGahan, A.: How Industries Change. Harvard Business Review, pp. 87–94 (October 2004)
Health Information Technology in Canada’s Health Care System
4. Canada Health Infoway: Annual Report 2008–2009 (2008), http://www2.infoway-inforoute.ca/Documents/ar/Annual_Report_2008-2009_en.pdf
5. Hillestad, R., Bigelow, J., Bower, A., Girosi, F., Meili, R., Scoville, R., et al.: Can electronic medical record systems transform health care? Potential health benefits, savings, and costs. Health Aff (Millwood) 24(5), 1103–1117 (2005)
6. Tufano, J.T., Ralston, J.D., Martin, D.: Providers' experience with an organizational redesign initiative to promote patient-centered access: a qualitative study. J. Gen. Intern. Med. 23(11), 1778–1783 (2008)
7. Arsenault, J.F., Sharpe, A.: An analysis of the causes of weak labour productivity growth in Canada since 2000. Calculations based on the Canadian productivity accounts from Statistics Canada, CANSIM Table 383-0021. International Productivity Monitor 16 (2006)
8. Prada, Santaguida: Conference Board, p. 30, Chart 16 (2007)
9. Miller, R.H., Sim, I.: Physicians' use of electronic medical records: barriers and solutions. Health Aff (Millwood) 23(2), 116–126 (2004)
10. Blumenthal, D., Glaser, J.: Information technology comes to medicine. N. Engl. J. Med. 356(24), 2527–2534 (2007)
11. Carrier, E., Gourevitch, M.N., Shah, N.: Medical homes: challenges in translating theory into practice. Med. Care 47(7), 714–722 (2009)
12. Kaelber, D.C., Jha, A.K., Johnston, D., Middleton, B., Bates, D.: A research agenda for personal health records (PHRs). J. Am. Med. Inform. Assoc. 15(6), 729–736 (2008)
13. Halamka, J.D., Mandl, K.D., Tang, P.: Early experiences with personal health records. J. Am. Med. Inform. Assoc. 15(1), 1–7 (2008)
14. Middleton, B., Hammond, W.E., Brennan, P.F., Cooper, G.: Accelerating U.S. EHR adoption: how to get there from here. Recommendations based on the 2004 ACMI retreat. J. Am. Med. Inform. Assoc. 12(1), 13–19 (2005)
15. Detmer, D., Bloomrosen, M., Raymond, B., Tang, P.: Integrated personal health records: transformative tools for consumer-centric care. BMC Med. Inform. Decis. Mak. 8, 45 (2008)
16. Davidson, S.M., Heineke, J.: Toward an effective strategy for the diffusion and use of clinical information systems. J. Am. Med. Inform. Assoc. 14(3), 361–367 (2007)
17. Hess, R., Bryce, C.L., Paone, S., Fischer, G., McTigue, K.M., Olshansky, E., Zickmund, S., Fitzgerald, K., Siminerio, L.: Exploring challenges and potentials of personal health records in diabetes self-management: implementation and initial assessment. Telemed. J. E. Health 13(5), 509–517 (2007)
18. Keyhani, S., Hebert, P.L., Ross, J.S., Federman, A., Zhu, C.W., Siu, A.: Electronic health record components and the quality of care. Med. Care 46(12), 1267–1272 (2008)
19. Ash, J.S., Berg, M., Coiera, E.: Some unintended consequences of information technology in health care: the nature of patient care information system-related errors. J. Am. Med. Inform. Assoc. 11(2), 104–112 (2004)
20. Reardon, J.L., Davidson, E.: An organisational learning perspective on the assimilation of electronic medical records among small physician practices. European Journal of Information Systems 16, 681–694 (2007)
21. Wiljer, D., Urowitz, S., Apatu, E., DeLenardo, C., Eysenbach, G., Harth, T., Pai, H., Leonard, K.J., Canadian Committee for Patient Accessible Health Records: Patient accessible electronic health records: exploring recommendations for successful implementation strategies. J. Med. Internet Res. 10(4), e34 (2008)
A.W. Snowdon et al.
22. Yasnoff, W.A., Humphreys, B.L., Overhage, J.M., Detmer, D.E., Brennan, P.F., Morris, R.W., Middleton, B., Bates, D.W., Fanning, J.: A consensus action agenda for achieving the national health information infrastructure. J. Am. Med. Inform. Assoc. 11(4), 332–338 (2004)
23. Lorence, D.P., Greenberg, L.: The zeitgeist of online health search: implications for a consumer-centric health system. J. Gen. Intern. Med. 21(2), 134–139 (2006)
24. Weitzman, E.R., Kaci, L., Mandl, K.: Acceptability of a personally controlled health record in a community-based setting: implications for policy and design. J. Med. Internet Res. 11(2), e14 (2009)
25. Nijland, N., van Gemert-Pijnen, J., Boer, H., Steehouder, M.F., Seydel, E.: Evaluation of internet-based technology for supporting self-care: problems encountered by patients and caregivers when using self-care applications. J. Med. Internet Res. 10(2), e13 (2008)
26. Hughes, B., Joshi, I., Wareham, J.: Health 2.0 and Medicine 2.0: tensions and controversies in the field. J. Med. Internet Res. 10(3), e23 (2008)
27. Grossman, J.M., Zayas-Cabán, T., Kemper, N.: Information gap: can health insurer personal health records meet patients' and physicians' needs? Health Aff (Millwood) 28(2), 377–389 (2009)
28. Tang, P.C., Ash, J.S., Bates, D.W., Overhage, J.M., Sands, D.: Personal health records: definitions, benefits, and strategies for overcoming barriers to adoption. J. Am. Med. Inform. Assoc. 13(2), 121–126 (2006)
29. Health Informatics and Health Information Management: Human Resources Report (2009)
30. Information Communication Technology Council (ICTC): Canada faces widespread ehealth skills shortage. CIO Canada, December 3 (2009)
Hierarchical Clustering for Interval-Valued Functional Data Nobuo Shimizu
Abstract. In this paper, we deal with hierarchical clustering for interval-valued functional data. Functional data are data that are functions, or data that can be approximated by functions. Functional cluster analysis is cluster analysis for functional data. Interval-valued functional data are functional data whose range corresponding to each value in the domain is interval-valued data. Interval-valued data are typical of symbolic data, so interval-valued functional data can also be considered a kind of symbolic data. We propose hierarchical clustering for interval-valued functional data as an extension of functional clustering methods, and apply the method to real data.
Keywords: Functional data analysis, Symbolic data analysis, Hierarchical clustering.
Nobuo Shimizu, The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan. e-mail: [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 769–778. © Springer-Verlag Berlin Heidelberg 2011, springerlink.com

1 Introduction
Ramsay proposed functional data analysis (FDA) [11][12][13], a statistical methodology in which a discrete data set representing a continuous process is regarded as a real-valued function. In FDA, including the case where functions are given from the beginning, we can take more varied approaches than in conventional statistical methods. For example, we can use derivatives of functional data, apply conventional methods to the finite multivariate data obtained by approximating functional data with an expansion in a finite set of basis functions, and analyze functional data using variational methods. On the other hand, Diday [3] defined symbolic data as a new concept of statistical data, and proposed symbolic data analysis (SDA) as a set of new methods for such data. Symbolic data can include not only single numerical/categorical data, but
also interval-valued data, modal-valued data, distribution-valued data, hierarchical-structured data, and so on. In SDA, many conventional statistical methods have been extended [1][2][4]; however, most of them are methods for interval-valued data. Interval-valued functional data are functional data whose range corresponding to each value in the domain is interval-valued data, and they can be considered a kind of symbolic data. There is room for development of statistical methods for interval-valued functional data. In this paper, we propose hierarchical clustering for interval-valued functional data. We extend various distance/dissimilarity criteria between interval-valued data to interval-valued functional data, apply these new criteria to real data, and show the results of clustering.
2 Functional Cluster Analysis
In this section, we review previous studies on functional data, cluster analysis, and functional cluster analysis.
2.1 Functional Data
In conventional multivariate analysis, we analyze a numerical multivariate data set. In functional data analysis, by contrast, each datum is not a numerical vector but a function, and we analyze the functions directly. Fig. 1 is an example of functional data: it represents daily average temperature data from 1960 to 1994 at 32 different locations in Canada [12][13]. The data are obtained as discrete values, but they can be regarded as a continuous sequence, and it can be more reasonable to analyze the continuous sequence, i.e. the functions, than the discrete data.
2.2 Cluster Analysis
Cluster analysis is a technique for classifying an original data set into subsets (called clusters) using some distance or similarity/dissimilarity criterion. Euclidean distance, Manhattan distance, Chebyshev distance, and so on are often used as distance criteria. Cluster analysis divides roughly into hierarchical clustering and non-hierarchical (or partitional) clustering. Hierarchical clustering organizes the data set into a hierarchy of clusters, which may be represented in a tree structure (called a dendrogram), based on some linkage criterion. Single linkage, complete linkage, the median method, the centroid method, Ward's method, McQuitty's method, the group average method, etc. are often used as linkage criteria. Non-hierarchical (or partitional) clustering, on the other hand, assigns each datum to the cluster whose center is nearest; the k-means algorithm is a typical technique of non-hierarchical clustering.
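The complete-linkage criterion used later in the paper can be illustrated with a minimal sketch (not from the paper; the function name and toy data are hypothetical): a plain agglomerative procedure on one-dimensional points, where the distance between two clusters is the maximum pairwise distance between their members.

```python
# Illustrative sketch: agglomerative clustering with complete linkage on a toy
# 1-D data set, merging the closest pair of clusters until two clusters remain.
def complete_linkage_clusters(points, n_clusters):
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete linkage: cluster distance is the maximum
                # pairwise distance between members of the two clusters
                d = max(abs(x - y) for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

clusters = complete_linkage_clusters([1.0, 1.2, 1.1, 8.0, 8.3], 2)
# the two well-separated groups are recovered: {1.0, 1.1, 1.2} and {8.0, 8.3}
```

Recording the merge order and merge distances instead of only the final partition would yield the dendrogram described above.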
Fig. 1 Daily average temperatures at 32 different locations in Canada[7]
2.3 Functional Cluster Analysis
In conventional cluster analysis, we deal with numerical or categorical data; in functional cluster analysis, we deal with functional data. Mizuta [10] proposed some methods for functional cluster analysis. In this paper, we assume that all functional data share a common argument with the same domain and belong to the same functional family. The distance between two functional data $f$ and $g$ is represented by the following integrals:

$$D_M(f, g) = \int_{t \in T} |f(t) - g(t)| \, dt, \qquad (1)$$

$$D_E(f, g) = \sqrt{\int_{t \in T} \{f(t) - g(t)\}^2 \, dt}, \qquad (2)$$

where $T$ is the domain of $f$ and $g$. $D_M(f, g)$ and $D_E(f, g)$ correspond to the Manhattan distance and the Euclidean distance, respectively.
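The integrals in (1) and (2) can be approximated numerically; the sketch below (an illustration with hypothetical function names, not code from the paper) uses a midpoint Riemann sum over $T = [0, 1]$.

```python
# Hedged numerical sketch of Eqs. (1)-(2): Manhattan and Euclidean distances
# between two functions f and g, approximated by a midpoint Riemann sum.
def functional_distances(f, g, a=0.0, b=1.0, steps=1000):
    h = (b - a) / steps
    dm = 0.0   # accumulates integral of |f(t) - g(t)| dt        -> D_M(f, g)
    de2 = 0.0  # accumulates integral of (f(t) - g(t))^2 dt      -> D_E(f, g)^2
    for i in range(steps):
        t = a + (i + 0.5) * h  # midpoint of the i-th subinterval
        diff = f(t) - g(t)
        dm += abs(diff) * h
        de2 += diff * diff * h
    return dm, de2 ** 0.5

dm, de = functional_distances(lambda t: t, lambda t: 0.0)
# exact values on [0, 1]: D_M = 1/2 and D_E = sqrt(1/3)
```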
3 Interval-Valued Functional Data
In this section, we review interval-valued data in SDA, and explain interval-valued functional data, which consist of a pair of functions: a lower function and an upper function.
3.1 Interval-Valued Data in SDA
In SDA, a one-dimensional interval value $A_i$ $(i = 1, \cdots, N)$ is represented by two real values:

$$A_i = [a_i^L, a_i^U], \quad a_i^L \le a_i^U, \quad a_i^L, a_i^U \in \mathbb{R}. \qquad (3)$$

If the interval value $A_i$ aggregates single real-valued data, it can be represented as

$$a_i^L = \inf A_i, \quad a_i^U = \sup A_i. \qquad (4)$$
3.2 Interval-Valued Function
Interval-valued functional data are an extension of functional data to interval-valued data: each datum is a pair of functions. That is,

$$f_i(t) = [f_i^L(t), f_i^U(t)], \quad f_i^L(t) \le f_i^U(t), \quad (i = 1, \cdots, N), \qquad (5)$$

where $f_i^L(t)$ and $f_i^U(t)$ are the lower and upper functions of $f_i(t)$, respectively. Fig. 2 shows an example of an interval-valued function.
Fig. 2 Interval-valued function aggregating data shown in Fig. 1[7]
3.3 Distance Criteria of Interval-Valued Data
For interval-valued data, several distance/dissimilarity criteria have been defined [1][2][4]; we introduce them here. We consider p-dimensional interval-valued data $A_i = ([a_{i1}^L, a_{i1}^U], \cdots, [a_{ip}^L, a_{ip}^U])$ and $A_j = ([a_{j1}^L, a_{j1}^U], \cdots, [a_{jp}^L, a_{jp}^U])$ $(i, j = 1, \cdots, N;\; i \ne j;\; k = 1, \cdots, p)$.

Hausdorff Distance. The Hausdorff distance between $A_i$ and $A_j$ is

$$D(A_i, A_j) = \sum_{k=1}^{p} D_k(A_i, A_j), \qquad (6)$$

where the distance for the k-th dimension $D_k(A_i, A_j)$ is

$$D_k(A_i, A_j) = \max(|a_{ik}^L - a_{jk}^L|, |a_{ik}^U - a_{jk}^U|). \qquad (7)$$

The Euclidean Hausdorff distance between $A_i$ and $A_j$ is

$$D(A_i, A_j) = \sqrt{\sum_{k=1}^{p} [D_k(A_i, A_j)]^2}, \qquad (8)$$

where $D_k(A_i, A_j)$ is given by (7).

Gowda-Diday Dissimilarity. The Gowda-Diday dissimilarity [5] between $A_i$ and $A_j$ is

$$D_{GD}(A_i, A_j) = \sum_{k=1}^{p} \sum_{l=1}^{3} D_{kl}^{GD}(A_i, A_j), \qquad (9)$$

where

$$D_{k1}^{GD}(A_i, A_j) = |a_{ik}^L - a_{jk}^L| / |y_k|,$$
$$D_{k2}^{GD}(A_i, A_j) = \big|\, |a_{ik}^U - a_{ik}^L| - |a_{jk}^U - a_{jk}^L| \,\big| / U_{ijk}, \qquad (10)$$
$$D_{k3}^{GD}(A_i, A_j) = (|a_{ik}^U - a_{ik}^L| + |a_{jk}^U - a_{jk}^L| - 2 I_{ijk}) / U_{ijk},$$

with

$$|y_k| = \max_m(a_{mk}^U) - \min_m(a_{mk}^L), \quad (m = 1, \cdots, N), \qquad (11)$$
$$U_{ijk} = \max(a_{ik}^U, a_{jk}^U) - \min(a_{ik}^L, a_{jk}^L), \qquad (12)$$
$$I_{ijk} = \max[\min(a_{ik}^U, a_{jk}^U) - \max(a_{ik}^L, a_{jk}^L), 0]. \qquad (13)$$

Ichino-Yaguchi Dissimilarity. The Ichino-Yaguchi dissimilarity [6] between $A_i$ and $A_j$ is

$$D_{IY}(A_i, A_j) = \sum_{k=1}^{p} D_k^{IY}(A_i, A_j), \qquad (14)$$

where

$$D_k^{IY}(A_i, A_j) = (U_{ijk} - I_{ijk}) - \gamma (|a_{ik}^U - a_{ik}^L| + |a_{jk}^U - a_{jk}^L| - 2 I_{ijk}) = (1 - \gamma)(U_{ijk} - I_{ijk}) + \gamma \max[\max(a_{ik}^L, a_{jk}^L) - \min(a_{ik}^U, a_{jk}^U), 0], \qquad (15)$$

with $0 \le \gamma \le 0.5$, and $U_{ijk}$ and $I_{ijk}$ given by (12) and (13), respectively.
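For one dimension ($p = 1$), the Hausdorff distance of (6)-(7) and the Ichino-Yaguchi dissimilarity of (15) reduce to simple interval computations; the following sketch (an illustration with hypothetical function names, not code from the paper) makes both concrete.

```python
# Illustrative sketch of Eqs. (7), (12), (13), and (15) for p = 1:
# intervals are (lower, upper) tuples A = [aL, aU] and B = [bL, bU].
def hausdorff(a, b):
    (aL, aU), (bL, bU) = a, b
    return max(abs(aL - bL), abs(aU - bU))                     # Eq. (7)

def ichino_yaguchi(a, b, gamma=0.5):
    (aL, aU), (bL, bU) = a, b
    u = max(aU, bU) - min(aL, bL)                              # Eq. (12), join span
    i = max(min(aU, bU) - max(aL, bL), 0.0)                    # Eq. (13), meet length
    return (u - i) - gamma * ((aU - aL) + (bU - bL) - 2 * i)   # Eq. (15)

d_h = hausdorff((0.0, 2.0), (1.0, 4.0))        # max(|0-1|, |2-4|) = 2
d_iy = ichino_yaguchi((0.0, 2.0), (1.0, 4.0))  # u=4, i=1, so 3 - 0.5*3 = 1.5
```

With gamma = 0.5 the second form of (15), $(1-\gamma)(U-I) + \gamma \cdot \text{gap}$, gives the same 1.5 here, since the two intervals overlap and the gap term is zero.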
4 Clustering for Interval-Valued Functional Data
When we apply functional clustering to interval-valued functional data, we can define new distance/dissimilarity criteria, just as for interval-valued data and for functional data. Ikeda et al. [8] have already defined and used the functional normalized Euclidean Hausdorff distance as a dissimilarity between two interval-valued functions. We suppose that interval-valued functional data $f_i(t)$ and $f_j(t)$ $(i \ne j)$ are defined by (5), and define some further criteria.

Functional Hausdorff Distance. We define the functional Hausdorff distance as

$$D_{FHD}(f_i, f_j) = \int_{t \in T} \max(|f_i^L(t) - f_j^L(t)|, |f_i^U(t) - f_j^U(t)|) \, dt. \qquad (16)$$

That is, (16) is the extension of (6). We also define the functional Euclidean Hausdorff distance as

$$D_{FEHD}(f_i, f_j) = \sqrt{\int_{t \in T} \max\{|f_i^L(t) - f_j^L(t)|^2, |f_i^U(t) - f_j^U(t)|^2\} \, dt}. \qquad (17)$$
Functional Gowda-Diday Dissimilarity. The total length used to normalize the span for an interval-valued data set is defined in (11). The span-normalizing function $y(t)$ can be defined as

$$y(t) = \max_m f_m^U(t) - \min_m f_m^L(t), \quad (m = 1, \cdots, N), \quad \forall t. \qquad (18)$$

The union and intersection for interval-valued data are defined in (12) and (13), respectively. The union and intersection functions are extensions of (12) and (13), that is,

$$U_{ij}(t) = \max(f_i^U(t), f_j^U(t)) - \min(f_i^L(t), f_j^L(t)) \qquad (19)$$

and

$$I_{ij}(t) = \max[\min(f_i^U(t), f_j^U(t)) - \max(f_i^L(t), f_j^L(t)), 0], \qquad (20)$$

respectively. We define the functional Gowda-Diday dissimilarity as

$$D_{FGD}(f_i, f_j) = \int_{t \in T} \sum_{l=1}^{3} D_l^{FGD}(f_i, f_j)(t) \, dt, \qquad (21)$$

where

$$D_1^{FGD}(f_i, f_j)(t) = |f_i^L(t) - f_j^L(t)| / |y(t)|,$$
$$D_2^{FGD}(f_i, f_j)(t) = \big|\, |f_i^U(t) - f_i^L(t)| - |f_j^U(t) - f_j^L(t)| \,\big| / U_{ij}(t), \qquad (22)$$
$$D_3^{FGD}(f_i, f_j)(t) = (|f_i^U(t) - f_i^L(t)| + |f_j^U(t) - f_j^L(t)| - 2 I_{ij}(t)) / U_{ij}(t).$$

Functional Ichino-Yaguchi Dissimilarity. We define the functional Ichino-Yaguchi dissimilarity as

$$D_{FIY}(f_i, f_j) = \int_{t \in T} \{(U_{ij}(t) - I_{ij}(t)) - \gamma (|f_i^U(t) - f_i^L(t)| + |f_j^U(t) - f_j^L(t)| - 2 I_{ij}(t))\} \, dt = \int_{t \in T} \{(1 - \gamma)(U_{ij}(t) - I_{ij}(t)) + \gamma \max[\max(f_i^L(t), f_j^L(t)) - \min(f_i^U(t), f_j^U(t)), 0]\} \, dt, \qquad (23)$$

where $0 \le \gamma \le 0.5$, and $U_{ij}(t)$ and $I_{ij}(t)$ are given by (19) and (20), respectively.
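The functional Hausdorff distance of (16) can be evaluated numerically in the same way as the functional distances of Section 2.3; the sketch below (an illustration with hypothetical function names and toy interval-valued functions, not code from the paper) again uses a midpoint Riemann sum on $T = [0, 1]$.

```python
# Hedged numerical sketch of Eq. (16): functional Hausdorff distance between
# two interval-valued functions, each given as a (lower, upper) pair of
# ordinary functions.
def functional_hausdorff(fL, fU, gL, gU, steps=1000):
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoint rule on [0, 1]
        total += max(abs(fL(t) - gL(t)), abs(fU(t) - gU(t))) * h
    return total

# toy interval-valued functions: f(t) = [t, t + 1] and g(t) = [t + 0.5, t + 2]
d = functional_hausdorff(lambda t: t, lambda t: t + 1.0,
                         lambda t: t + 0.5, lambda t: t + 2.0)
# the integrand is max(0.5, 1.0) = 1.0 for every t, so d is approximately 1.0
```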
5 Application to Real Data
In this section, we apply interval-valued functional hierarchical clustering to real data, and consider the differences between the dendrograms obtained with each distance/dissimilarity criterion.
5.1 Temperature Data in Portugal
We use monthly average temperature data from 1971 to 2000 at 9 cities in Portugal [9]; Table 1 lists the 9 cities. We regard the monthly means of the minimum and maximum temperatures as interval-valued data. Fig. 3 shows example data, the monthly mean minimum/maximum temperatures (Média da Mínima/Máxima, i.e., mean of the minimum/maximum) in Lisbon. The data sets of mean minimum/maximum temperatures in each city are regarded as the lower/upper functions of the interval-valued function, respectively. We transform the temperature data into interval-valued functional data as a 6th-degree polynomial whose argument is the month number. Fig. 4 shows an example.
Table 1 9 cities in the temperature data

City      Location     City        Location     City            Location
Lisbon    c, M. sou.   Bragança    i, M. n.e.   V.R.S. António  c, M. sou.
Coimbra   i, M. cent.  Portalegre  i, M. s.e.   Porto Santo     c, I.
Porto     c, M. no.    Beja        i, M. s.e.   Ang. Heroísmo   c, I.

i: inland, c: coast, M.: mainland, I.: islands, cent.: central, n.e.: northeast, no.: northern, s.e.: southeast, sou.: southern
Fig. 3 Minimum/maximum of average temperatures in Lisbon[9]
Fig. 4 Transformed data as interval-valued functional data in Lisbon
5.2 Hierarchical Clustering with Complete Linkage
We apply interval-valued functional hierarchical clustering to the data, selecting complete linkage, a linkage criterion often used in hierarchical clustering. The results under each distance/dissimilarity criterion are shown in Fig. 5. In these results, (i) the islands (Porto Santo and Ang. Heroísmo) and (ii) the northeast inland part of the mainland (Bragança) appear as characteristic clusters under all the distance/dissimilarity criteria. Cluster (i) has a smaller difference between the minimum and maximum mean temperatures in each month than the other cities; cluster (ii) has a large difference between the minimum and maximum mean temperatures in each month, and it is very cold there in winter. Lisbon and V.R.S. António, which are located on the southern coast of the mainland, show little variation in the difference between the minimum and maximum mean temperatures across months. The structure of the dendrogram for the 6 cities other than (i) and (ii) differs depending on the distance/dissimilarity criterion. In the case of the functional Gowda-Diday dissimilarity, data whose lower functions are similar tend to
Fig. 5 The dendrograms by interval-valued functional hierarchical clustering with complete linkage under each distance/dissimilarity criterion: (a) functional Hausdorff distance; (b) functional Euclidean Hausdorff distance; (c) functional Gowda-Diday dissimilarity; (d) functional Ichino-Yaguchi dissimilarity (γ = 0.5)
be aggregated earlier than under the other criteria. Also, the functional Hausdorff distance and the functional Ichino-Yaguchi dissimilarity tend to give the same or similar results.
6 Concluding Remarks
In this paper, we proposed hierarchical clustering for interval-valued functional data that share a common argument with the same domain.
Clustering methods for interval-valued data with various distance/dissimilarity criteria have been proposed in SDA. We presented a clustering method for interval-valued functional data as an extension of methods for functional data and for interval-valued data, and applied hierarchical clustering with complete linkage and various distance/dissimilarity criteria to temperature data treated as interval-valued functional data. We considered the results of applying our methods to real data, and described the properties of the data and some characteristics of the distance criteria. In future work, we will propose other statistical methods (including non-hierarchical clustering, multidimensional scaling, etc.) for interval-valued functional data, and examine the results of applying the new methods to real data.
References
1. Billard, L., Diday, E.: Symbolic Data Analysis: Conceptual Statistics and Data Mining. Wiley, Chichester (2007)
2. Bock, H.-H., Diday, E.: Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data. Springer, Heidelberg (2000)
3. Diday, E.: The symbolic approach in clustering and related methods of data analysis. In: Bock, H. (ed.) Classification and Related Methods of Data Analysis, Proc. IFCS, Aachen, Germany. North-Holland, Amsterdam (1988)
4. Diday, E., Noirhomme-Fraiture, M. (eds.): Symbolic Data Analysis and the SODAS Software. Wiley, Chichester (2008)
5. Gowda, K.C., Diday, E.: Symbolic clustering using a new dissimilarity measure. Pattern Recognition 24(6), 567–578 (1991)
6. Ichino, M., Yaguchi, H.: Generalized Minkowski metrics for mixed feature type data analysis. IEEE Transactions on Systems, Man and Cybernetics 24(4), 698–708 (1994)
7. Ikeda, T., Komiya, Y., Minami, H., Mizuta, M.: An extension of functional PCA to interval-valued functional data. In: Proceedings of IASC 2008, the Joint Meeting of the 4th World Conference of the IASC and the 6th Conference of the Asian Regional Section of the IASC on Computational Statistics & Data Analysis, pp. 640–647 (2008)
8. Ikeda, T., Komiya, Y., Minami, H., Mizuta, M.: Derivation of interval-valued functional data and its application (in Japanese). In: Proceedings of the 23rd Symposium, Japanese Society of Computational Statistics, pp. 11–14 (2009)
9. Instituto de Meteorologia, IP Portugal: Climate Normals, http://www.meteo.pt/en/oclima/normais/
10. Mizuta, M.: Clustering methods for functional data. In: Proceedings in Computational Statistics, pp. 1503–1510. Physica-Verlag (2004)
11. Ramsay, J.O.: When the data are functions. Psychometrika 47, 379–396 (1982)
12. Ramsay, J.O., Silverman, B.W.: Functional Data Analysis. Springer, Heidelberg (1997)
13. Ramsay, J.O., Silverman, B.W.: Functional Data Analysis, 2nd edn. Springer, Heidelberg (2005)
Multidimensional Scaling with Hyperbox Model for Percentile Dissimilarities Yoshikazu Terada and Hiroshi Yadohisa
Abstract. In symbolic data analysis, we can consider more complex dissimilarity data between objects. Dissimilarity between objects may be described in various ways, including a single value, an interval, a histogram, or a distribution. Such data may be analyzed using multidimensional scaling. For histogram-valued dissimilarity data, Groenen and Winsberg proposed the "Hist-Scal" algorithm, which focuses on the percentiles of histogram dissimilarities [3]. For the I-Scal algorithm [2], the solution is guaranteed to improve after every iteration. For the Hist-Scal algorithm, however, this improvement cannot be guaranteed, since iterative majorization is used in combination with weighted monotone regression. In this paper, we propose a new, improved algorithm applicable to percentile dissimilarities, called the "hyperbox model Percen-Scal" algorithm. Since the algorithm is based on iterative majorization, an improvement in the solution is guaranteed after every iteration. We apply both the hyperbox model Percen-Scal and the Hist-Scal algorithms to artificial data and compare the results obtained from both methods. We then discuss the validity of the results produced using the hyperbox model Percen-Scal algorithm.
Keywords: Symbolic data analysis, Distribution-valued data, Iterative majorization.
Yoshikazu Terada, Division of Mathematical Science, Graduate School of Engineering Science, Osaka University, Japan. e-mail: [email protected]
Hiroshi Yadohisa, Department of Culture and Information Science, Doshisha University, Japan. e-mail: [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 779–788. © Springer-Verlag Berlin Heidelberg 2011, springerlink.com

1 Introduction
In symbolic data analysis, we deal with higher-level objects, referred to as concepts, and more complex dissimilarity data between objects are introduced. The dissimilarity between objects can be described not only by a single value, but also by an
interval, a histogram, a distribution, and so on. Such data may be analyzed using multidimensional scaling (MDS). For interval-valued dissimilarity data, Denœux and Masson proposed the hypersphere and hyperbox models of MDS, as well as an algorithm for fitting them using a gradient descent method [1]. Groenen et al. proposed an improved algorithm based on iterative majorization, called "I-Scal," for the hyperbox model [2]. Terada and Yadohisa also proposed an I-Scal-type algorithm for the hypersphere model [4]. For histogram-valued dissimilarity data, Groenen and Winsberg proposed the "Hist-Scal" algorithm, an extension of the hyperbox model I-Scal that focuses on quantiles of histogram dissimilarities [3]. In the Hist-Scal algorithm, we assume that each object is represented by nested hyperboxes. For the I-Scal algorithm, the solution is guaranteed to improve after each iteration. For the Hist-Scal algorithm, however, this guarantee cannot be made, since iterative majorization is used in combination with weighted monotone regression in each iteration. In this paper, we propose an improved algorithm, called the "hyperbox model Percen-Scal" algorithm, which is suitable for percentile dissimilarities. This algorithm guarantees an improvement in the solution after each iteration. Moreover, we apply both the hyperbox model Percen-Scal and the Hist-Scal algorithms to artificial data and compare their results. We then show the validity of the hyperbox model Percen-Scal algorithm.
2 Percentile Model and Stress Function
In this paper, we assume that the percentile dissimilarity data $\xi = (\xi_0, \xi_1, \ldots, \xi_K)$ (with $q_0\%, q_1\%, \ldots, q_K\%$ quantiles) for a distribution of dissimilarities between two objects are given by

$$\xi_k = \begin{bmatrix} [\xi_{11k}^L, \xi_{11k}^U] & [\xi_{12k}^L, \xi_{12k}^U] & \cdots & [\xi_{1nk}^L, \xi_{1nk}^U] \\ [\xi_{21k}^L, \xi_{21k}^U] & [\xi_{22k}^L, \xi_{22k}^U] & \cdots & [\xi_{2nk}^L, \xi_{2nk}^U] \\ \vdots & \vdots & \ddots & \vdots \\ [\xi_{n1k}^L, \xi_{n1k}^U] & [\xi_{n2k}^L, \xi_{n2k}^U] & \cdots & [\xi_{nnk}^L, \xi_{nnk}^U] \end{bmatrix} \quad (k = 0, 1, \ldots, K), \qquad (1)$$

where $0 \le \xi_{ijk}^L \le \xi_{ijk}^U$ $(i, j = 1, \ldots, n;\; k = 0, 1, \ldots, K)$ and $[\xi_{ijk}^L, \xi_{ijk}^U] \subseteq [\xi_{ijk^*}^L, \xi_{ijk^*}^U]$ for $q_k > q_{k^*}$. In Hist-Scal, the self-dissimilarities are not given. Here, we assume that the self-dissimilarities are given, since we would like to deal with the internal variation of the objects.
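One plausible way such a nested family of percentile intervals could arise from a sample of dissimilarity values is sketched below. This construction is an assumption for illustration only (the paper does not specify it here, and the function name is hypothetical): level q keeps the central portion of the sample between its q-th and (100 - q)-th percentiles, so larger q gives a narrower interval contained in the smaller-q ones.

```python
# Hedged illustration (assumed construction, not from the paper): nested
# percentile intervals from a sample of dissimilarity values.
def percentile_interval(sample, q):
    xs = sorted(sample)
    n = len(xs)
    lo = xs[int(q / 100.0 * (n - 1))]            # q-th percentile (nearest-rank)
    hi = xs[int((100.0 - q) / 100.0 * (n - 1))]  # (100-q)-th percentile
    return (lo, hi)

sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
outer = percentile_interval(sample, 0)    # full range: (1.0, 11.0)
inner = percentile_interval(sample, 25)   # central 50%: (3.0, 8.0)
# inner is contained in outer, matching the nesting condition stated above
```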
2.1 Percentile Model and Dissimilarities
In the percentile model, we assume that each object is described by nested hyperboxes that share a common center point in $\mathbb{R}^p$. Fig. 1 illustrates the relationships in terms of a dissimilarity between two objects: each object is represented by a set of nested hyperboxes, and $d_{ijk}^L$ and $d_{ijk}^U$ are the lower and upper distances between the $k$th hyperboxes of objects $i$ and $j$, respectively.

Fig. 1 Relationships in terms of dissimilarity between objects i and j

For the two $k$th hyperboxes of objects $i$ and $j$, if the center points and the side lengths are given, we can calculate the lower and upper distances $d_{ijk}^L$ and $d_{ijk}^U$ between these hyperboxes using the following formulas [2]:

$$d_{ijk}^L = \sqrt{\sum_{s=1}^{p} \max\{0, |x_{is} - x_{js}| - (r_{isk} + r_{jsk})\}^2} \quad \text{and} \quad d_{ijk}^U = \sqrt{\sum_{s=1}^{p} \{|x_{is} - x_{js}| + (r_{isk} + r_{jsk})\}^2}, \qquad (2)$$
where $x_i = (x_{i1}, \ldots, x_{ip})$ is the center point of the nested hyperboxes of object $i$, and $r_{isk}$ is one half of the length of the $s$th side of the $k$th hyperbox of object $i$. Let $X = (x_{is})_{n \times p}$ be the matrix whose rows are the center coordinates of the nested hyperboxes of each object, and let $R_k = (r_{isk})_{n \times p}$ $(k = 0, \ldots, K)$ be the matrix whose entries are one half of the side lengths of the $k$th hyperboxes of each object. In this paper, we write the lower and upper distances $d_{ijk}^L$ and $d_{ijk}^U$ as $d_{ij}^L(X, R_k)$ and $d_{ij}^U(X, R_k)$, since they are functions of $X$ and $R_k$ $(k = 0, \cdots, K)$. We can rewrite $R_k$ $(k = 0, \cdots, K)$ using $R_0$ and non-negative matrices $A_k$ $(k = 1, \ldots, K)$, since $0 < r_{is0} \le \cdots \le r_{isK}$; let $A_k$ $(k = 1, \ldots, K)$ be non-negative matrices satisfying $R_k = R_0 + \sum_{l=1}^{k} A_l$ $(k = 1, \ldots, K)$. We can then consider the lower and upper distances $d_{ijk}^L$ and $d_{ijk}^U$ to be functions of $X$, $R_0$, and $A_k$ $(k = 1, \cdots, K)$, and we write them as $d_{ij}^L(X, R_0, A_1, \ldots, A_k)$ and $d_{ij}^U(X, R_0, A_1, \ldots, A_k)$.
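Eq. (2) is straightforward to compute directly; the sketch below (an illustration with a hypothetical function name, not code from the paper) evaluates both distances for one pair of hyperboxes.

```python
# Hedged sketch of Eq. (2): lower and upper distances between the k-th
# hyperboxes of two objects, given center coordinates xi, xj and half-side
# lengths ri, rj (all lists of length p).
def hyperbox_distances(xi, xj, ri, rj):
    dL2 = 0.0
    dU2 = 0.0
    for s in range(len(xi)):
        gap = abs(xi[s] - xj[s]) - (ri[s] + rj[s])
        dL2 += max(0.0, gap) ** 2  # overlapping boxes contribute nothing in s
        dU2 += (abs(xi[s] - xj[s]) + (ri[s] + rj[s])) ** 2
    return dL2 ** 0.5, dU2 ** 0.5

# p = 2: centers (0,0) and (3,4), all half-sides equal to 1
dL, dU = hyperbox_distances([0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [1.0, 1.0])
# dL = sqrt(1^2 + 2^2) = sqrt(5), dU = sqrt(5^2 + 6^2) = sqrt(61)
```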
2.2 Stress Function
In the percentile model, the objective of MDS for percentile dissimilarities is to approximate $\xi_{ijk}^L$ and $\xi_{ijk}^U$ by the lower and upper distances $d_{ijk}^L$ and $d_{ijk}^U$ between the $k$th hyperboxes of two sets of nested hyperboxes, in the least-squares sense. In other words, we estimate values of $X$ and $R_k$ $(k = 0, \ldots, K)$ that minimize the following stress function [3]:

$$\sigma_{\mathrm{Hist}}^2(X, R_0, \ldots, R_K) = \sum_{k=0}^{K} \sum_{i=1}^{n} \sum_{j=i}^{n} w_{ij} \big(\xi_{ijk}^L - d_{ij}^L(X, R_k)\big)^2 + \sum_{k=0}^{K} \sum_{i=1}^{n} \sum_{j=i}^{n} w_{ij} \big(\xi_{ijk}^U - d_{ij}^U(X, R_k)\big)^2 \quad (\text{subject to } 0 < r_{is0} \le \cdots \le r_{isK}), \qquad (3)$$
where $w_{ij} \ge 0$ is a given weight. The constraint $0 < r_{is0} \le \cdots \le r_{isK}$ makes it difficult to optimize $\sigma_{\mathrm{Hist}}^2$ by iterative majorization. In the Hist-Scal algorithm, $X$ and $\bar{R}_k$ $(k = 0, \ldots, K)$ are estimated by iterative majorization as an unconstrained optimization, and values of $R_k$ $(k = 0, \ldots, K)$ satisfying the constraint are then derived by applying weighted monotone regression to $\bar{R}_k$. Therefore, with the Hist-Scal algorithm, an improvement in the solution at each iteration cannot be guaranteed. In this paper, in order to resolve the difficulty caused by the constraint $0 < r_{is0} \le \cdots \le r_{isK}$, we consider $\sigma_{\mathrm{Hist}}^2$ to be a function of $X$, $R_0$, and $A_k$ $(k = 1, \cdots, K)$, and we optimize the following stress function, called Percen-Stress:

$$\sigma_{\mathrm{percen}}^2(X, R_0, A_1, \ldots, A_K) = \sum_{i=1}^{n} \sum_{j=i}^{n} w_{ij} \big(\xi_{ij0}^L - d_{ij}^L(X, R_0)\big)^2 + \sum_{k=1}^{K} \sum_{i=1}^{n} \sum_{j=i}^{n} w_{ij} \big(\xi_{ijk}^L - d_{ij}^L(X, R_0, A_1, \ldots, A_k)\big)^2 + \sum_{i=1}^{n} \sum_{j=i}^{n} w_{ij} \big(\xi_{ij0}^U - d_{ij}^U(X, R_0)\big)^2 + \sum_{k=1}^{K} \sum_{i=1}^{n} \sum_{j=i}^{n} w_{ij} \big(\xi_{ijk}^U - d_{ij}^U(X, R_0, A_1, \ldots, A_k)\big)^2 \quad (\text{subject to } a_{isk} > 0;\; k = 1, \ldots, K). \qquad (4)$$
The constraint in Percen-Stress requires only that each element of Ak (k = 1, . . . , K) is non-negative. This constrained optimization can therefore be achieved using only iterative majorization. That is, we can develop an improved algorithm that guarantees that the solution has improved after each iteration.
3 Majorizing Function of Stress Function
First, we introduce the majorizing function and the framework of iterative majorization. A majorizing function of $f: \mathbb{R}^{n \times n} \to \mathbb{R}$ is defined as a function $g: \mathbb{R}^{n \times n} \times \mathbb{R}^{n \times n} \to \mathbb{R}$ that satisfies the following conditions:

i) $f(X_0) = g(X_0, X_0)$ for $X_0 \in \mathbb{R}^{n \times n}$,
ii) $f(X) \le g(X, X_0)$ $(X \in \mathbb{R}^{n \times n})$ for $X_0 \in \mathbb{R}^{n \times n}$.

A majorizing function has a useful property: for a given $X_0 \in \mathbb{R}^{n \times n}$ and $\tilde{X} = \arg\min_X g(X, X_0)$,

$$f(\tilde{X}) = g(\tilde{X}, \tilde{X}) \le g(\tilde{X}, X_0) \le g(X_0, X_0) = f(X_0). \qquad (5)$$

The framework of iterative majorization seeks an optimal solution by minimizing the majorizing function instead of the original function at each iteration. Iterative majorization comprises the following steps:

Step 1 Set $X_0$ to an initial matrix and $\varepsilon$ $(> 0)$ to the convergence criterion; $k \leftarrow 0$.
Step 2 If $k = 0$ or $f(X_{k-1}) - f(X_k) \ge \varepsilon$, then go to Step 3; else stop.
Step 3 $k \leftarrow k + 1$; compute $\tilde{X} = \arg\min_X g(X, X_{k-1})$.
Step 4 $X_k \leftarrow \tilde{X}$; go to Step 2.
n
n
∑ ∑ ∑ wi j (ξiLjk
k=0 i=1 j=i n n
+ ∑ ∑ wi j
i=1 j=i
2
2 n n n n 2 + ξiUjk ) + ∑ ∑ wi j diLj (X, R0 ) − 2 ∑ ∑ wi j ξiLj1 diLj (X, R0 ) i=1 j=i
2 n n diUj (X, R0 ) − 2 ∑ ∑ wi j ξiUj1 diUj (X, R0 )
i=1 j=i
i=1 j=i
2 L L L + ∑ ∑ ∑ wi j di j (X, R0 , A1 , . . . , Ak ) − 2ξi jk di j (X, R0 , A1 , . . . , Ak ) K
n
n
k=1 i=1 j=i
K
n
n
+ ∑ ∑ ∑ wi j
2 diUj (X, R0 , A1 , . . . , Ak ) − 2ξiUjkdiUj (X, R0 , A1 , . . . , Ak ) . (6)
k=1 i=1 j=i
In Eq. (6), wi j , ξiLjk , and ξiUjk are fixed. We therefore derive inequalities for each 2
2
term without (ξiLjk + ξiUjk ) and obtain the following inequality function of Percen2 : Stress σpercen
Y. Terada and H. Yadohisa

\[
\sigma^2_{\mathrm{percen}}(X, R_0, A_1, \ldots, A_K) \le
\sum_{k=0}^{K}\sum_{i=1}^{n}\sum_{j=i}^{n} w_{ij}\big({\xi^{L}_{ijk}}^2 + {\xi^{U}_{ijk}}^2\big)
+ \sum_{s=1}^{p}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\sum_{k=0}^{K} \big(\alpha^{(1)}_{ijsk} + \alpha^{(2)}_{ijk}\big)(x_{is} - x_{js})^2
\]
\[
- 2 \sum_{s=1}^{p}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\sum_{k=0}^{K} \big(\alpha^{*(1)}_{ijsk} + \alpha^{*(2)}_{ijsk} + \alpha^{*(3)}_{ijsk}\big)(x_{is} - x_{js})(y_{is} - y_{js})
\]
\[
+ \sum_{s=1}^{p}\sum_{i=1}^{n}\sum_{j=i}^{n}\sum_{k=0}^{K} \Big\{\big(\beta^{(1)}_{ijsk} + \beta^{(2)}_{ijsk} + \beta^{(3)}_{ijsk}\big)\, r_{is0}^2 + \big(\beta^{(1)}_{jisk} + \beta^{(2)}_{jisk} + \beta^{(3)}_{jisk}\big)\, r_{js0}^2\Big\}
\]
\[
- 2 \sum_{s=1}^{p}\sum_{i=1}^{n}\sum_{j=i}^{n}\sum_{k=0}^{K} \big(\beta^{*(1)}_{ijsk} + \beta^{*(2)}_{ijsk}\big)(r_{is0} + r_{js0})
\]
\[
+ \sum_{s=1}^{p}\sum_{i=1}^{n}\sum_{j=i}^{n}\sum_{k=1}^{K}\sum_{l=1}^{k} \Big\{\big(\gamma^{(1)}_{ijskl} + \gamma^{(2)}_{ijskl} + \gamma^{(3)}_{ijskl}\big)\, a_{isl}^2 + \big(\gamma^{(1)}_{jiskl} + \gamma^{(2)}_{jiskl} + \gamma^{(3)}_{jiskl}\big)\, a_{jsl}^2\Big\}
\]
\[
- 2 \sum_{s=1}^{p}\sum_{i=1}^{n}\sum_{j=i}^{n}\sum_{k=1}^{K}\sum_{l=1}^{k} \big(\gamma^{*(1)}_{ijskl} + \gamma^{*(2)}_{ijskl}\big)(a_{isl} + a_{jsl})
\]
\[
+ \sum_{s=1}^{p}\sum_{i=1}^{n}\sum_{j=i}^{n}\sum_{k=0}^{K} \big(\delta^{(1)}_{ijsk} + \delta^{(2)}_{ijsk}\big), \quad (7)
\]
where

\[
\alpha^{(1)}_{ijs0} = w_{ij}\,\frac{|y_{is} - y_{js}| + (q_{is0} + q_{js0})}{|y_{is} - y_{js}|}, \qquad
\alpha^{(2)}_{ij0} = 2 w_{ij},
\]
\[
\alpha^{(1)}_{ijsk} = w_{ij}\,\frac{|y_{is} - y_{js}| + (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl})}{|y_{is} - y_{js}|}, \qquad
\alpha^{(2)}_{ijk} = (k + 2)\, w_{ij},
\]
\[
\alpha^{*(1)}_{ijs0} =
\begin{cases}
w_{ij}\,\dfrac{|y_{is} - y_{js}| + (q_{is0} + q_{js0})}{|y_{is} - y_{js}|\, d^{U}_{ij}(Y, Q_0)} & d^{U}_{ij}(Y, Q_0) > 0 \text{ and } |y_{is} - y_{js}| > 0, \\
0 & d^{U}_{ij}(Y, Q_0) = 0 \text{ or } |y_{is} - y_{js}| = 0,
\end{cases}
\]
\[
\alpha^{*(2)}_{ijs0} =
\begin{cases}
w_{ij}\,\dfrac{|y_{is} - y_{js}| + (q_{is0} + q_{js0})}{|y_{is} - y_{js}|} & |y_{is} - y_{js}| \ge (q_{is0} + q_{js0}) \text{ and } |y_{is} - y_{js}| > 0, \\
2 w_{ij} & |y_{is} - y_{js}| < (q_{is0} + q_{js0}) \text{ and } |y_{is} - y_{js}| > 0, \\
0 & |y_{is} - y_{js}| = 0,
\end{cases}
\]
\[
\alpha^{*(3)}_{ijs0} =
\begin{cases}
w_{ij}\,\xi^{L}_{ij0}\,\dfrac{\max\{0,\ |y_{is} - y_{js}| - (q_{is0} + q_{js0})\}}{|y_{is} - y_{js}|\, d^{L}_{ij}(Y, Q_0)} & |y_{is} - y_{js}| \ge (q_{is0} + q_{js0}),\ d^{L}_{ij}(Y, Q_0) > 0 \text{ and } |y_{is} - y_{js}| > 0, \\
0 & |y_{is} - y_{js}| < (q_{is0} + q_{js0}),\ d^{L}_{ij}(Y, Q_0) = 0 \text{ or } |y_{is} - y_{js}| = 0,
\end{cases}
\]
\[
\alpha^{*(1)}_{ijsk} =
\begin{cases}
w_{ij}\,\dfrac{|y_{is} - y_{js}| + (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl})}{|y_{is} - y_{js}|\, d^{U}_{ij}(Y, Q_0, B_1, \ldots, B_k)} & d^{U}_{ij}(Y, Q_0, B_1, \ldots, B_k) > 0 \text{ and } |y_{is} - y_{js}| > 0, \\
0 & d^{U}_{ij}(Y, Q_0, B_1, \ldots, B_k) = 0 \text{ or } |y_{is} - y_{js}| = 0,
\end{cases}
\]
\[
\alpha^{*(2)}_{ijsk} =
\begin{cases}
w_{ij}\,\dfrac{(k + 1)|y_{is} - y_{js}| + (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl})}{|y_{is} - y_{js}|} & |y_{is} - y_{js}| \ge (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl}) \text{ and } |y_{is} - y_{js}| > 0, \\
(k + 2)\, w_{ij} & |y_{is} - y_{js}| < (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl}) \text{ and } |y_{is} - y_{js}| > 0, \\
0 & |y_{is} - y_{js}| = 0,
\end{cases}
\]
\[
\alpha^{*(3)}_{ijsk} =
\begin{cases}
w_{ij}\,\xi^{L}_{ijk}\,\dfrac{\max\{0,\ |y_{is} - y_{js}| - (q_{is0} + q_{js0}) - \sum_{l=1}^{k}(b_{isl} + b_{jsl})\}}{|y_{is} - y_{js}|\, d^{L}_{ij}(Y, Q_0, B_1, \ldots, B_k)} & |y_{is} - y_{js}| \ge (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl}),\ d^{L}_{ij}(Y, Q_0, B_1, \ldots, B_k) > 0 \text{ and } |y_{is} - y_{js}| > 0, \\
0 & \text{otherwise},
\end{cases}
\]
\[
\beta^{(1)}_{ijs0} = w_{ij}\,\frac{|y_{is} - y_{js}| + (q_{is0} + q_{js0})}{q_{is0}}, \qquad
\beta^{(2)}_{ijs0} = 2 w_{ij}\,\frac{q_{is0} + q_{js0}}{q_{is0}},
\]
\[
\beta^{(3)}_{ijs0} =
\begin{cases}
w_{ij}\,\xi^{L}_{ij0}\,\dfrac{\max\{0,\ |y_{is} - y_{js}| - (q_{is0} + q_{js0})\}}{q_{is0}\, d^{L}_{ij}(Y, Q_0)} & |y_{is} - y_{js}| \ge (q_{is0} + q_{js0}) \text{ and } d^{L}_{ij}(Y, Q_0) > 0, \\
0 & |y_{is} - y_{js}| < (q_{is0} + q_{js0}) \text{ or } d^{L}_{ij}(Y, Q_0) = 0,
\end{cases}
\]
\[
\beta^{(1)}_{ijsk} = w_{ij}\,\frac{|y_{is} - y_{js}| + (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl})}{q_{is0}}, \qquad
\beta^{(2)}_{ijsk} = w_{ij}\,\frac{2(q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl})}{q_{is0}},
\]
\[
\beta^{(3)}_{ijsk} =
\begin{cases}
w_{ij}\,\xi^{L}_{ijk}\,\dfrac{\max\{0,\ |y_{is} - y_{js}| - (q_{is0} + q_{js0}) - \sum_{l=1}^{k}(b_{isl} + b_{jsl})\}}{q_{is0}\, d^{L}_{ij}(Y, Q_0, B_1, \ldots, B_k)} & |y_{is} - y_{js}| \ge (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl}) \text{ and } d^{L}_{ij}(Y, Q_0, B_1, \ldots, B_k) > 0, \\
0 & \text{otherwise},
\end{cases}
\]
\[
\beta^{*(1)}_{ijs0} =
\begin{cases}
w_{ij}\,\xi^{U}_{ij0}\,\dfrac{|y_{is} - y_{js}| + (q_{is0} + q_{js0})}{d^{U}_{ij}(Y, Q_0)} & d^{U}_{ij}(Y, Q_0) > 0, \\
0 & d^{U}_{ij}(Y, Q_0) = 0,
\end{cases}
\qquad
\beta^{*(2)}_{ijs0} =
\begin{cases}
w_{ij}\,\{|y_{is} - y_{js}| + (q_{is0} + q_{js0})\} & |y_{is} - y_{js}| \ge (q_{is0} + q_{js0}), \\
2 w_{ij}\,(q_{is0} + q_{js0}) & |y_{is} - y_{js}| < (q_{is0} + q_{js0}),
\end{cases}
\]
\[
\beta^{*(1)}_{ijsk} =
\begin{cases}
w_{ij}\,\xi^{U}_{ijk}\,\dfrac{|y_{is} - y_{js}| + (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl})}{d^{U}_{ij}(Y, Q_0, B_1, \ldots, B_k)} & d^{U}_{ij}(Y, Q_0, B_1, \ldots, B_k) > 0, \\
0 & d^{U}_{ij}(Y, Q_0, B_1, \ldots, B_k) = 0,
\end{cases}
\]
\[
\beta^{*(2)}_{ijsk} =
\begin{cases}
w_{ij}\,\{|y_{is} - y_{js}| + (q_{is0} + q_{js0})\} & |y_{is} - y_{js}| \ge (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl}), \\
w_{ij}\,\{2(q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl})\} & |y_{is} - y_{js}| < (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl}),
\end{cases}
\]
\[
\gamma^{(1)}_{ijskl} = w_{ij}\,\frac{|y_{is} - y_{js}| + (b_{isl} + b_{jsl}) + \sum_{l' \ne l}(b_{isl'} + b_{jsl'})}{b_{isl}}, \qquad
\gamma^{(2)}_{ijskl} = w_{ij}\,\frac{(q_{is0} + q_{js0}) + \sum_{l' \ne l}(b_{isl'} + b_{jsl'})}{b_{isl}},
\]
\[
\gamma^{(3)}_{ijskl} =
\begin{cases}
w_{ij}\,\xi^{L}_{ijk}\,\dfrac{\max\{0,\ |y_{is} - y_{js}| - (q_{is0} + q_{js0}) - \sum_{l=1}^{k}(b_{isl} + b_{jsl})\}}{b_{isl}\, d^{L}_{ij}(Y, Q_0, B_1, \ldots, B_k)} & |y_{is} - y_{js}| \ge (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl}) \text{ and } d^{L}_{ij}(Y, Q_0, B_1, \ldots, B_k) > 0, \\
0 & \text{otherwise},
\end{cases}
\]
\[
\gamma^{*(1)}_{ijskl} =
\begin{cases}
w_{ij}\,\xi^{U}_{ijk}\,\dfrac{|y_{is} - y_{js}| + (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl})}{d^{U}_{ij}(Y, Q_0, B_1, \ldots, B_k)} & d^{U}_{ij}(Y, Q_0, B_1, \ldots, B_k) > 0, \\
0 & d^{U}_{ij}(Y, Q_0, B_1, \ldots, B_k) = 0,
\end{cases}
\]
\[
\gamma^{*(2)}_{ijskl} =
\begin{cases}
w_{ij}\,\{|y_{is} - y_{js}| + (q_{is0} + q_{js0})\} & |y_{is} - y_{js}| \ge (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl}), \\
w_{ij}\,\{(q_{is0} + q_{js0}) + 2(b_{isl} + b_{jsl}) + \sum_{l' \ne l}(b_{isl'} + b_{jsl'})\} & |y_{is} - y_{js}| < (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl}),
\end{cases}
\]
\[
\delta^{(1)}_{ijs0} =
\begin{cases}
w_{ij}\,\{|y_{is} - y_{js}| + (q_{is0} + q_{js0})\}^2 & |y_{is} - y_{js}| \ge (q_{is0} + q_{js0}), \\
2 w_{ij}\,\{|y_{is} - y_{js}|^2 + (q_{is0} + q_{js0})^2\} & |y_{is} - y_{js}| < (q_{is0} + q_{js0}),
\end{cases}
\]
\[
\delta^{(2)}_{ijs0} =
\begin{cases}
w_{ij}\,\xi^{L}_{ij0}\,\dfrac{(q_{is0} + q_{js0})\,\max\{0,\ |y_{is} - y_{js}| - (q_{is0} + q_{js0})\}}{d^{L}_{ij}(Y, Q_0)} & |y_{is} - y_{js}| \ge (q_{is0} + q_{js0}) \text{ and } d^{L}_{ij}(Y, Q_0) > 0, \\
0 & |y_{is} - y_{js}| < (q_{is0} + q_{js0}) \text{ or } d^{L}_{ij}(Y, Q_0) = 0,
\end{cases}
\]
\[
\delta^{(1)}_{ijsk} =
\begin{cases}
w_{ij}\Big[\{|y_{is} - y_{js}| + (q_{is0} + q_{js0})\}^2 + \sum_{l=1}^{k}\{|y_{is} - y_{js}| + (b_{isl} + b_{jsl})\}^2\Big] & |y_{is} - y_{js}| \ge (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl}), \\[1ex]
w_{ij}\Big[\{|y_{is} - y_{js}| + (q_{is0} + q_{js0})\}^2 + \sum_{l=1}^{k}\{|y_{is} - y_{js}| + (b_{isl} + b_{jsl})\}^2 \\
\qquad\quad + \big\{|y_{is} - y_{js}| - (q_{is0} + q_{js0}) - \sum_{l=1}^{k}(b_{isl} + b_{jsl})\big\}^2\Big] & |y_{is} - y_{js}| < (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl}),
\end{cases}
\]
\[
\delta^{(2)}_{ijsk} =
\begin{cases}
w_{ij}\,\xi^{L}_{ijk}\,\dfrac{\{(q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl})\}\,\max\{0,\ |y_{is} - y_{js}| - (q_{is0} + q_{js0}) - \sum_{l=1}^{k}(b_{isl} + b_{jsl})\}}{d^{L}_{ij}(Y, Q_0, B_1, \ldots, B_k)} & |y_{is} - y_{js}| \ge (q_{is0} + q_{js0}) + \sum_{l=1}^{k}(b_{isl} + b_{jsl}) \text{ and } d^{L}_{ij}(Y, Q_0, B_1, \ldots, B_k) > 0, \\
0 & \text{otherwise}.
\end{cases}
\]
We get the following inequality by simplifying Eq. (7):

\[
\sigma^2_{\mathrm{percen}}(X, R_0, A_1, \ldots, A_K) \le
\sum_{k=0}^{K}\sum_{i=1}^{n}\sum_{j=i}^{n} w_{ij}\big({\xi^{L}_{ijk}}^2 + {\xi^{U}_{ijk}}^2\big)
+ \sum_{s=1}^{p}\Big( x_s^{T} F_s x_s - 2\, x_s^{T} F_s^{*} y_s + r_s^{T} \tilde{G}_s r_s - 2\, r_s^{T} g_s^{*} \Big)
\]
\[
+ \sum_{s=1}^{p}\sum_{k=1}^{K}\Big( a_{sk}^{T} \tilde{H}_{sk} a_{sk} - 2\, a_{sk}^{T} h_{sk}^{*} \Big)
+ \sum_{s=1}^{p}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\sum_{k=0}^{K}\big(\delta^{(1)}_{ijsk} + \delta^{(2)}_{ijsk}\big), \quad (8)
\]

where \(g_s^{*} = (g_{is}^{*})_{n \times 1}\) with
\[
g_{is}^{*} = \sum_{t \ne i}\sum_{k=0}^{K}\big(\beta^{*(1)}_{itsk} + \beta^{*(2)}_{itsk}\big) + 2\sum_{k=0}^{K}\big(\beta^{*(1)}_{iisk} + \beta^{*(2)}_{iisk}\big),
\]
\(h_{sk}^{*} = (h_{isk}^{*})_{n \times 1}\) with
\[
h_{isk}^{*} = \sum_{t \ne i}\sum_{l=k}^{K}\big(\gamma^{*(1)}_{itslk} + \gamma^{*(2)}_{itslk}\big) + 2\sum_{l=k}^{K}\big(\gamma^{*(1)}_{iislk} + \gamma^{*(2)}_{iislk}\big),
\]
and \(F_s = (f_{ijs})_{n \times n}\), \(F_s^{*} = (f_{ijs}^{*})_{n \times n}\), \(\tilde{G}_s = (g_{ijs})_{n \times n}\), \(\tilde{H}_{sk} = (h_{ijsk})_{n \times n}\), with

\[
f_{ijs} =
\begin{cases}
\sum_{t \ne i}\sum_{k=0}^{K}\big(\alpha^{(1)}_{itsk} + \alpha^{(2)}_{itk}\big) & i = j, \\
-\sum_{k=0}^{K}\big(\alpha^{(1)}_{ijsk} + \alpha^{(2)}_{ijk}\big) & i \ne j,
\end{cases}
\qquad
f_{ijs}^{*} =
\begin{cases}
\sum_{t \ne i}\sum_{k=0}^{K}\big(\alpha^{*(1)}_{itsk} + \alpha^{*(2)}_{itsk} + \alpha^{*(3)}_{itsk}\big) & i = j, \\
-\sum_{k=0}^{K}\big(\alpha^{*(1)}_{ijsk} + \alpha^{*(2)}_{ijsk} + \alpha^{*(3)}_{ijsk}\big) & i \ne j,
\end{cases}
\]
\[
g_{ijs} =
\begin{cases}
\sum_{t \ne i}\sum_{k=0}^{K}\big(\beta^{(1)}_{itsk} + \beta^{(2)}_{itsk} + \beta^{(3)}_{itsk}\big) + 2\sum_{k=0}^{K}\big(\beta^{(1)}_{iisk} + \beta^{(2)}_{iisk} + \beta^{(3)}_{iisk}\big) & i = j, \\
0 & i \ne j,
\end{cases}
\]
\[
h_{ijsk} =
\begin{cases}
\sum_{t \ne i}\sum_{l=k}^{K}\big(\gamma^{(1)}_{itslk} + \gamma^{(2)}_{itslk} + \gamma^{(3)}_{itslk}\big) + 2\sum_{l=k}^{K}\big(\gamma^{(1)}_{iislk} + \gamma^{(2)}_{iislk} + \gamma^{(3)}_{iislk}\big) & i = j, \\
0 & i \ne j.
\end{cases}
\]
4 Hyperbox Model Percen-Scal Algorithm
The right side of Eq. (8) depends only on \(X\), \(R_0\), and \(A_k\ (k = 1, \ldots, K)\) and can be minimized by solving the following equations:

\[
F_s x_s = F_s^{*} y_s, \qquad \tilde{G}_s r_s = g_s^{*}, \qquad \text{and} \qquad \tilde{H}_{sk} a_{sk} = h_{sk}^{*}. \quad (9)
\]

Here, since \(F_s\) is singular, we solve \(F_s x_s = F_s^{*} y_s\) by using \(F_s^{+} = (\mathbf{1}\mathbf{1}^{T} + F_s)^{-1}\). If \(q_{is0}\) and \(a_{isk}\) are non-negative, \(\tilde{G}_s\) and \(\tilde{H}_{sk}\) are non-negative diagonal matrices and \(g_s^{*}\) and \(h_{sk}^{*}\) are the corresponding non-negative vectors. We can therefore estimate values of \(R_0\) and \(A_k\ (k = 1, \ldots, K)\) that satisfy the constraints \(q_{is0} > 0\) and \(a_{isk} > 0\).
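As a small numerical illustration of why the \((\mathbf{1}\mathbf{1}^{T} + F_s)^{-1}\) device works for a singular matrix with zero row sums (a sketch with made-up data, not the paper's \(F_s\)):

```python
import numpy as np

# A singular, Laplacian-like matrix: symmetric with zero row sums,
# the same structural property F_s has by construction.
F = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
ones = np.ones((3, 3))

# Right-hand side in the range of F (i.e., orthogonal to the all-ones vector).
b = F @ np.array([0.3, -0.1, 0.8])

# F itself is not invertible, but (11^T + F) is; because 1^T b = 0,
# the vector x = (11^T + F)^{-1} b satisfies F x = b exactly.
x = np.linalg.solve(ones + F, b)

assert abs(np.linalg.det(F)) < 1e-9   # F is singular
assert np.allclose(F @ x, b)          # yet x solves F x = b
assert abs(float(x.sum())) < 1e-9     # and x is centered (1^T x = 0)
```

Multiplying \((\mathbf{1}\mathbf{1}^{T} + F)x = b\) by \(\mathbf{1}^{T}\) forces \(\mathbf{1}^{T}x = 0\), so the rank-one correction drops out and \(Fx = b\) holds.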
Therefore, the hyperbox model Percen-Scal algorithm is given by the following:

Step 1 Set \(X_0\) to a matrix and set \(R_{00}\) and \(A_{k0}\ (k = 1, \ldots, K)\) to non-negative matrices. \(t \leftarrow 0\). Set \(\varepsilon\) to a small positive number as the convergence criterion.
Step 2 While \(t = 0\) or \(|\sigma^2_{\mathrm{percen}}(X_t, R_{0t}, A_{1t}, \ldots, A_{Kt}) - \sigma^2_{\mathrm{percen}}(X_{t-1}, R_{0(t-1)}, A_{1(t-1)}, \ldots, A_{K(t-1)})| \ge \varepsilon\) do
Step 3 \(t \leftarrow t + 1\); \(Y \leftarrow X_{t-1}\); \(\tilde{Q}_0 \leftarrow R_{0(t-1)}\); \(B_k \leftarrow A_{k(t-1)}\).
    Compute \(F_s\), \(F_s^{*}\), \(\tilde{G}_s\), \(g_s^{*}\), \(\tilde{H}_{sk}\), and \(h_{sk}^{*}\).
    For \(s = 1\) to \(p\) do
      \(x_s \leftarrow F_s^{+} F_s^{*} y_s\); \(r_{0s} \leftarrow \tilde{G}_s^{-1} g_s^{*}\).
      For \(k = 1\) to \(K\) do
        \(a_{sk} \leftarrow \tilde{H}_{sk}^{-1} h_{sk}^{*}\).
      end for
    end for
Step 4 \(X_t \leftarrow X\); \(R_{0t} \leftarrow R_0\); \(A_{kt} \leftarrow A_k\).
  end while.
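The control flow of this loop can be sketched generically as follows; this is our own stub, where `stress` and `update` merely stand in for the Percen-Stress evaluation and the coordinate updates of Eq. (9):

```python
def majorization_loop(stress, update, x0, eps=1e-4, max_iter=100):
    # Generic shell of the Percen-Scal iteration: apply the majorization
    # update until the change in stress falls below the criterion eps.
    x, prev = x0, None
    for _ in range(max_iter):
        if prev is not None and abs(stress(prev) - stress(x)) < eps:
            break
        prev, x = x, update(x)
    return x

# Stub problem: stress(x) = (x - 2)^2, with an update that halves the
# distance to the minimizer (a contraction, like a majorization step).
final = majorization_loop(lambda x: (x - 2.0) ** 2,
                          lambda x: x + 0.5 * (2.0 - x),
                          0.0)
assert abs(final - 2.0) < 0.1
```

The real algorithm differs only in what `update` does: it recomputes the coefficient matrices from the previous iterate and solves the three linear systems of Eq. (9).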
5 Numerical Example
In this section, we demonstrate the utility of the hyperbox model Percen-Scal algorithm. We set the convergence criterion \(\varepsilon = 0.0001\), apply the hyperbox model Percen-Scal algorithm and the Hist-Scal algorithm (modified to deal with self-dissimilarities) to a sample of ideal artificial data with 50 random starts, and compare their results. Here, the ideal data are the percentile dissimilarities consisting of distances between
Fig. 2 The ideal artificial data and the distributions of stress obtained by 50 random starts; (a) the ideal artificial data, (b) distribution of Hist-Stress, (c) distribution of Percen-Stress
two nested rectangles in Fig. 2 (a). In this case, the value of Percen-Stress (Hist-Stress) at the global minimum is 0. Fig. 2 (b) and 2 (c) show the distributions of Hist-Stress and Percen-Stress over 50 random starts, respectively. If the number of objects or percentiles is small, the Hist-Scal algorithm is stable. However, the stability of the Hist-Scal algorithm decreases as the number of objects and percentiles increases. In particular, Fig. 3 (a) shows that in such cases the Hist-Scal algorithm is not stable. On the other hand, Fig. 3 (b) shows that the Percen-Scal algorithm is stable in such cases, and we can obtain a good solution that is close to the global minimum.
Fig. 3 Relation between stress and the number of iterations; (a) relation between Hist-Stress and the number of iterations, and (b) the relation between Percen-Stress and the number of iterations
Acknowledgment. We would like to give heartfelt thanks to the anonymous reviewers. This research was partially supported by the Collaborative Research Program 2010, Information Initiative Center, Hokkaido University, Sapporo, Japan.
References
1. Denœux, T., Masson, M.: Multidimensional scaling of interval-valued dissimilarity data. Pattern Recognition Letters 21, 82–92 (2000)
2. Groenen, P.J.F., Winsberg, S., Rodríguez, O., Diday, E.: I-Scal: Multidimensional scaling of interval dissimilarities. Computational Statistics & Data Analysis 51, 360–378 (2006)
3. Groenen, P.J.F., Winsberg, S.: Multidimensional scaling of histogram dissimilarities. In: Batagelj, V., Bock, H.H., Ferligoj, A., Ziberna, A. (eds.) Data Science and Classification, pp. 581–588. Springer, Heidelberg (2006)
4. Terada, Y., Yadohisa, H.: Hypersphere model MDS for interval-valued dissimilarity data. In: Proceedings of the 27th Annual Meeting of the Japanese Classification Society, pp. 35–38 (2010)
Predictive Data Mining Driven Architecture to Guide Car Seat Model Parameter Initialization Sabbir Ahmed, Ziad Kobti, and Robert D. Kent
Abstract. Researchers in both government and non-government organizations are constantly looking for patterns among drivers that may influence the proper use of car seats. Such patterns will help them predict the behaviours of drivers that shape their decision to place a child in the proper car restraint when traveling in an automobile. Previous work on a multi-agent based prototype, with the goal of simulating car seat usage patterns among drivers, has shown good prospects as a tool for researchers. In this work we aim at exploring the parameters that initialize the model. The complexity of the model is driven by a large number of parameters and a wide array of values. Existing data from road surveys are examined using existing data mining tools in order to explore, beyond basic statistics, which parameters and values can be most relevant for a more realistic model run. The intent is to make the model replicate real-world conditions, mimicking the survey data as closely as possible. A data mining driven architecture which can dynamically use data collected from various surveys and agencies in real time can significantly improve the quality and accuracy of the agent model.
1 Introduction
Road accidents are one of the most significant health risks all over the world. Car seats are used to protect children from injury during such accidents. Use of car safety seats can significantly reduce injury, yet misuse remains very high even in developed countries like Canada. Many government and non-government agencies alike are actively involved in research to investigate how to reduce such misuse in an effort to increase child safety. One approach towards understanding the cause of misuse of car safety seats is to discover patterns among drivers who have a higher probability of improper use. For example, does family income play a role, or does education have a higher effect? These patterns will help identify the high-risk group of drivers and develop appropriately targeted interventions, such as education, to effectively reduce the probability of misuse. With this goal in mind, health
Sabbir Ahmed · Ziad Kobti · Robert D. Kent
School of Computer Science, University of Windsor, Windsor, ON, Canada, N9B-3P4
e-mail: {ahmedp,kobti,rkent}@uwindsor.ca
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 789–797. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
care researchers and computer scientists at the University of Windsor collaborated to produce a multi-agent model for child vehicle safety injury prevention [1] (from here on referred to as the "Car Seat Model"). The computer simulation presents the user with a set of parameters in order to define and control various characteristics and behaviours. This, however, is a tedious manual process that requires the researcher to have prior knowledge of the case study being simulated. For instance, to simulate a particular population group one would need to carry out targeted surveys, collect useful data that describes the group, and subsequently build a set of corresponding parameter values to initialize the Car Seat Model. With the advancement of data collection and survey methods enabling live data generation from the field, an ambitious objective is to enable the rapid automation of simulation parameters in an effort to produce a scalable model constantly seeded with real values. An equally ambitious objective is to enable the model to use new survey data dynamically as they become available. That is, if new survey data reflect new patterns in the usage of car seats, or if they introduce new parameters (e.g. education level of drivers), the Car Seat Model should be able to make use of this new knowledge and simulate it accordingly. The simulation itself is used as a decision support tool for life agent analysis, outcome prediction and risk estimation. Furthermore, such an agent-based model needs to validate agents' behaviour with more up-to-date real-life data, such as field survey data. In this paper we propose a data mining driven architecture which provides the Car Seat Model with an interface to initialize its agents based on real-world data, predict agents' behaviour, and validate simulation results. In the next section we discuss the need for real-world data in multi-agent simulations. Next, we present why data mining techniques are a good fit to answer such needs.
Then we present the proposed architecture, tools, and survey data used to implement the system, with a sample proof-of-concept experiment.
2 Multi Agent Simulation Modeling
A complex system has qualities such as being heterogeneous, dynamic, and often agent-based. While standard mathematical and statistical models can be used to analyse data and systems, they fall short when the degree of realism increases along with the complexity of the system. Agent-based models become more prevalent when it comes to more realistic and complex systems. The current Car Seat Model seeks to examine driver behaviour in selecting a car seat and properly positioning it in a vehicle at the time of driving. Events such as accidents and driving conditions are a dynamic part of the model, challenging the driver (agent). One goal of the simulation is to provide the observer a first-person view of the world as it unfolds in the simulated world [3, 4]. In most cases these simulations begin with values taken from a uniform random distribution [2]. This can be a poor choice as it may not necessarily reflect the real world and in turn may affect the output. In other cases, such as the current Car Seat Model, the initial values are based on statistical measures taken from one particular survey. The limitation of
such an approach is that when new sets of data become available from new surveys, these parameters need to be recalculated and changed accordingly. Moreover, another key issue inhibiting researchers from fields outside computer science from accepting agent-based modeling systems (ABMS) as a tool for the modeling and simulation of real-life systems is the difficulty of developing, or even the lack of, standard verification and validation methodologies. Verification aims to test whether the implementation of the model is an accurate representation of the abstract model. Hence, the accuracy of transforming a created model into the computer program is tested in the verification phase. Model validation is used to check that the implemented model can achieve the proposed objectives of the simulation experiments. That is, to ensure that the built model is an accurate representation of the modeled phenomena it intends to simulate [8].
2.1 Data Driven Models
Much work has been done on agent-based models which use real-world data to tackle these issues. A popular example is the simulation model of the extinction of the Anasazi civilization, where data collected from observed history is used to compare with the simulation result [5]. Another example is the water demand models of [5, 9], in which data about household location and composition, consumption habits, and water management policies are used to steer the model, with good effect when the model is validated against actual water usage patterns. In [2], the authors encourage agent-based designers to adopt the data-driven trend and merge concepts taken from microsimulation into agent-based models by taking the following basic steps:

1. Collection of data from the social world.
2. Design of the model, which should be guided by some of the empirical data (e.g. equations, generalisations and stylised facts, or qualitative information provided by experts) and by the theory and hypotheses of the model.
3. Initialisation of the model with static data (from surveys and the census).
4. Simulation and output of results.
5. Validation, comparing the output with the collected data. The data used in validation should not be the same as that used in earlier steps, to ensure the independence of the validation tests from the model design.

3 Data Mining Potentials
Data mining is the process of extracting patterns from data into knowledge. Such knowledge extracted from data will be very useful for multi-agent models, such as the Car Seat Model, to address the issues discussed in the previous section. Recently, a data mining driven architecture for agent-based models was presented by [6] [8].
Here the authors proposed several methods for integrating data mining into agent-based models. There are various types and categories of data mining techniques available. Usually these techniques are adopted from various fields of computer science, artificial intelligence, and statistics. They can be categorized into two major branches of data mining, namely predictive and descriptive. Predictive data mining is used to make predictions about values of data using known results found from different data. Various predictive techniques are available, such as classification, regression, etc. Descriptive data mining is used to identify relationships in data. Some of the descriptive data mining techniques are clustering, association rules, and others. Due to the nature of our goal for this paper we will focus on predictive data mining using decision tree classification. However, other predictive methods can be explored, which we intend to do in our future work in this area.
3.1 Classification Using Decision Tree
Classification using a decision tree is one of the most popular data mining techniques used today. A decision tree is a series of questions systematically arranged so that each question queries an attribute (e.g. sex of the driver) and branches based on the value of the attribute. At the leaves of the tree are placed predictions of the class variable (e.g. type of car seat used) [6]. Among the various decision tree algorithms, C4.5 is one of the most efficient. It is a descendant of the ID3 algorithm and was designed by J. Ross Quinlan. The C4.5 algorithm calculates the entropy of the class attribute, which in general is the measure of the uncertainty associated with the selected attribute. The entropy is calculated using the following formula:

\[
\mathrm{Entropy}(S) = - \sum_{i=1}^{n} p(x_i) \log_b p(x_i)
\]

where \(p(x_i)\) is the probability mass function of outcome \(x_i\), \(b\) is the number of possible outcomes of the class random variable, and \(n\) is the total number of attributes. Once the entropy of the class variable is known, the information gain for each attribute is calculated. Information gain, which is calculated using the formula below, is the expected reduction in entropy caused by partitioning the examples according to the attribute. The C4.5 algorithm puts the attribute with the highest information gain at the top of the tree and recursively builds the tree using the attribute with the next highest information gain value [11].

\[
\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v)
\]
where \(v\) ranges over the set of possible values of the attribute. For instance, if we are calculating the information gain of the attribute Gender, then \(v\) takes 2 values, Male and Female. \(S\) denotes the entire dataset; \(S_v\) denotes the subset of the dataset for which the attribute (such as Gender) has the value \(v\); and \(|\cdot|\) denotes the size of a dataset (in number of instances).
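As a small, self-contained illustration of these two formulas (our own sketch, using base-2 logarithms and invented toy records rather than the paper's survey data):

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum_i p(x_i) * log2 p(x_i) over class-label frequencies.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(rows, attr, label):
    # Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v),
    # where S_v is the subset of rows with attribute A equal to v.
    base = entropy([r[label] for r in rows])
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[label] for r in rows if r[attr] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Toy records in the spirit of the survey fields (values invented).
rows = [
    {"child_age": "<1 year", "sex": "male",   "seat": "R"},
    {"child_age": "<1 year", "sex": "female", "seat": "R"},
    {"child_age": "4-8",     "sex": "male",   "seat": "B"},
    {"child_age": "4-8",     "sex": "female", "seat": "S"},
]

# child_age separates the classes better than sex, so its gain is higher;
# C4.5 would therefore place it nearer the root of the tree.
assert info_gain(rows, "child_age", "seat") > info_gain(rows, "sex", "seat")
```

On this toy data, child_age has gain 1.0 bit versus 0.5 bits for sex, mirroring how C4.5 selects the splitting attribute.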
4 Proposed Architecture
Our goal here is to design an architecture which will collect survey data from a database and generate a decision tree model on the fly. We also need to provide an application program interface which will be used by the Car Seat Model for initialization, prediction, and validation. Based on these requirements, our proposed architecture is presented in Figure 1.
Fig. 1 The proposed architecture.
The system consists of three modules, namely:

1. Data pre-processing module – This module will be used to pre-process (e.g. data cleansing etc.) the data collected from the database. The user will be able to use this module to select desired attributes before the data mining algorithm is applied. This module will provide flexibility to the agent-based model to select relevant attributes from the real-world data. This module will also be able to split the data into 2 sets, namely the training set and the test set. The training set will be used to generate the decision tree model, and the test set data will be used to initialize the agents in the simulation.
2. Data mining module – The data mining module will use the open source Weka library to generate the decision tree model using the training set data generated by the pre-processing module.
3. API module – This module will provide a Java-based application program interface to the Car Seat Model to initialize agents, predict agents' behaviour, and validate the simulation. This module actually uses the model generated by the data mining module and returns information (such as predictions) to the Car Seat Model.
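The three modules could be wired together roughly as follows; this is a hypothetical sketch in Python rather than the Java/Weka stack the paper uses, and all class, function, and field names are ours.

```python
import random

def preprocess(records, train_fraction=0.66, seed=0):
    # Pre-processing module: shuffle and split into training and test sets.
    rng = random.Random(seed)
    rows = records[:]
    rng.shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

class MajorityModel:
    # Stand-in for the data mining module's decision tree: it simply
    # predicts the most frequent class label seen in training.
    def fit(self, rows, label):
        counts = {}
        for r in rows:
            counts[r[label]] = counts.get(r[label], 0) + 1
        self.prediction = max(counts, key=counts.get)
        return self

    def classify(self, row):
        # API-module entry point that an agent model would call.
        return self.prediction

records = [{"seat": "F"}] * 7 + [{"seat": "B"}] * 3
train, test = preprocess(records)
model = MajorityModel().fit(train, "seat")
accuracy = sum(model.classify(r) == r["seat"] for r in test) / len(test)
assert 0.0 <= accuracy <= 1.0
```

Swapping `MajorityModel` for a real classifier (such as Weka's C4.5/J48 behind a Java API) leaves the pipeline shape unchanged: preprocess and split, fit on the training set, then expose `classify` to the simulation.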
4.1 Tools Used
The following tools were used to implement the proposed architecture:
1. Weka – the pre-processing module and the data mining module use the open source Weka library [10].
2. Java (using NetBeans) to develop the API and data mining modules.
3. SQL Server 2008 as the database.
4.2 Data Used
To be able to implement the proposed architecture we collected some survey data. The survey was conducted all over Canada. Among the many fields that were available in the survey, the following were used to build the initial decision tree model:

Location Code {ON, NF, BC, QC, AB, YT, MB}
Driver's Sex {male, female}
Ethnicity {Asian, SE Asian, Aboriginal, Middle Eastern, African Canadian, Caucasian}
Child Age {< 1 year, 1-3 years, 4-8 years, 9-14 years}
Car Seat Type {Not restrained (N), Rear-facing infant seat (R), Forward-facing child seat (F), Booster seat (B), Seat belt (S)}
From the above fields, the Car Seat Type field was chosen as the class random variable, meaning the decision tree model would be able to predict the type of car seat used by the driver given values of the other attributes. After initial cleaning and pre-processing of the raw survey data we had some 3222 records, which we fed into the data mining module to generate the C4.5 decision tree model.
4.3 Results
A total of 3,222 record sets were available after data cleaning and pre-processing. We used 66% to train the model and the rest to test it. Of the 34% test cases, 836 instances were classified correctly, which is about 74.78% accuracy. The decision tree that was generated is shown below:
Fig. 2 The decision tree generated by the Weka tool.
Based on the decision tree generated from the survey data (Figure 2), we can see that the age of the child is at the root of the tree. This means that the type of car seat used by a driver mostly depends on the age of the child. We also noticed that the model looks into other attributes of drivers, such as ethnicity or sex, for the case where the child's age is between 4 and 8 years. For this age range the survey data show variation in seat selection between provinces, and, taking Alberta as an example, we find that ethnicity plays a role. For other ranges, the model predicts the outcome without looking into these attributes. For example, for ages 1-3 years, regardless of the values of the other attributes, the model predicts the driver will use a forward-facing car seat (F).
Forward-facing seat choice for ages between 1 and 3 is mandated by law. From this we can see that the problem of car seat usage lies mostly in cases where the child's age is between 4 and 8 years. Obviously more data and more attributes are required to concretely back such a hypothesis. This is why our proposed application allows more data to be fed in, so that it can keep on learning the real-world phenomena. Our application can use the above tree model to predict the behaviour of a new set of drivers. A sample of the Java code is below:

    CarSeatC45 newmodel = new CarSeatC45("data.arff");
    ....
    AgentDriver D = new AgentDriver();
    D.setGender("Male");
    D.setChildAge("<1 year");
    .....
    String result = D.classify(D, newmodel);

Sample String output: "R:0.96"

The output example above is the prediction; in this case, for the Driver object D, it predicts the driver will use a rear-facing child seat (R) with 96% probability.
5 Conclusions and Future Work
In this paper we argue the need for a data-driven agent-based model such as the Car Seat Model. We present some relevant work previously done in this area. We highlight that data mining techniques can be used in agent-based models to overcome the gap between the real world and the simulation. We then present a proposed architecture which can be integrated with the Car Seat Model for initialization of agents, prediction of agents' behaviour, and validation of the simulation result. The proposed architecture can easily be used in other agent-based models which require real-world data to be able to simulate accurately. We are currently working on integrating this into the agent-based model presented in [1]. A couple of future directions can be suggested here. The first direction is to use other data mining algorithms, such as regression, naive Bayes, etc., and experiment with the Car Seat Model. Another direction is to apply data mining techniques to the resulting data generated by the Car Seat Model. Then we can compare the model generated from simulation data to real-world data.
References [1] Kobti, Z., Snowdon, A.W., Kent, R.D., Rahaman, S.: A multi-agent model prototype for child vehicle safety injury prevention. In: Agent 2005 Conference on Generative Social Processes, Models and Mechanisms, Chicago, Illinois, USA, October 13-15, pp. 271–294 (2005)
[2] Hassan, S., Pavón, J., Antunes, L., Gilbert, N.: Injecting data into Agent-Based simulation. In: Takadama, K., Deffuant, G., Cioffi-Revilla, C. (eds.) Simulating Interacting Agents and Social Phenomena: The Second World Congress. Springer Series on Agent Based Social Systems, pp. 179–191. Springer, Tokyo (2010)
[3] Kobti, Z., Reynolds, R., Kohler, T.: A multi-agent simulation using cultural algorithms: the effect of culture on the resilience of social systems. In: Proceedings of the IEEE Conference on Evolutionary Computation, CEC 2003, December 8-12, vol. 3, pp. 1988–1995 (2003)
[4] Macal, C.M., North, M.J.: Agent-Based Modeling and Simulation: Desktop ABMS. In: Proceedings of the 2007 Winter Simulation Conference. IEEE, Los Alamitos (2007)
[5] Dean, J.S., et al.: Understanding Anasazi culture change through agent based modeling. Oxford University Press, Oxford (2000)
[6] Remondino, M., Correndo, G.: Data-Mining applied to agent based simulation. In: Proceedings of the 19th European Conference on Modelling and Simulation, ECMS (2005)
[7] Ramakrishnan, N.: C4.5. In: The Top Ten Algorithms in Data Mining. CRC Press, Boca Raton (2009)
[8] Baqueiro, O., Wang, Y.J., et al.: Integrating Data Mining and Agent Based Modeling and Simulation (2006)
[9] Galan, J.M., Lopez-Paredes, A., del Olmo, O.: Effect of technological diffusion of water conservation measures in an ABMS model. In: 4th International Workshop on Practical Applications of Agents and Multiagent Systems, pp. 169–180. Universidad de Salamanca, Salamanca (2007)
[10] Hall, M., Frank, E., Holmes, G., et al.: The WEKA Data Mining Software: An Update. SIGKDD Explorations 11(1) (2009)
[11] Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
Symbolic Hierarchical Clustering for Visual Analogue Scale Data Kotoe Katayama, Rui Yamaguchi, Seiya Imoto, Hideaki Tokunaga, Yoshihiro Imazu, Keiko Matsuura, Kenji Watanabe, and Satoru Miyano
Abstract. We propose a hierarchical clustering in the framework of Symbolic Data Analysis(SDA). SDA was proposed by Diday at the end of the 1980s and is a new approach for analysing huge and complex data. In SDA, an observation is described by not only numerical values but also “higher-level units”; sets, intervals, distributions, etc. Most SDA works have dealt with only intervals as the descriptions. In this paper, we define “pain distribution” as new type data in SDA and propose a hierarchical clustering for this new type data. Keywords: Visual Analogue Scale, Distribution-Valued Data.
1 Introduction Conventional data analysis usually can handle scalars, vectors and matrices. However, lately, some datasets have grown beyond the framework of conventional data analysis. Most statistical methods do not have sufficient power to analyze these datasets. In this study, we attempted to extract useful information from such datasets. Symbolic data analysis (SDA) proposed by Diday [3] is an approach for analyzing new types of datasets. “Symbolic data” consist of a concept that is described by intervals, distributions, etc. as well as by numerical values. The use of SDA enriches data description, and it can handle highly complex datasets. This implies that complex data can be formally handled in the framework of SDA. However, most SDA works have dealt with only intervals as the descriptions and are very few studies Kotoe Katayama · Rui Yamaguchi · Seiya Imoto · Satoru Miyano Human Genome Center, Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan e-mail: [email protected] Hideaki Tokunaga · Yoshihiro Imazu · Keiko Matsuura · Kenji Watanabe Center for Kampo Medicine, Keio University School of Medicine, 35 Shinano-machi, Shinjuku-ku, Tokyo 160-8582, Japan
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 799–805. c Springer-Verlag Berlin Heidelberg 2011 springerlink.com
800
K. Katayama et al.
based on this simple idea. The case in which a concept is described by intervals is simple, but it ignores the detailed information inside the intervals. We propose distribution-valued data to describe the concept. In this study, we focus on the case in which a concept is described by a distribution and develop a new method to analyze such a dataset directly using SDA.
2 The Visual Analogue Scale The visual analogue scale (VAS) was developed to allow the measurement of an individual's response to physical stimuli, such as heat. The VAS is a method, readily understood by most people, for measuring a characteristic or attitude that cannot be measured directly. It was originally used in the field of psychometrics and is nowadays widely used to assess changes in patient health status under treatment. A VAS consists of a line on a page with clearly defined end points, and normally a clearly identified scale between the two end points. For guidance, the phrases "no pain" and "worst imaginable pain" are placed at the two ends of the line, respectively: the minimum VAS value, 0, means "no pain" and the maximum value, 100, means "worst imaginable pain". These scales are of most value when looking at change within patients and of less value for comparing across a group of patients, because patients have different senses of pain. It could be argued that a VAS tries to produce interval/ratio data out of subjective values that are at best ordinal; thus, some caution is required in handling such data. Many researchers prefer a method of analysis based on the rank ordering of scores rather than their exact values, to avoid reading too much into the precise VAS score.
3 Transform the Visual Analogue Scale to Distribution-Valued Data We transform the VAS into distribution-valued data so that comparisons can be made across a group of patients. VAS scores vary from patient to patient, because the sense of pain depends greatly on the person. The change in a patient's VAS score reflects his or her sense of pain: a large change means the patient's expression of the sense of pain is coarse, whereas a small change means it is sensitive. We propose describing this sense of pain by a normal distribution, which we call the "pain distribution" (PD). Let a patient's first VAS score be x1 and the second x2. We define the midpoint of x1 and x2 as the mean μ of the PD, and (μ − x1)² = (μ − x2)² as its variance, writing the PD as N(μ, σ²). When the number of VAS scores is d, the PD is a d-dimensional normal distribution; in this case, a diagonal matrix is used as the variance-covariance matrix of the d-dimensional normal distribution.
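The transformation above is straightforward to implement; the following sketch (ours, for illustration, not the authors' code) derives the PD parameters from one patient's pair of VAS scores:

```python
def pain_distribution(x1, x2):
    """Map two VAS scores (0-100) to the parameters of the 1-D pain
    distribution N(mu, sigma^2): mu is the midpoint of the two scores,
    and the variance is the squared distance from mu to either score."""
    mu = (x1 + x2) / 2.0
    var = (mu - x1) ** 2  # equals (mu - x2) ** 2 by construction
    return mu, var

# A patient with VAS scores 100 and 78 is described by N(89, 121).
print(pain_distribution(100, 78))  # (89.0, 121.0)
```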
Fig. 1 Transform the Visual Analogue Scale to Distribution-Valued data
4 Hierarchical Clustering for PD Cluster analysis groups data objects only on the basis of information found in the data that describes the objects and their relationships. The goal is that the objects within a group should be similar (or related) to one another and different from the objects in other groups. In this section, we propose a hierarchical clustering for distribution-valued data, especially for PDs.
4.1 The Clustering Algorithm We extend the idea of hierarchical clustering from the framework of conventional data analysis. Let n be the number of PDs and K the number of clusters. <Step 1> Begin with K clusters, each containing a single PD, so that K = n. Calculate the distances between the PDs. <Step 2> Find the pair of clusters at minimum distance among the K clusters and combine their PDs into a new cluster, described by a mixture distribution of the members with equal mixture weights. Set K to K − 1. If K > 1, go to Step 3; otherwise go to Step 4. <Step 3> Calculate the distance between the new cluster and each other cluster, and go back to Step 2. <Step 4> Draw the dendrogram. The Kullback-Leibler divergence is the natural way to define a distance measure between probability distributions [8], but it is not symmetric. We therefore use the symmetric Kullback-Leibler (symmetric KL) divergence as the distance between concepts. The symmetric KL divergence between two distributions s1 and s2 is
$$D(s_1(x), s_2(x)) = D(s_1(x)\,\|\,s_2(x)) + D(s_2(x)\,\|\,s_1(x)) = \int_{-\infty}^{\infty} s_1(x)\log\frac{s_1(x)}{s_2(x)}\,dx + \int_{-\infty}^{\infty} s_2(x)\log\frac{s_2(x)}{s_1(x)}\,dx, \quad (1)$$

where $D(s_1\|s_2)$ is the KL divergence from $s_1$ to $s_2$ and $D(s_2\|s_1)$ is that from $s_2$ to $s_1$.
4.2 Distance between PDs As stated in Section 4.1, we use the symmetric KL divergence as the distance between PDs. Let the PDs be the d-dimensional distributions $N(\mu_i, \Sigma_i)$ and $N(\mu_j, \Sigma_j)$. The symmetric KL divergence used in Step 1 is
$$D\big(p(x|\mu_i,\Sigma_i),\, p(x|\mu_j,\Sigma_j)\big) = \frac{1}{2}\Big[\operatorname{tr}(\Sigma_i \Sigma_j^{-1}) + \operatorname{tr}(\Sigma_j \Sigma_i^{-1}) + \operatorname{tr}\big((\Sigma_i^{-1}+\Sigma_j^{-1})(\mu_i-\mu_j)(\mu_i-\mu_j)^{T}\big) - 2d\Big]. \quad (2)$$
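Equation (2) translates directly into NumPy. The sketch below is an illustrative implementation of ours (not code from the paper); the log-determinant terms of the two directed KL divergences cancel in the symmetric sum, which is why only trace and quadratic terms remain:

```python
import numpy as np

def symmetric_kl_gauss(mu_i, cov_i, mu_j, cov_j):
    """Symmetric KL divergence between N(mu_i, cov_i) and N(mu_j, cov_j),
    following Eq. (2)."""
    mu_i, mu_j = np.asarray(mu_i, float), np.asarray(mu_j, float)
    cov_i, cov_j = np.asarray(cov_i, float), np.asarray(cov_j, float)
    d = len(mu_i)
    inv_i, inv_j = np.linalg.inv(cov_i), np.linalg.inv(cov_j)
    diff = mu_i - mu_j
    return 0.5 * (np.trace(cov_i @ inv_j) + np.trace(cov_j @ inv_i)
                  + diff @ (inv_i + inv_j) @ diff - 2 * d)
```

For d = 1 this reduces to Eq. (3), since the two log terms there cancel.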
For d = 1,
$$D\big(p(x|\mu_i,\sigma_i),\, p(x|\mu_j,\sigma_j)\big) = \frac{1}{2}\left[\log\frac{\sigma_j^2}{\sigma_i^2} + \frac{\sigma_i^2+(\mu_i-\mu_j)^2}{\sigma_j^2}\right] + \frac{1}{2}\left[\log\frac{\sigma_i^2}{\sigma_j^2} + \frac{\sigma_j^2+(\mu_j-\mu_i)^2}{\sigma_i^2}\right] - 1. \quad (3)$$
After Step 2, we need the symmetric KL divergence between Gaussian mixture distributions. It cannot, however, be computed analytically. Monte-Carlo simulation could be used to approximate it, but the drawbacks of Monte-Carlo techniques are their extensive computational cost and slow convergence; moreover, owing to the stochastic nature of the method, the approximated distance can vary between computations. In this paper we therefore use the unscented transform method proposed by Goldberger et al. [5]. We show the approximation of $D(s_1\|s_2)$ in (1). Let cluster $c_1$ contain the d-dimensional distributions $N_d(\mu_m^{(1)}, \Sigma_m^{(1)})$ $(m = 1, \ldots, M)$. The expression for $c_1$ is $s_1(x) = \sum_{m=1}^{M} \omega_m^{(1)} p(x|\theta_m^{(1)})$, where $\omega_m^{(1)}$ is a mixture weight, $p(x|\theta_m^{(1)})$ is the m-th probability density function of $N_d(\mu_m^{(1)}, \Sigma_m^{(1)})$ and $\theta_m^{(1)} = (\mu_m^{(1)}, \Sigma_m^{(1)})$. Similarly, cluster $c_2$ contains the d-dimensional distributions $N_d(\mu_l^{(2)}, \Sigma_l^{(2)})$ $(l = 1, \ldots, L)$, with expression $s_2(x) = \sum_{l=1}^{L} \omega_l^{(2)} p(x|\theta_l^{(2)})$. The approximation of the KL divergence from $s_1$ to $s_2$ by the unscented transform method is
$$D(s_1 \| s_2) \approx \frac{1}{2d} \sum_{m=1}^{M} \omega_m^{(1)} \sum_{k=1}^{2d} \log \frac{s_1(o_{m,k})}{s_2(o_{m,k})}, \quad (4)$$
where the $o_{m,k}$ are sigma points, chosen as follows:
$$o_{m,t} = \mu_m^{(1)} + \Big(\sqrt{d\,\Sigma_m^{(1)}}\Big)_t, \qquad o_{m,t+d} = \mu_m^{(1)} - \Big(\sqrt{d\,\Sigma_m^{(1)}}\Big)_t, \quad (5)$$
where $\big(\sqrt{d\,\Sigma_m^{(1)}}\big)_t$ is the t-th column of the matrix square root of $d\,\Sigma_m^{(1)}$. Then
$$o_{m,t} = \mu_m^{(1)} + \sqrt{d\,\lambda_{m,t}^{(1)}}\, u_{m,t}^{(1)}, \qquad o_{m,t+d} = \mu_m^{(1)} - \sqrt{d\,\lambda_{m,t}^{(1)}}\, u_{m,t}^{(1)}, \quad (6)$$
where $t = 1, \ldots, d$, $\mu_m^{(1)}$ is the mean vector of the m-th normal distribution in $s_1$, $\lambda_{m,t}^{(1)}$ is the t-th eigenvalue of $\Sigma_m^{(1)}$ and $u_{m,t}^{(1)}$ is the t-th eigenvector. If d = 1, the sigma points are simply $\mu_m^{(1)} \pm \sigma_m^{(1)}$. The approximation of $D(s_2\|s_1)$ is calculated in the same way. Substituting these approximations into (1), we obtain the symmetric KL divergence, which we set as the distance between clusters $c_1$ and $c_2$.
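The unscented approximation of Eqs. (4)–(6) can be sketched as follows; function names and data layout are our own illustrative choices, not the implementation of [5]:

```python
import numpy as np

def sigma_points(mu, cov):
    """2d sigma points of N(mu, cov) per Eq. (6):
    mu +/- sqrt(d * lambda_t) * u_t for each eigenpair (lambda_t, u_t)."""
    d = len(mu)
    lam, U = np.linalg.eigh(cov)
    offsets = np.sqrt(d * lam) * U  # column t is sqrt(d * lam_t) * u_t
    return np.concatenate([mu + offsets.T, mu - offsets.T])

def mixture_pdf(x, weights, mus, covs):
    """Density of a Gaussian mixture at point x."""
    dens = 0.0
    for w, m, c in zip(weights, mus, covs):
        c = np.asarray(c, float)
        diff = x - np.asarray(m, float)
        d = len(diff)
        quad = diff @ np.linalg.inv(c) @ diff
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(c))
        dens += w * np.exp(-0.5 * quad) / norm
    return dens

def kl_unscented(w1, mus1, covs1, w2, mus2, covs2):
    """Approximate D(s1 || s2) between Gaussian mixtures via Eq. (4)."""
    d = len(mus1[0])
    total = 0.0
    for w, m, c in zip(w1, mus1, covs1):
        pts = sigma_points(np.asarray(m, float), np.asarray(c, float))
        total += w * sum(np.log(mixture_pdf(p, w1, mus1, covs1)
                                / mixture_pdf(p, w2, mus2, covs2)) for p in pts)
    return total / (2 * d)
```

Summing `kl_unscented` in both directions gives the symmetric distance between clusters; for two single 1-D Gaussians with equal variance the sigma-point average even coincides with the exact KL divergence.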
5 An Application to the VAS Data In this section, we apply our proposed method to real VAS data from Keio University School of Medicine. The data are masked and not tied to any information that would identify a patient. For comparison with a traditional method, we also apply the centroid method to the same data.
5.1 Medical Questionnaire in Keio University School of Medicine The Center for Kampo Medicine, Keio University School of Medicine, gives patients a questionnaire to support medical decisions. The questionnaire includes a set of questions about their subjective symptoms: 244 yes-no questions and 118 visual analogue scale questions, for example, "How much pain do you feel with urination?". Patients answer these questions each time they come to Keio University, so doctors can follow fluctuations in severity.
5.2 Data Description and Result For our analysis, we deal with a question that asks how cold the patient feels: "Do you feel cold in your left leg?". The data contain 435 patients' first and second VAS values, which we transform into PDs. The next table shows extracts from the original data and their transformation.
Table 1 VAS values and PDs

Patient ID   First VAS value   Second VAS value   N(μ, σ²)
1            100               78                 N(89, 121)
2            0                 50                 N(25, 625)
...          ...               ...                ...
435          42                5                  N(23.5, 342.25)
The result of our simulation is shown in Figure 2. The vertical axis of the dendrogram is the distance between PDs. There appear to be three large clusters, A, B and C: the PDs in cluster A have large variances, those in cluster B have small variances, and those in cluster C have small variances and large means. The degree to which patients express their sense of pain thus appears in the features of the clusters.
Fig. 2 Dendrogram for PDs
The result of the centroid method is shown in Figure 3.
Fig. 3 Dendrogram of Traditional Method
6 Concluding Remarks In this paper, we defined the PD, obtained by transforming the VAS into distribution-valued data, and proposed a hierarchical clustering method for it. Comparing across a group of patients using the VAS is difficult, but our method makes such comparisons possible. Through the simulation, we verified our model. In future work, we will define multidimensional PDs and apply our clustering method to them.
References 1. Billard, L., Diday, E.: Symbolic Data Analysis. Wiley, New York (2006) 2. Bock, H.-H., Diday, E.: Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data. Springer, Berlin (2000) 3. Diday, E.: The symbolic approach in clustering and related methods of Data Analysis. In: Bock, H. (ed.) Proc. IFCS, Aachen, Germany. North-Holland, Amsterdam (1988) 4. Diday, E.: The symbolic approach in clustering and related methods of Data Analysis. In: Bock, H. (ed.) Classification and Related Methods of Data Analysis, pp. 673–684. North-Holland, Amsterdam (1988) 5. Goldberger, J., Gordon, S., Greenspan, H.: An efficient image similarity measure based on approximations of KL-divergence between two Gaussian mixtures. In: Proceedings of CVPR, pp. 487–494 (2006) 6. Gowda, K.C., Diday, E.: Symbolic clustering using a new dissimilarity measure. Pattern Recognition 24(6), 567–578 (1991) 7. Katayama, K., Suzukawa, A., Minami, H., Mizuta, M.: Linearly Restricted Principal Components in k Groups. In: Electronic Proceedings of Knowledge Extraction and Modeling, Villa Orlandi, Island of Capri, Italy (2006) 8. Kullback, S.: Information Theory and Statistics. Dover Publications, New York (1968)
Part IV Miscellanea
Acquisition of User’s Learning Styles Using Log Mining Analysis through Web Usage Mining Process Sucheta V. Kolekar, Sriram G. Sanjeevi, and D.S. Bormane
Abstract. Web Usage Mining is a broad area of Web Mining concerned with extracting patterns from the logging information produced by web servers. Web log mining is an important part of the Web Usage Mining (WUM) process, involving the transformation and interpretation of the logging information to predict patterns corresponding to different learning styles. These patterns are ultimately useful for classifying various predefined profiles. Web Usage Mining is an essential step toward providing a personalized learning environment to the user through an adaptive user interface. In this paper we build a module of an e-learning architecture based on Web Usage Mining to assess users' behavior through web log analysis. Keywords: E-learning, Log Mining Analysis, Adaptive Learning Styles, Web Usage Mining.
1 Introduction Typically, e-learning is a Web-based educational system that provides the same resources to all learners, even though different learners need different information according to their level of knowledge, learning styles and preferences. Content sequencing for a course is a technology that originated in the area of Intelligent/Adaptive Learning Systems, with the basic aim of providing the end user/student with the most suitable sequence of knowledge content to learn, and the sequence of Sucheta V. Kolekar Research Scholar National Institute of Technology, Warangal, A.P., India e-mail: [email protected]
Sriram G. Sanjeevi National Institute of Technology, Warangal, A.P., India e-mail: [email protected] D.S. Bormane JSPM’s Rajarshi Shahu College of Engineering, Pune, Maharashtra, India e-mail: [email protected] J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 809–819. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
810
S.V. Kolekar, S.G. Sanjeevi, and D.S. Bormane
learning tasks (examples, exercises, problems, contents, etc.) to work with. To implement an adaptive personalized learning system, different types of knowledge are required, related to the learner's behavior, the learning material and the representation of the learning process [3]. Considerable research has already addressed personalized e-learning; still, there is a need to concentrate on adaptation based on the learning styles of the user [7]. In fact, two basic classes of adaptation need to be considered: the adaptive user interface (also called adaptive navigation) and adaptive content presentation. Some learning systems focus on static content modules, which can confine the learner's initiative to some degree. But because learners differ in study purposes, abilities and knowledge, an intelligent, individualized learning platform is needed so that all learners can greatly improve their enthusiasm for learning. This research therefore emphasizes the following objectives for an adaptive user interface for e-learning:
• Create a personalized environment
• Acquire user preferences
• Take over control of tasks from the user
• Manage the display adaptively
• Reduce information overload
• Provide help on new and complex functions
To achieve the above objectives, it is essential to introduce web data mining techniques. Web usage mining deals with the extraction of knowledge from web server log files; it mines usage behavior to define accurate user profiles for an intelligent, adaptive, personalized e-learning system. The objectives of the research in this paper are to: 1. capture the learning styles of an individual user/learner using the log file method; 2. improve the performance of web services; 3. prepare the web site structure to deal with users on an individual basis; 4. provide an accurate and complete picture of users' web activities; 5. generate sufficient data, such as server-side and client-side logs, to perform meaningful mining tasks in the Pattern Analysis phase. The paper is organized as follows: Section 2 discusses the basic architecture of Web Usage Mining and its applications; Section 3 reviews related work on this issue; Section 4 presents the proposed Web Usage Mining architecture and describes its steps in detail.
2 Web Usage Mining Web Mining is divided into three categories according to the part of the Web-based system analyzed: Web Content Mining, Web Structure Mining and Web Usage Mining. Web Content Mining deals with the discovery of useful information from web contents. Web Structure Mining tries to discover models of link structure in applications based on linked web pages.
2.1 Basic Architecture of Web Usage Mining in E-Learning The general framework of Web Usage Mining for an e-learning environment is shown in Fig. 1 [12]. The first step of WUM is to collect and manage data related to users. This step, called Data Preprocessing, draws on web server log files and other important information such as students' registration details and learning information. The second step, Pattern Discovery, uses mining algorithms to generate rules and models that extract users' learning patterns from the learning styles recorded in the log files. The third step, Pattern Analysis, converts these rules and models into useful knowledge by analyzing users' usage; this knowledge is the input to the interface component manager, which changes the GUI according to users' interests.
Fig. 1 Basic architecture of Web Usage Mining
2.2 Applications of Web Usage Mining 1. Personalization Service: Personalization for a user is achieved by keeping track of previously accessed pages, e.g. individualized profiling for e-learning. Making the interface adaptive on the basis of the user's profile, in addition to usage behavior, is a very attractive and essential feature of e-learning in the field of education, and web usage mining is an excellent approach for achieving it, as described in the next section. It classifies users' patterns according to the learning styles captured in log records. It can find a learner's interests and preferences by mining the learner's browsing information [6], such as visited pages, visiting frequency, length of content visited, time spent on each visit and preferences, so as to provide each learner with personalized adaptive pages accurate for his or her learning style, forecast each learner's behavior and offer a personalized educational environment. 2. System Improvement: The improvement of the system rests entirely on user satisfaction, and the performance and quality of the web site are important measures of it. Web usage mining can provide useful
knowledge and patterns for designing the web server in a better way, so that the server can focus on features such as page caching, network transmission, load balancing, data distribution and web site security [5]. 3. Site Modification: The structure and interface of a web site, matched to learners' interests and contents, are key factors in attracting learners. Web Usage Mining can guide site improvement using the mined knowledge and modify the structure according to learners' navigation paths and feedback. In an adaptive web site, the structure and interface change automatically on the basis of usage patterns discovered from server logs. 4. Business Intelligence: Business intelligence services rely on customer information captured by the web-based system. In e-learning the customers are the learners, whose learning behavior can be identified by web mining techniques and ultimately used to increase learner satisfaction and improve the business.
3 Related Work and Discussion Many papers have suggested techniques related to Web Usage Mining and log analysis in e-learning environments. Xue Sun and Wei Zhao showed how WUM can make an e-learning system more intelligent and individualized and promote learners' interest [2]. Shahnaz Nina et al. [3] propose a pattern-discovery technique for web log records that finds hidden information or predictive patterns through data mining and knowledge discovery. Navin Kumar et al. surveyed data preprocessing activities such as data cleaning and data reduction. Some research has also been done on personalized e-learning using different domain-related agents. Our work differs from the above research in various respects. Since the main focus of this research is to design an e-learning system with a personalized adaptive interface, this paper primarily addresses the first step of user personalization, which is based on Web Usage Mining. The proposed framework of the e-learning system is shown in Fig. 2 [10]; we implement its Learning Style Acquisition phase using the advanced log analysis method of the Web Usage Mining framework.
Fig. 2 Architecture diagram of E-learning System
The approach of the architecture is as follows: 1. Learning Style Acquisition: In this phase the Web Usage Mining technique is used to analyze the log data to identify the learning styles of different users/students. 2. User Classification Manager: The learning repository is the input to the User Classification Manager, where a back-propagation neural network classification algorithm identifies different kinds of users based on the Felder-Silverman learning styles. 3. Interface Component Manager: After the user categories are identified, the Interface Component Manager changes the graphical representation of the user interface according to the user's needs. 4. Adaptive Content Manager: This phase generates adaptive contents based on the user classification, with the help of administrative activities and the e-learning content repository.
4 Proposed Approach of Learning Style Acquisition In the field of web-based e-learning, we mainly emphasize the two application areas mentioned above: (i) personalization and (ii) site modification (adaptive user interface). When users visit the site they are interested in some course material, so they visit different pages, and the e-learning server logs information based on their visits. Through log analysis and mining we can obtain the users' interests and behavior toward the visited pages. When users log on to the portal, the system classifies them into different classes based on their previous behavior and generates a personalized page interface by adjusting the contents continuously and in a timely fashion. The idea of the architecture implementation: 1. Activity Recorder: Authenticate the user on the e-learning portal and capture client-side information through the Activity Recorder. 2. Log Information: Capture server-side and proxy-side logs and pass them, together with additional user information, to data preprocessing. 3. Data Preprocessing: Perform data cleaning, data integration and data reduction to generate useful data for mining. 4. Clustering: Apply a usage clustering method for pattern discovery; an advanced k-means clustering algorithm is used to find appropriate clusters based on users' usage. 5. Profile Generation: Generate user profiles and content profiles according to the clusters; user profiles are used to derive learning styles, and content profiles to find the user's domain interests. 4.1 Steps of Web Usage Mining (i) Data Collection: The first step in the Web usage mining process consists of gathering the relevant Web data, which will be analyzed to provide useful information about users' behavior.
Types of log files: 1. Server Side: The Extended Log Format (W3C) [7][1], supported by web servers such as Apache and Netscape, and the similar W3SVC format, supported by Microsoft Internet Information Server, include additional information such as the referring URL (the web page that brought the visitor to the site), the name and version of the visitor's browser, and the operating system of the host machine. Server-side logs should contain the information of the web server and cached pages. 2. Client Side: Client-side data are the local activities collected from the host accessing the web site, captured using a Java wrapper technique [5]. Local activities include desktop actions such as saving/printing the page, using the browser's back/forward/stop buttons, emailing a link/page, adding a bookmark, etc. This additional information is reliable for understanding the learner's behavior accurately.
Fig. 3 Learning Style Acquisition Approach
3. Intermediary Side (Proxy Server Logs): The advantage of using these logs is that they allow the collection of information about users operating behind the proxy server, since they record requests from multiple hosts to multiple web servers. (ii) Data Preprocessing: The captured log files are not directly suitable for data mining techniques; they must go through three preprocessing steps. 1. Data Cleansing: removal of useless information, e.g. graphical page content [6]. An algorithm for cleaning the entries of log information: (i) remove picture files associated with requests for particular pages; (ii) remove entries with error or failure status on different pages; (iii) identify and remove automatically generated access records; (iv) remove entries with unsuccessful HTTP status codes (codes between 200 and 299 denote successful entries).
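The cleaning rules above can be sketched as a simple filter. The record layout (`url`, `status`, `agent` fields), the extension list and the bot heuristic are illustrative assumptions, since the paper does not fix a log record format:

```python
import re

# Extensions treated as graphical/embedded content (rule (i)); illustrative.
IMAGE_EXT = re.compile(r'\.(gif|jpe?g|png|css|js|ico)(\?.*)?$', re.I)

def clean_log(entries):
    """Apply the four cleaning rules to a list of log-entry dicts."""
    kept = []
    for e in entries:
        if IMAGE_EXT.search(e['url']):
            continue  # rule (i): graphical page content
        if not 200 <= e['status'] <= 299:
            continue  # rules (ii)/(iv): error, failure, unsuccessful status
        if 'bot' in e.get('agent', '').lower():
            continue  # rule (iii): automatically generated access (heuristic)
        kept.append(e)
    return kept
```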
2. Data Integration: Integration of the cleaned data is the process of identifying and reconstructing users' sessions from the log files. This phase is divided into two basic steps. User Identification: Different users are identified in three ways: (i) converting the IP address to a domain name exposes some knowledge; (ii) cookies make it easy to identify individual visitors, giving information about web site usage; (iii) records of cached pages are used to build profiles. Session Identification: Session identification separates the different sessions of the same user by checking a threshold value; usually each session is bounded by a 30-minute time interval [3]. 3. Data Reduction: The dimensionality of the data must be reduced to decrease complexity. Access log files on the server and proxy sides contain log information about users' sessions, including the list of pages a user accessed in one session. The log files are in the Extended Log File Format, which includes special records whose information is sufficient to obtain session information. A set of URLs of particular pages forms a session if the time elapsed between two consecutive requests is smaller than a given t, accepted here as a 30-minute threshold. After preprocessing the log files, the following fields are used in this research: 1. Users: in an e-learning system, a user is a learner who visits the e-learning portal with a particular learning style. 2. Page view: a page view results from a user clicking on a page once and can represent one learning behavior. 3. Number of click streams per session: click streams are the user's page requests, which can be considered a learning sequence. 4. User session: the whole sequence of clicks from a user's visits across the web site, i.e. the aggregate behavior of the user.
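The 30-minute session threshold described above can be sketched as follows, assuming (for illustration) that one user's requests are available as (timestamp, URL) pairs:

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # the 30-minute threshold from the text

def split_sessions(requests):
    """Group one user's (timestamp, url) requests into sessions: a new
    session starts whenever the gap between two consecutive requests
    exceeds the 30-minute threshold."""
    sessions, current = [], []
    for ts, url in sorted(requests):
        if current and ts - current[-1][0] > SESSION_GAP:
            sessions.append(current)
            current = []
        current.append((ts, url))
    if current:
        sessions.append(current)
    return sessions

# Requests at 9:00, 9:10 and 10:00 fall into two sessions.
reqs = [(datetime(2011, 1, 1, 9, 0), '/a'),
        (datetime(2011, 1, 1, 9, 10), '/b'),
        (datetime(2011, 1, 1, 10, 0), '/c')]
print(len(split_sessions(reqs)))  # 2
```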
Evaluation of Parameters for the Method:
1. Topics (T) are related to the contents of the web site and are defined by the owner of the portal.
2. Weight (W) defines the importance of a topic based on the actions.
3. Actions (A) are the clicks of the student on a particular content type, such as text links, video lectures, downloadable links, etc. Each action is assigned a weight according to its importance, e.g. A1 = PageRequest with weight WA1 = 1, A2 = VideoLecture with weight WA2 = 4, A3 = DownloadPDF with weight WA3 = 8. Action A1 is the default action accompanying any other action.
4. Duration (D) is the time a student spends on a page, which indicates the area of interest based on the actions on pages. Calculating the exact duration is problematic: one cannot tell whether the student is really reading the page or is away from it. To address this problem, we count the duration up to the timeout of the login.
$$D_T = \sum_{p} \sum_{a} W_a \cdot d_{p,a}, \quad (1)$$
where the outer sum runs over the pages p visited for topic T, the inner sum over the actions a performed on page p, and $d_{p,a}$ is the time spent on that action.
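Using the action weights given above (WA1 = 1, WA2 = 4, WA3 = 8), a topic-interest score can be sketched as an action-weighted sum of durations. This aggregation is our illustrative reading of equation (1), not a formula taken verbatim from the paper:

```python
# Action weights from the text; the aggregation (weighted sum of
# durations per topic) is an assumption for illustration.
ACTION_WEIGHTS = {'PageRequest': 1, 'VideoLecture': 4, 'DownloadPDF': 8}

def topic_interest(visits):
    """visits: iterable of (topic, action, duration_seconds) tuples.
    Returns the weighted duration accumulated per topic."""
    scores = {}
    for topic, action, duration in visits:
        w = ACTION_WEIGHTS.get(action, 1)  # A1 (PageRequest) is the default
        scores[topic] = scores.get(topic, 0) + w * duration
    return scores
```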
Different Pages Visited as per Learning Styles: The Web Usage Mining architecture we propose aims to find a mapping from students' actions in the browser to the learning styles they fit. Based on the formula derived in equation (1), we can find the duration spent on the pages of the e-learning portal. The observed actions are as follows:
1. Access of contents and reading material
2. Access of examples
3. Exercises or quizzes
4. Chat/forum/email usage: a student may use the chat, forum or email service for social communication based on the contents.
(iii) Clustering: The user profiles and content profiles need to be clustered based on the log information. Clustering is an unsupervised classification method that groups similar objects into the same cluster [2]. Clustering can be done in two ways: partition-based and hierarchy-based. Partitional clustering separates the records of n objects into k clusters of most-similar objects; the separation depends on a distance measure. In this approach we use the popular k-means clustering method with some advanced features to find the pages most frequently accessed by users for specific domain contents.
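The partitional step can be sketched with plain k-means over per-user feature vectors (e.g. weighted durations per content type). The paper's "advanced" modifications are not specified, so only the standard algorithm is shown here:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Standard k-means: assign each row of X to its nearest center,
    then move each center to the mean of its assigned rows."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```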
Fig. 4 Web Log Record
Fig. 5 Web page contents and frequency
Fig. 6 Histogram of calculated durations of different pages and contents.
5 Experimental Details and Results Although users register their favorite domains at registration time, a user's identity or work influences the style of content reference on each visit. For example, a user may submit "database management systems" as the domain of interest, but to understand database concepts the user will always prefer contents matching his or her individual learning style. We obtain each user's style by analyzing the log records. Defining interests is certainly not easy; to obtain satisfactory criteria for interests, we carried out a great deal of experimentation. We employ extended data, comprising client-level and server-level web log data, since these data contain the behaviors of a large number of users in the investigation. Based on the web usage mining steps above, we built a system for extracting users' interests. The experiment used web log data collected from the www.e-learning.rscoe.in web server (see Figure 4). These records give the web pages accessed by each user. The accessed web pages contain different types of contents for a particular search topic, and users access different links according to their interests. We recorded 600 users' log
records and the frequencies of the contents (see Figure 5). After clustering, we define seven types of clusters according to web page contents to decide user profiling and content profiling. The graph in Figure 6 shows the number of users accessing, and the time they spend on, the different pages of the portal; these are useful inputs to the neural network algorithm for classifying different learning styles.
6 Conclusion In this paper we proposed a Web Usage Mining approach, surveying data preprocessing activities and different kinds of log records. Web Usage Mining for an e-learning environment mines the log records to find users' usage patterns and thus provide users with personalized, adaptive sessions. The next phase of this research is to use the resulting user profiles as input parameters to a neural-network-based algorithm that classifies users according to the Felder-Silverman learning style model. The interface components can then be changed adaptively on the web site for the classified users, using adaptive contents and administrative activities.
References 1. Extended Log File Format, http://www.microsoft.com/technet/prodtechnol/ WindowsServer2003/Library/IIS/ 676400bc-8969-4aa7-851a-9319490a9bbb.mspx?mfr=true 2. Sun, X., Zhao, W.: Design and Implementation of an E-learning Model based on WUM Techniques. In: IEEE International Conference on E-learning, E-business, Enterprises Information Systems and E-government (2009) 3. Nina, S., Rahaman, M.M., Islam, M.K., Ahmed, K.E.U.: Pattern Discovery of Web Usage Mining. In: International Conference on Computer Technology and Development. IEEE Computer Society, Los Alamitos (2009) 4. Oskouei, R.J.: Identifying Student’s Behaviors Related to Internet Usage Patterns. In: T4E 2010. IEEE, Los Alamitos (2010) 978-1-4244-7361-8/ 2010 5. Li, X., Zhang, S.: Application of Web Usage Mining in e-learning Platform. In: International conference on E-business and E-government. IEEE Computer Society, Los Alamitos (2010) 6. Tyagi, N.K., Solanki, A.K., Tyagi, S.: An algorithmic approach to data preprocessing in Web Usage Mining. International Journal of Information Technology and Knowledge Management 2(2), 269–283 (2010) 7. Khiribi, M.K., Jemni, M., Nasraoui, O.: Automatic Recommendations for E-learning Personalization based on Web Usage Mining Techniques and Information Retrieval. In: Eight IEEE International Conference on Advanced Learning Technologies. IEEE Computer Society, Los Alamitos (2009) 8. Chanchary, F., Haque, I., Khalid, M.S.: Web Usage Mining to evaluate the transfer of learning in a Web-based Learning Environment. In: Workshop on Knowledge Discovery and Data Mining. IEEE Computer Society, Los Alamitos (2008)
Acquisition of User’s Learning Styles Using Log Mining Analysis
An Agent Based Middleware for Privacy Aware Recommender Systems in IPTV Networks

Ahmed M. Elmisery and Dmitri Botvich
Abstract. IPTV providers are keen to use recommender systems as a serious business tool to gain a competitive advantage over competing providers and attract more customers. As indicated in (Elmisery, Botvich 2011b), IPTV recommender systems can utilize data mashup to merge datasets from different movie recommendation sites, like Netflix or IMDb, to improve recommender performance and prediction accuracy. Data mashup is a web technology that combines information from multiple sources into a single web application. Mashup applications created a new horizon for different services like real estate services, financial services and recommender systems. On the other hand, mashup applications bring additional requirements related to the privacy of the data used in the mashup process. Moreover, privacy and accuracy are two contradicting goals that need to be balanced for these services to spread. In this work, we present our efforts to build an agent based middleware for private data mashup (AMPM) that serves a centralized IPTV recommender system (CIRS). AMPM is equipped with two obfuscation mechanisms to preserve the privacy of the dataset collected from each provider involved in the mashup application. We present a model to measure privacy breaches. We also provide a data mashup scenario in an IPTV recommender system together with experimental results.

Keywords: privacy, clustering, IPTV networks, recommender system, multi-agent.
1 Introduction
Data mashup (Trojer et al. 2009) is a web technology that combines information from more than one source into a single web application for a specific task or request. Data mashup can be used to merge datasets from external movie recommendation sites to improve the IPTV recommender system from different perspectives: providing more precise predictions and recommendations, improving reliability toward customers, alleviating the cold-start problem (Gemmis et al. 2009) for new customers, maximizing the precision of target marketing, and finally improving the overall performance of the current IPTV network by building an overlay that increases content availability, prioritization and distribution based on customers' preferences. For these reasons, providers of the next generation of IPTV services are keen to obtain accurate recommender systems for their IPTV networks.

However, privacy is an essential concern for mashup applications in IPTV recommender systems, as the generated recommendations obviously require the integration of different customers' preferences from different providers. This might reveal private customer preferences that were not available before the data mashup. Most movie recommendation sites refrain from joining a mashup process to prevent disclosure of the raw preferences of their customers to other sites or to the IPTV recommender system itself. Moreover, divulging their customers' preferences may infringe personal privacy laws in some countries where these providers operate.

In this work, we present our ongoing work to build an agent based middleware for private data mashup (AMPM) that takes privacy into account when mashing up different datasets from movie recommendation sites. We focus on the stages related to dataset collection and processing and omit all aspects related to recommendation generation, mainly because these stages are critical with regard to privacy as they involve different entities. We present a model that measures privacy breaches as an inference problem between the real dataset and the obfuscated dataset. We derive a lower bound for the amount of fake items required to achieve optimal privacy. The experiments show that our approach reduces privacy breaches. In the rest of this work, we will generically refer to news programs, movies and video-on-demand contents as items. In Section 2, we describe some related work.

Ahmed M. Elmisery · Dmitri Botvich
Telecommunications Software & Systems Group, Waterford Institute of Technology, Waterford, Ireland

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 821–832. springerlink.com
© Springer-Verlag Berlin Heidelberg 2011
In Section 3, we introduce the scenario underlying our AMPM middleware. In Section 4, we give an overview of the proposed obfuscation algorithms used in AMPM. In Section 5, we present a model to measure privacy breaches in the obfuscated dataset. In Section 6, we present some experiments and results based on our obfuscation algorithms. Finally, Section 7 includes conclusions and future work.
2 Related Works
The majority of the literature addresses the problem of privacy in recommendation systems, since they are a potential source of leakage of personally identifiable information; however, only a few works have studied privacy for mashup services. The work in (Trojer et al. 2009) discussed private data mashup, where the authors formalize the problem as achieving k-anonymity on the integrated data without revealing detailed information about this process or disclosing data from one party to another. Their infrastructure is ported to web-based mashup applications. In (Esma 2008) a theoretical framework is proposed to preserve the privacy of customers and the commercial interests of merchants. Their system is a hybrid recommender that uses secure two-party protocols and a public key infrastructure to achieve the desired goals. In (Polat, Du 2003, 2005) another method is suggested for privacy preservation on centralized recommender systems: adding uncertainty to the data by using a randomized perturbation technique while attempting to make sure that necessary statistical aggregates such as the mean do not get disturbed much. Hence, the server has no knowledge of the true values of individual ratings for each user. They demonstrate that their method does not essentially decrease the accuracy of the obtained results. However, the research in (Huang et al. 2005; Kargupta et al. 2003) pointed out that randomized perturbation techniques do not provide the levels of privacy that were previously thought. In (Kargupta et al. 2003) it is pointed out that arbitrary randomization is not safe because it is easy to breach the privacy protection it offers. They proposed a random-matrix-based spectral filtering technique to recover the original data from the perturbed data. Their experiments revealed that in many cases random perturbation techniques preserve very little privacy.
3 Data Mashup in IPTV Recommender System Scenario
This work uses the scenario proposed in (Elmisery, Botvich 2011b), which extends previously proposed scenarios in (Elmisery, Botvich 2011d, c, a). The scenario in (Elmisery, Botvich 2011b) proposed a data mashup service (DMS) that integrates datasets from different movie recommendation sites for the recommender system running at the IPTV provider; Figure (1) illustrates this scenario. We assume all the involved parties follow the semi-honest model, which is a realistic assumption because they need to accomplish business goals and increase their revenues. We also assume all parties involved in the data mashup have a similar item set (catalogue), but their customer sets are not identical. The data mashup process based on AMPM can be summarized as follows:

1. The CIRS sends a query to the DMS to start gathering customers' preferences for some genres to leverage its recommendation service.
2. At the DMS side, the coordinator determines, based on the providers' cache, which movie recommendation sites could satisfy that query.
3. The coordinator transforms the CIRS query into appropriate sub-query languages suitable for each provider's database. The manager unit sends each sub-query to the candidate providers to invite them to join the data mashup process.
4. The manager agent at the provider side rewrites the sub-query considering the privacy preferences of its host, producing a modified sub-query for the data that can be accessed by the DMS. This step allows the manager agent to audit all issued sub-queries and block ones that could extract sensitive information.
5. The resulting dataset is obfuscated by the local obfuscation agent (LOA) using the clustering based obfuscation (CBO) algorithm.
6. The synchronize agents send the results to the coordinator, which in turn integrates these results and then performs a global perturbation on them using the random ratings generation (RRG) algorithm.
7. Finally, the output of the global perturbation process is delivered to the CIRS to achieve its business goals.
The movie recommendation sites earn revenues from the usage of their databases and at the same time are assured that the mashup process does not violate the privacy of their customers. The DMS uses anonymous pseudonym identities to alleviate the providers' identity concerns, as the providers do not want to reveal their ownership of the data to competing providers. Moreover, the data mashup service is keen to hide the identities of the participants as a business asset.
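The flow above can be sketched as a minimal pipeline. All function and field names below are illustrative stand-ins (the paper does not publish an API), and the provider-side obfuscation is reduced to a placeholder for the CBO algorithm described in Section 4:

```python
import random

rng = random.Random(42)

# Provider-side placeholder for the CBO step: each record survives only
# with probability alpha, otherwise a fake record of the same genre with
# a random rating is released instead.
def obfuscate_local(records, alpha=0.5):
    out = []
    for r in records:
        if rng.random() < alpha:
            out.append(dict(r, fake=False))
        else:
            out.append({"item": "fake-" + r["item"], "genre": r["genre"],
                        "rating": rng.randint(1, 5), "fake": True})
    return out

# DMS-side sketch: route the CIRS genre query to each provider and merge
# the obfuscated answers (the real middleware would then run RRG globally).
def mashup(providers, genre):
    merged = []
    for dataset in providers.values():
        answer = [r for r in dataset if r["genre"] == genre]
        merged.extend(obfuscate_local(answer))  # providers release obfuscated data only
    return merged  # delivered to the CIRS

providers = {
    "siteA": [{"item": "m1", "genre": "drama", "rating": 5},
              {"item": "m2", "genre": "drama", "rating": 3},
              {"item": "m3", "genre": "comedy", "rating": 4}],
    "siteB": [{"item": "m1", "genre": "drama", "rating": 4}],
}
released = mashup(providers, "drama")
print(len(released))  # 3 matching records, none traceable to a raw provider rating
```

The CIRS only ever sees the merged, obfuscated output; no provider's raw answer leaves its side.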
Fig. 1 Data Mashup in IPTV Recommender System (architecture diagram: the DMS side with its coordinator, providers' cache and pseudonyms, manager unit, delivery and synchronize agents; the provider-side AMPM with its manager, learning and local obfuscation agents, local catalog, and the service users' ratings and metadata)
4 Proposed Obfuscation Algorithms
In this section, we give an overview of our proposed algorithms in (Elmisery, Botvich 2011b) that are used to preserve the privacy of the datasets with minimum loss of accuracy. The core idea of our obfuscation algorithms is to alleviate the attack model proposed in (Narayanan, Shmatikov 2008), where the authors state that if a set of user preferences is fully distinguishable from other users' preferences in the dataset with respect to some items, that user can be identified if an attacker correlates the published preferences with data from other publicly accessible databases. We believe the current anonymity models might fail to provide overall anonymity as they do not consider matching items based on their feature vectors. A key to success for any obfuscation algorithm is to create homogeneous groups inside the published datasets based on feature vectors, in order to make user preferences indistinguishable from other users'. A typical feature vector for a real item includes genres, directors, actors and so on. We proposed our obfuscation algorithms as a two-stage process, taking advantage of the group formation done by the DMS to accomplish each query of the CIRS. We use this group to attain privacy for all participants, such that each provider obfuscates its dataset locally and then releases it to the DMS to perform the global perturbation.

The first algorithm, called CBO, runs at the provider side and aims to create clusters of fake items that have a feature vector similar to each real item preferred by the provider's customers. The algorithm consists of the following steps:

1. CBO splits the dataset D into two subsets D_h and D_r, where D_h is the subset of highly rated items in the dataset and D_r is the rest of the items in the dataset.
2. For each real item I ∈ D_h, CBO adds K − 1 fake items that have a feature vector similar to that item. The process continues until we get the candidate fake item set D_f.
3. For each real item I ∈ D_h, CBO selects I with probability α or selects a fake item I′ from the candidate fake item set D_f with probability 1 − α. We denote the selected item by I*, which is added as a record to the obfuscated set D_o:

P(I*) = α·P(I) + (1 − α)·P(I′)   (1)

4. Finally, D_o is merged with D_r to obtain the final obfuscated set D*.
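As a concrete illustration, here is a minimal sketch of the four CBO steps. Feature-vector similarity is reduced to copying the genre of the real item, and the helper names and toy data are ours, not the paper's:

```python
import random

def cbo(dataset, is_high, similar_fakes, k, alpha, rng=random.Random(7)):
    """Cluster-based obfuscation sketch.

    dataset: list of items; is_high(item) -> bool selects D_h;
    similar_fakes(item, m) returns m fake items with a similar feature
    vector (step 2); alpha is the retention probability (step 3).
    """
    d_high = [i for i in dataset if is_high(i)]
    d_rest = [i for i in dataset if not is_high(i)]            # step 1
    d_fake = {id(i): similar_fakes(i, k - 1) for i in d_high}  # step 2
    d_obf = [i if rng.random() < alpha else rng.choice(d_fake[id(i)])
             for i in d_high]                                  # step 3
    return d_obf + d_rest                                      # step 4

# Toy usage: items are (title, genre, rating); fakes simply share the genre.
items = [("m1", "drama", 5), ("m2", "comedy", 2), ("m3", "drama", 4)]
fakes = lambda item, m: [(f"fake-{item[0]}-{j}", item[1], item[2]) for j in range(m)]
out = cbo(items, lambda i: i[2] >= 4, fakes, k=3, alpha=0.5)
print(len(out))  # same size as the input dataset
```

Each highly rated item is either kept or swapped for one of its look-alike fakes, so the released set's size and genre mix are preserved.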
The second algorithm, called RRG, runs at the DMS side and aims to mitigate the data sparsity problem, which can be used to formulate attacks such as those in (Narayanan, Shmatikov 2008). The main aim of RRG is to pre-process the merged dataset by filling the unrated cells in such a way as to improve recommendation accuracy and increase the attained privacy. The algorithm consists of the following steps:

1. RRG determines the items rated by the majority and the partially rated items across all providers' customers in the merged dataset.
2. Then, it selects a percentage of the partially rated items in the merged dataset and uses KNN to predict the values of the unrated cells in that subset. The remaining unrated cells are filled with random values chosen using a distribution reflecting the ratings in the merged dataset.
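A stdlib-only sketch of the two RRG steps, assuming a small user × item matrix with None for unrated cells. The neighbourhood size and the fraction of cells sent to KNN are arbitrary choices here, not values from the paper:

```python
import random

def rrg(matrix, knn_fraction=0.5, k=2, rng=random.Random(1)):
    """Random-ratings-generation sketch: a fraction of unrated cells is
    predicted from the k nearest users (Euclidean distance over co-rated
    items); the rest are drawn from the empirical rating distribution."""
    all_ratings = [v for row in matrix for v in row if v is not None]
    empty = [(u, i) for u, row in enumerate(matrix)
             for i, v in enumerate(row) if v is None]
    rng.shuffle(empty)
    cut = int(len(empty) * knn_fraction)
    for u, i in empty[:cut]:
        def dist(v):  # distance between users u and v over co-rated items
            common = [(a, b) for a, b in zip(matrix[u], matrix[v])
                      if a is not None and b is not None]
            return sum((a - b) ** 2 for a, b in common) if common else float("inf")
        neighbours = sorted((v for v in range(len(matrix))
                             if v != u and matrix[v][i] is not None), key=dist)[:k]
        votes = [matrix[v][i] for v in neighbours]
        matrix[u][i] = round(sum(votes) / len(votes)) if votes else rng.choice(all_ratings)
    for u, i in empty[cut:]:
        matrix[u][i] = rng.choice(all_ratings)  # distribution-reflecting fill
    return matrix

m = rrg([[5, None, 4], [4, 2, None], [None, 1, 5]])
print(m)  # every None replaced by a predicted or sampled rating
```

After this pre-processing the merged matrix is dense, which removes the sparsity fingerprints an attacker could otherwise exploit.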
5 Measuring Privacy Breaches
As indicated in (Evfimievski et al. 2003), the distribution of D_o may allow the attacker to learn some real items' preferences. Different D_o distributions represent different dataset releases, whose properties the attacker exploits to reveal real items' preferences. The attacker can only learn about the subset D_h inside D_o. Thus, to prevent privacy breaches, we aim to minimize the amount of information about D_h which can be inferred through D_o. We use the mutual information I(D_h; D_o) as a measure for the notion of a privacy breach between D_h and D_o. Let I_1, I_2, …, I_N be the real items' preferences; then we have:

I(D_h; D_o) = Σ_i Σ_j P(I_i, I*_j)·[log P(I_i | I*_j) − log P(I_i)]   (2)

Given D_h, if I(D_h; D_o¹) < I(D_h; D_o²), we can deduce that D_o¹ is better than D_o² for privacy protection. Therefore, our aim is to find the fake items' preferences set which minimizes I(D_h; D_o). Based on our previous discussion, we can conclude the following conditional probability:

P(I* = I_j | I_r = I_i) = α·1{I_j = I_i} + (1 − α)·P(I_f = I_j | I_r = I_i)   (3)
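Equation (2) can be evaluated directly from a joint distribution over (real item, released item) pairs. The sketch below is our own (log base 2) and shows the two extreme cases:

```python
import math

def privacy_breach(joint):
    """Mutual information I(D_h; D_o) from a joint distribution
    P(real item = i, released item = j), following Eq. (2):
    I = sum_ij P(i,j) * (log P(i|j) - log P(i))."""
    p_real = [sum(row) for row in joint]       # P(I_i)
    p_out = [sum(col) for col in zip(*joint)]  # P(I*_j)
    i_bits = 0.0
    for i, row in enumerate(joint):
        for j, p_ij in enumerate(row):
            if p_ij > 0:
                cond = p_ij / p_out[j]         # P(I_i | I*_j)
                i_bits += p_ij * (math.log2(cond) - math.log2(p_real[i]))
    return i_bits

# identity release (no obfuscation): the breach equals the full entropy, 1 bit here
print(round(privacy_breach([[0.5, 0.0], [0.0, 0.5]]), 3))      # 1.0
# independent release: zero breach
print(round(privacy_breach([[0.25, 0.25], [0.25, 0.25]]), 3))  # 0.0
```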
5.1 Privacy Guarantees
Enlightened by (Xiao et al. 2009), in this subsection we measure the lower bound for the amount of fake items that must be added to the real items' preferences to achieve the optimal privacy level. In the rest of this work, we will generically refer to real items' preferences as real items and to added fake items' preferences as fake items. Our derived bound is a function of α and the total number of real items |D_h|. In the CBO algorithm, given α, the more real items in the dataset, the more fake items are added. The amount of real items in the obfuscated dataset is decided by the provider; thus we need an upper bound for α, given by the following theorem:

Theorem 1: If α ≤ 1/|D_h|, there exists a fake item distribution such that I(D_h; D_o) = 0.

To prove that there exists a D_f satisfying Theorem 1, let α = 1/|D_h| and

P(I_f = I_j | I_r = I_i) = 1/(|D_h| − 1) for all I_j ≠ I_i.   (4)

This can be explained as follows: given a real item I_i, the fake item is equally likely to be any other item except I_i. Based on equation (3), we have

P(I* = I_j | I_r = I_i) = α·1{I_j = I_i} + (1 − α)·(1/(|D_h| − 1))·1{I_j ≠ I_i} = 1/|D_h|.   (5)

Therefore I* is uniformly distributed over the entire real item space, independently of I_r. Replacing P(I_i | I*_j) = P(I_i) into equation (2), we get I(D_h; D_o) = 0. This proves the correctness of the bound given by Theorem 1. In order to quantify the privacy gain, we need a lower bound on the expected number of fake items |D_f|, which is given by |D_f| = (1/α − 1)·|D_h| ≥ (|D_h| − 1)·|D_h|. Based on the previous analysis, we can deduce that it is expensive to achieve the optimal privacy level.
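The guarantee can be checked numerically: with the fake item uniform over the other items as in (4), every column of the release channel becomes constant at α = 1/|D_h|, and the mutual information vanishes. This is a small self-contained check of ours, not code from the paper:

```python
import math

def release_channel(n, alpha):
    """P(I* = j | I_r = i) under CBO with the fake item uniform over the
    other n-1 items (Eqs. (4)/(5)): keep i with probability alpha,
    otherwise emit a uniformly chosen different item."""
    return [[alpha if i == j else (1 - alpha) / (n - 1) for j in range(n)]
            for i in range(n)]

def mutual_info(prior, channel):
    """I(real; released) in bits for a given prior over real items."""
    joint = [[prior[i] * channel[i][j] for j in range(len(prior))]
             for i in range(len(prior))]
    p_out = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (prior[i] * p_out[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

n = 4
prior = [0.4, 0.3, 0.2, 0.1]
# at alpha = 1/n every column of the channel is constant, so I = 0
print(abs(mutual_info(prior, release_channel(n, 1 / n))) < 1e-9)  # True
# at a larger alpha, information about the real item leaks
print(mutual_info(prior, release_channel(n, 0.8)) > 1e-3)         # True
```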
5.2 Minimizing Number of Fake Items
The number of fake items has to be minimized to decrease the computational complexity and the dataset size. In order to achieve that, we simplify the obfuscation process by allowing the real item I_r and the fake item I_f to be independent. With this new assumption, equation (3) becomes

P(I* = I_j | I_r = I_i) = α·1{I_j = I_i} + (1 − α)·P(I_f = I_j).   (6)

Then equation (2) becomes

I(D_h; D_o) = Σ_i Σ_j P(I_r = I_i)·[α·1{I_j = I_i} + (1 − α)·P(I_f = I_j)]·log{ [α·1{I_j = I_i} + (1 − α)·P(I_f = I_j)] / [α·P(I_r = I_j) + (1 − α)·P(I_f = I_j)] }.   (7)

To simplify this expression, write u_i = P(I_r = I_i) and n_i = P(I_f = I_i), so that the objective becomes a function I(n) of the fake item distribution n alone (8). Then our task is

n* = arg min_n I(n)  subject to  Σ_i n_i = 1 and n_i ≥ 0, where n = (n_1, n_2, …, n_N).   (9)

We first show that I is a convex function of n. Since I is a continuous, twice differentiable function over n, it suffices to prove that its second derivative is positive. Expanding the second derivative of each summand of (7) with respect to n_i and collecting terms (equations (10)–(13)) yields a sum of non-negative terms that is strictly positive, hence ∂²I/∂n_i² > 0 and I is convex in n.

Using Lagrange multipliers to solve equation (9), we use the constraint to define the function

Υ(n, Λ) = I(n) + Λ·(Σ_i n_i − 1),   (14)

and setting ∂Υ/∂n_i = 0 for each i yields a system of equations in n and Λ.   (15)

Choosing different values of Λ to solve equation (15), we can get the optimal fake set n* that is independent of u and minimizes privacy breaches. It is computationally expensive to solve equation (15) for a large number of items, so we only select a portion of the dataset as mentioned before.
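Since I is convex in n, the programme (9) can also be solved numerically. The brute-force simplex grid below (three items, with made-up numbers of ours) stands in for solving the Lagrange condition (15):

```python
import math
from itertools import product

def mutual_info(u, n, alpha):
    """I for the independent-fake channel of Eq. (6):
    P(Y = j | X = i) = alpha*1{i=j} + (1-alpha)*n_j, with P(X = i) = u_i."""
    m = len(u)
    joint = [[u[i] * (alpha * (i == j) + (1 - alpha) * n[j]) for j in range(m)]
             for i in range(m)]
    p_out = [sum(col) for col in zip(*joint)]
    return sum(p * math.log(p / (u[i] * p_out[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def best_fake_dist(u, alpha, step=0.02):
    """Exhaustive search over a grid on the probability simplex; since the
    objective is convex, the grid minimum approximates the optimum n*."""
    best, best_n = float("inf"), None
    ticks = [round(k * step, 4) for k in range(int(1 / step) + 1)]
    for n1, n2 in product(ticks, repeat=2):
        if n1 + n2 <= 1:
            n = (n1, n2, round(1 - n1 - n2, 4))
            v = mutual_info(u, n, alpha)
            if v < best:
                best, best_n = v, n
    return best_n, best

u = (0.6, 0.3, 0.1)  # skewed real-item distribution
n_star, leak = best_fake_dist(u, alpha=0.3)
print(n_star, round(leak, 4))
```

Note that for α > 0 the leakage cannot reach exactly zero under the independence assumption, which is the price paid for using fewer fake items.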
6 Experimental Results
The proposed algorithms are implemented in C++; we used the message passing interface (MPI) for a distributed-memory implementation of the RRG algorithm to mimic a distributed, reliable network of peers. In order to evaluate the effect of our proposed algorithms on the mashed-up data used for recommendation, we consider two aspects: privacy breach and accuracy of results. The experiments presented here were conducted using the MovieLens dataset provided by GroupLens (Lam, Herlocker 2006); it contains users' ratings on movies on a discrete scale between 1 and 5. We used the mean absolute error (MAE) metric proposed in (Herlocker et al. 2004) to measure the accuracy of the results. Also, we used the mutual information I(D_h; D_o) as the measure for the notion of a privacy breach, so a larger value of I(D_h; D_o) indicates a higher breach of privacy.
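For reference, the MAE metric is simply the mean absolute difference between predicted and observed ratings (the values below are invented):

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and true ratings
    (Herlocker et al. 2004)."""
    pairs = list(zip(predicted, actual))
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

print(mae([4.0, 2.5, 5.0], [5, 2, 4]))  # ≈ 0.833 on the 1-5 rating scale
```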
Fig. 2 Privacy breach for optimal and uniform fake sets
In the first experiment, we measure the relation between the quantity of real items in the obfuscated dataset D_o and the privacy breach. We select α in a range from 1.0 to 5.5, and we increase the number of real items from 100 to 1000. We select a fake set drawn from a uniform distribution as a baseline. As shown in Figure (2), our generated fake set reduces the privacy breach and performs much better than the uniform fake set. As the number of real items increases, the uniform fake set gets worse as more information is leaked, while our optimal fake set is not affected by that trend. Our results are promising, especially when dealing with a large number of real items.

In the second experiment, we measure the relation between the quantity of fake items in the subset D_o (which depends on the value of α) and the accuracy of the recommendations. We select a set of real items from MovieLens, then we split it into two subsets D_h and D_r. We obfuscate D_h as described before with a fixed value for α to obtain D_o. We append to D_o items from either the optimal fake set or the uniform fake set. Thereafter, we gradually increase the percentage of real items in D_o selected from the MovieLens dataset from 0.1 to 0.9. For each possible obfuscation rate value, we measured the MAE for the whole obfuscated dataset D*. Figure (3) shows MAE values as a function of the obfuscation rate. The provider selects the obfuscation rate based on the accuracy level required from the recommendation process. We can deduce that with a higher value of the obfuscation rate, the CIRS can attain more accurate recommendations. Adding items from the optimal fake set has a minor impact on the MAE of the generated recommendations without having to select a higher value for the obfuscation rate.
Fig. 3 MAE of the generated predictions vs. obfuscation rate
However, as we can see from the graph, the MAE slightly decreases in a roughly linear manner for high values of the obfuscation rate. In particular, the change in MAE is minor in the range 40% to 60%, which confirms our assumption that accurate recommendations can be provided with lower values of the obfuscation rate.
The optimal fake items are so similar to the real items in the dataset that the obfuscation does not significantly change the aggregates in the real dataset and has a small impact on MAE.
Fig. 4 MAE of the generated predictions for rating groups
In the third experiment, we seek to measure the impact of adding fake items on the prediction accuracy for the various types of ratings. We partitioned the MovieLens dataset into 5 rating groups. For each rating group, a set of 1300 ratings was separated. CBO was applied using the optimal and uniform fake sets, then the ratings were pre-processed using RRG. The resulting datasets were submitted to the CIRS to perform predictions for each rating group. We repeat the prediction experiment with different values of α, and then we compute the MAE for these predictions. Figure (4) shows the MAE values of the generated predictions for each rating group. We can clearly see that the impact of adding fake items on the predictions differs across the rating types. For the optimal fake set, the impact is minor, as the MAE remains roughly unchanged regardless of the values of α.
7 Conclusions and Future Work
In this work, we presented our ongoing work on building an agent based middleware for private data mashup that serves a centralized IPTV recommender system. We gave a brief overview of the mashup process. We also gave an overview of our novel algorithms, which give the provider complete control over the privacy of its datasets using a two-stage process. We presented a model for privacy breach as an inference problem between the real dataset and the obfuscated dataset. We derived a lower bound for the amount of fake items needed to achieve optimal privacy. The experiments show that our approach reduces privacy breaches. We need to investigate weighted feature vector methods and their impact on forming homogeneous groups, such that the provider not only expresses what kinds of item features can be used to create the fake items dataset, but also expresses the degree to which those features should influence the selection of items for the fake items dataset. We realize that there are many challenges in building a data mashup service; as a result, we focused on the IPTV recommendation services scenario. This allows us to move forward in building an integrated system while studying issues such as dynamic data release at a later stage, and deferring certain issues such as schema integration, access control, query execution and auditing to a future research agenda. We believe that, given the complexities of the problem, it is best to focus on simpler scenarios and a subset of the issues at the beginning; we will then go ahead and solve the remaining issues in our future work.

Acknowledgments. This work has received support from the Higher Education Authority in Ireland under the PRTLI Cycle 4 programme, in the FutureComm project (Serving Society: Management of Future Communications Networks and Services).
References [1] Elmisery, A., Botvich, D.: Agent Based Middleware for Maintaining User Privacy in IPTV Recommender Services. In: 3rd International ICST Conference on Security and Privacy in Mobile Information and Communication Systems, ICST, Aalborg, Denmark (2011a) [2] Elmisery, A., Botvich, D.: Agent Based Middleware for Private Data Mashup in IPTV Recommender Services. In: 16th IEEE International Workshop on Computer Aided Modeling, Analysis and Design of Communication Links and Networks, Kyoto, Japan. IEEE, Los Alamitos (2011b) [3] Elmisery, A., Botvich, D.: Privacy Aware Recommender Service for IPTV Networks. In: 5th FTRA/IEEE International Conference on Multimedia and Ubiquitous Engineering, Crete, Greece. IEEE, Los Alamitos (2011c) [4] Elmisery, A., Botvich, D.: Private Recommendation Service For IPTV System. In: 12th IFIP/IEEE International Symposium on Integrated Network Management, Dublin, Ireland. IEEE, Los Alamitos (2011d) [5] Esma, A.: Experimental Demonstration of a Hybrid Privacy-Preserving Recommender System. In: Gilles, B., Jose, M.F., Flavien Serge Mani, O., Zbigniew, R. (eds.), pp. 161–170 (2008) [6] Evfimievski, A., Gehrke, J., Srikant, R.: Limiting privacy breaches in privacy preserving data mining. Paper Presented at the Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, San Diego, California [7] Gemmis, M.d., Iaquinta, L., Lops, P., Musto, C., Narducci, F., Semeraro, G.: Preference Learning in Recommender Systems. Paper Presented at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), Slovenia [8] Herlocker, J.L., Konstan, J.A., Terveen, L.G., Riedl, J.T.: Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. 22(1), 5–53 (2004), doi: http://doi.acm.org/10.1145/963770.963772 [9] Huang, Z., Du, W., Chen, B.: Deriving private information from randomized data. 
Paper Presented at the Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland (2005)
[10] Kargupta, H., Datta, S., Wang, Q., Sivakumar, K.: On the Privacy Preserving Properties of Random Data Perturbation Techniques. Paper Presented at the Proceedings of the Third IEEE International Conference on Data Mining [11] Lam, S., Herlocker, J.: MovieLens Data Sets. Department of Computer Science and Engineering at the University of Minnesota (2006), http://www.grouplens.org/node/73 [12] Narayanan, A., Shmatikov, V.: Robust De-anonymization of Large Sparse Datasets. Paper Presented at the Proceedings of the 2008 IEEE Symposium on Security and Privacy (2008) [13] Polat, H., Du, W.: Privacy-Preserving Collaborative Filtering Using Randomized Perturbation Techniques. Paper Presented at the Proceedings of the Third IEEE International Conference on Data Mining [14] Polat, H., Du, W.: SVD-based collaborative filtering with privacy. Paper Presented at the Proceedings of the 2005 ACM Symposium on Applied Computing, Santa Fe, New Mexico (2005) [15] Trojer, T., Fung, B.C.M., Hung, P.C.K.: Service-Oriented Architecture for Privacy-Preserving Data Mashup. Paper Presented at the Proceedings of the 2009 IEEE International Conference on Web Services (2009) [16] Xiao, X., Tao, Y., Chen, M.: Optimal random perturbation at multiple privacy levels. Proc. VLDB Endow. 2(1), 814–825 (2009)
An Intelligent Decision Support Model for Product Design

Yang-Cheng Lin and Chun-Chun Wei
Abstract. This paper presents a consumer-oriented design approach to determining the optimal form design of character toys that optimally matches a given set of product images perceived by consumers. 179 representative character toys and seven design form elements of character toys are identified as samples in an experimental study to illustrate how the consumer-oriented design approach works. The approach is based on the process of Kansei Engineering using neural networks (NNs). Nine NN models are built with different momentum values, learning rates, and numbers of hidden neurons in order to examine how a particular combination of form elements matches the desirable product images. The NN models can be used to construct a form design database for supporting form design decisions in a new character toy design. The result provides useful insights that help product designers best meet consumers' specific feelings and expectations.
1 Introduction

In an intensely competitive market, how to design highly-reputable and hot-selling products is an essential issue [1]. Whether consumers choose a product depends largely on their emotional feelings about the product image, which is regarded as something of a black box [7]. Consequently, product designers need to comprehend consumers' feelings in order to design successful products [6, 12]. Unfortunately, the way that consumers look at product appearances or images is usually different from the way that product designers look at product elements or characteristics [4]. Moreover, it has been shown that "aesthetics" plays an important role in new product development, marketing strategies, and the retail environment [1, 2]. Apple products (e.g. the iPod or iPhone) are a good example illustrating that visual appearance has become a major factor in consumers' purchase decisions, called the "aesthetic revolution" [13]. Yamamoto and Lambert [14] also find that aesthetically pleasing properties have a positive influence on consumers' preferences for a product and their decision processes when they purchase it [3]. In product design, the "visual appearance" (or visual aesthetics) is usually concerned with "product form" [5].

In order to help product designers work out the optimal combination of product forms for matching consumers' psychological feelings, a consumer-oriented approach, called Kansei Engineering [9, 10], is used to build a design decision support model. Kansei Engineering is an ergonomic methodology and a set of design strategies for affective design to satisfy consumers' psychological feelings [9]. The word "Kansei" indicates the consumers' psychological requirements or emotional feelings about a product. Kansei Engineering has been applied successfully in the product design field to explore the relationship between consumers' feelings and product forms [4, 5, 6, 7, 8]. To illustrate how the consumer-oriented approach works, we conduct an experimental study on character toys (dolls or mascots), given their great popularity in eastern Asia (particularly in Taiwan, Japan, and Hong Kong).

In subsequent sections, we first present an experimental study with character toys to describe how Kansei Engineering can be used to extract representative samples and form elements as the numerical data sets required for analysis. Then we construct and evaluate nine NN models based on the experimental data. Finally we discuss how the NN models can be used as a design decision support model to help product designers meet consumers' emotional feelings for new product design.

Yang-Cheng Lin
Department of Arts and Design, National Dong Hwa University, Hualien, 970, Taiwan
e-mail: [email protected]

Chun-Chun Wei
Department of Industrial Design, National Cheng Kung University, Tainan, 701, Taiwan
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 833–842. springerlink.com
© Springer-Verlag Berlin Heidelberg 2011
2 Experimental Procedures of a Consumer-Oriented Approach

The experimental procedures involve three main steps: (a) extracting representative experimental samples, (b) conducting morphological analysis of design form elements, and (c) assessing product images.
2.1 Extracting Representative Experimental Samples

In the experimental study, we investigate and categorize various character toys based on local and aboriginal cultures in Taiwan. We first collect 179 character toys and then classify them by their degree of similarity using a focus group formed by six subjects, each with at least two years' experience of product design. The focus group eliminates some highly similar samples through discussion. Then hierarchical cluster analysis is used to extract representative samples of character toys. 35 representative character toy samples are selected from the cluster tree diagram, including 28 samples as the training set and 7 samples as the test set for building the NN models.
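The representative-extraction step can be sketched with a small single-linkage agglomerative clustering that keeps one sample per cluster. The feature vectors and the cluster count below are invented for illustration; the study itself used 179 toys and a cluster tree diagram:

```python
def representatives(samples, n_clusters):
    """Agglomerative (single-linkage) clustering over feature vectors,
    returning one representative index per cluster."""
    clusters = [[i] for i in range(len(samples))]

    def dist(a, b):  # single linkage: closest pair between clusters a and b
        return min(sum((samples[i][k] - samples[j][k]) ** 2
                       for k in range(len(samples[i])))
                   for i in a for j in b)

    while len(clusters) > n_clusters:
        x, y = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: dist(clusters[ab[0]], clusters[ab[1]]))
        clusters[x] += clusters.pop(y)  # merge the two closest clusters
    return [c[0] for c in clusters]     # first member stands for its cluster

toys = [(0, 0), (0.1, 0.2), (5, 5), (5.2, 4.9), (9, 1)]
print(representatives(toys, 3))  # [0, 2, 4]: one index per similarity cluster
```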
2.2 Conducting Morphological Analysis of Design Form Elements The product form is defined as the collection of design features that the consumers will appreciate [5]. The morphological analysis, concerning the arrangement of
An Intelligent Decision Support Model for Product Design
objects and how they combine to create a Gestalt whole, is used to explore all possible solutions to a complex problem regarding a product form [7]. Morphological analysis is used to extract the product form elements of the 35 representative character toy samples. The subjects of the focus group are asked to decompose the representative samples into several dominant form elements and form types according to their knowledge and experience. Table 1 shows the result of the morphological analysis, with seven product design elements and 24 associated product form types identified. The form type indicates the relationship between the outline elements. For example, the “width ratio of head and body (X2)” form element has three form types: “head>body (X21)”, “head=body (X22)”, and “head<body (X23)”. A number of design alternatives can be generated by various combinations of morphological elements.
2.3 Assessing Product Images In Kansei Engineering, emotion assessment experiments are usually performed to elicit the consumers’ psychological feelings about a product using the semantic differential method [9]. Image words are often used to describe the consumers’ feelings about the product in terms of ergonomic and psychological estimation [6]. With the form elements of the product identified, the relationship between the consumers’ feelings and the product forms can be established. The procedure of extracting image words includes the following four steps: Step 1: Collect a large set of image words from magazines, product catalogs, designers, artists, and toy collectors. In this study, we collect 110 image words that describe the character toys, e.g. vivid, attractive, traditional, etc. Step 2: Evaluate the collected image words using the semantic differential method. Step 3: Apply factor analysis and cluster analysis to the semantic differential results obtained at Step 2. Step 4: Determine three representative image words, “cute (CU)”, “artistic (AR)”, and “attractive (AT)”, based on the analyses performed at Step 3. To obtain the assessed values for the emotional feelings about the 35 representative character toy samples, a 100-point scale (0-100) of the semantic differential method is used. 150 subjects (70 males and 80 females, with ages ranging from 15 to 50) are asked to assess the form (look) of the character toy samples on an image word scale of 0 to 100; for example, 100 is most attractive on the AT scale. The last three columns of Table 2 show the three assessed image values of the 35 samples, including 28 samples in the training set and 7 samples in the test set (asterisked). For each selected character toy in Table 2, the first column shows the character toy number and Columns 2-8 show the corresponding type number for each of its seven product form elements, as given in Table 1.
Table 2 provides a numerical data source for building neural network models, which can be used to develop a design decision support model for the new product design and development of character toys.
Y.-C. Lin and C.-C. Wei
Table 1 The morphological analysis of character toys

Form element                        | Type 1     | Type 2                 | Type 3            | Type 4    | Type 5
Length ratio of head and body (X1)  | ≧1:1       | 1:1~1:2                | <1:2              |           |
Width ratio of head and body (X2)   | head>body  | head=body              | head<body         |           |
Costume style (X3)                  | one-piece  | two-pieces             | robe              |           |
Costume pattern (X4)                | simple     | striped                | geometric         | mixed     |
Headdress (X5)                      | tribal     | ordinary               | flowered          | feathered | arc-shaped
Appearance of facial features (X6)  | eyes only  | partial features       | entire features   |           |
Overall appearance (X7)             | cute style | semi-personified style | personified style |           |
Table 2 Product image assessments of 35 representative character toy samples

No. | X1 X2 X3 X4 X5 X6 X7 | CU AR AT
1   |  3  2  1  1  4  3  3 | 73 61 64
2   |  1  1  1  1  1  2  1 | 72 45 43
3   |  2  2  1  3  3  1  1 | 70 64 71
4*  |  2  3  2  4  2  2  2 | 63 52 54
5   |  2  2  1  1  4  2  1 | 68 59 55
6   |  2  2  2  4  3  2  2 | 65 66 69
7*  |  2  2  2  4  5  2  2 | 52 66 61
8   |  2  3  2  4  4  2  2 | 53 61 60
9   |  2  2  3  2  2  2  2 | 63 59 59
10  |  2  2  1  3  2  2  2 | 55 63 65
11  |  1  1  2  3  4  2  1 | 70 69 67
12  |  1  1  3  2  2  2  1 | 57 54 61
13  |  3  3  2  4  4  3  3 | 48 69 76
14  |  3  3  1  4  4  3  3 | 62 68 78
15  |  3  3  2  2  2  3  3 | 54 63 68
16* |  3  3  1  2  3  3  3 | 62 74 72
17  |  3  3  2  4  2  3  3 | 55 68 66
18  |  2  3  3  2  2  2  2 | 71 65 61
19  |  2  2  1  1  2  3  3 | 41 52 75
20  |  2  2  2  1  1  3  3 | 39 53 63
21* |  2  2  2  2  3  3  3 | 41 50 58
22  |  2  2  2  3  2  3  2 | 44 74 62
23  |  2  2  2  1  2  3  3 | 43 59 74
24  |  2  2  1  3  2  3  1 | 54 60 62
25  |  2  2  2  2  2  1  1 | 63 52 62
26* |  1  2  2  2  4  3  2 | 58 71 68
27  |  1  2  1  2  4  3  2 | 57 61 66
28  |  1  1  2  2  1  1  1 | 62 56 73
29  |  1  1  1  3  5  3  2 | 76 67 74
30* |  1  1  1  3  3  3  2 | 68 59 65
31  |  1  1  3  2  2  3  2 | 71 60 70
32* |  1  1  1  4  4  1  1 | 61 49 51
33  |  1  1  1  4  5  1  1 | 72 59 57
34  |  2  3  2  4  2  2  2 | 38 48 49
35  |  1  1  1  3  5  2  1 | 78 59 79

(* = test set)
3 Neural Network Models With its effective learning ability, the neural network model has been widely used to examine complex and non-linear relationships between input variables and output variables [11].
3.1 Building NN Models In this study, we use the multilayered feed-forward NN trained with the back-propagation learning algorithm, as it is an effective and popular supervised learning algorithm [8].
(a) The Number of Neurons To examine how a particular combination of product form elements matches the CU, AR, and AT images, we use three of the most widely used rules [11] for determining the number of neurons in the single hidden layer, given below: (a) (the number of input neurons + the number of output neurons) / 2; (b) (the number of input neurons + the number of output neurons); (c) (the number of input neurons + the number of output neurons) * 2. The seven design elements in Table 1 are used as the seven input variables of the NN models. The value of each input neuron is the type number (1, 2, 3, 4, or 5) of the corresponding design element of the character toy. The assessed average values of the CU, AR, and AT feelings are used as the output neurons. Table 3 gives the neurons of the NN models, including the input layer, hidden layer, and output layer. Table 3 Neurons, learning rate, and momentum of NN models
Model  | Input neurons | Hidden neurons | Output neurons | Learning rate | Momentum | Note
NN-a-S | 7             | 5              | 3              | 0.9           | 0.6      | research issue is very simple
NN-a-C | 7             | 5              | 3              | 0.1           | 0.1      | research issue is more complicated
NN-a-N | 7             | 5              | 3              | 0.05          | 0.5      | research issue is complex and very noisy
NN-b-S | 7             | 10             | 3              | 0.9           | 0.6      | research issue is very simple
NN-b-C | 7             | 10             | 3              | 0.1           | 0.1      | research issue is more complicated
NN-b-N | 7             | 10             | 3              | 0.05          | 0.5      | research issue is complex and very noisy
NN-c-S | 7             | 20             | 3              | 0.9           | 0.6      | research issue is very simple
NN-c-C | 7             | 20             | 3              | 0.1           | 0.1      | research issue is more complicated
NN-c-N | 7             | 20             | 3              | 0.05          | 0.5      | research issue is complex and very noisy
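The grid of nine model configurations follows mechanically from the three hidden-neuron rules (a)-(c) and the three (learning rate, momentum) pairs. A quick sketch (the dictionary naming is ours, not the paper’s):

```python
# The nine model configurations of Table 3, generated from the three
# hidden-neuron rules (a)-(c) and the three (learning rate, momentum) pairs.
n_in, n_out = 7, 3                    # 7 form elements in, 3 image values out

hidden_rules = {                      # rules (a)-(c) from Section 3.1
    "a": (n_in + n_out) // 2,         # (7 + 3) / 2  -> 5 hidden neurons
    "b": n_in + n_out,                # 7 + 3        -> 10
    "c": (n_in + n_out) * 2,          # (7 + 3) * 2  -> 20
}
settings = {                          # suffix -> (learning rate, momentum)
    "S": (0.9, 0.6),                  # very simple research issues
    "C": (0.1, 0.1),                  # more complicated issues
    "N": (0.05, 0.5),                 # complex and very noisy data
}

models = {f"NN-{r}-{s}": (hidden, lr, mom)
          for r, hidden in hidden_rules.items()
          for s, (lr, mom) in settings.items()}

print(len(models))                    # -> 9
print(models["NN-a-N"])               # -> (5, 0.05, 0.5)
```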
(b) The Momentum and Learning Rate In this study, we conduct a set of analyses using different learning rate and momentum factors to obtain a better structure for the NN model. Three pairs of learning rate and momentum factors are used for different conditions based on the complexity of the research problem [11]. For example, if the research issue is very simple, a large learning rate of 0.9 and a momentum of 0.6 are recommended. On more complicated problems, or for predictive networks whose output variables are continuous values rather than categories, a smaller learning rate and momentum are used, such as 0.1 and 0.1 respectively. In addition, if the data are complex
and very noisy, a learning rate of 0.05 and a momentum of 0.5 are used. To distinguish the NN-a, NN-b, and NN-c models (which use different numbers of hidden neurons) under these settings, each model is given the suffix -S, -C, or -N according to the learning rate and momentum pair, as shown in Table 3. As a result, a total of nine NN models (3×3) are built in this study.
3.2 Training NN Models The learning rule used is the Delta Rule and the transfer function is the Sigmoid for all layers. All input and output variables (neurons) are normalized before training [8]. The experimental samples are separated into two groups: 28 training samples and 7 test samples. The training process of each model is not stopped until the cumulative training epochs exceed 25,000. Table 4 shows the corresponding root mean square (RMS) error of each model; the lowest RMS error of the nine models is asterisked. As shown in Table 4, the RMS errors of the NN models using rule (c) are the lowest, compared with the other two rules (i.e. rules (a) and (b) in Section 3.1). This result indicates that the more hidden neurons, the lower the RMS error. Furthermore, we find that the RMS errors of the NN models using the same rule are almost the same, no matter which momentum and learning rate are used. In other words, the RMS errors of NN-a-S, NN-a-C, and NN-a-N differ only slightly (0.0478, 0.0473, and 0.0481), as do those of the NN-b models (NN-b-S, NN-b-C, and NN-b-N) and the NN-c models (NN-c-S, NN-c-C, and NN-c-N). This shows that the momentum and learning rate have no significant impact on training the NN models in this study. Table 4 RMS errors of the NN models for the training set
       | -S      | -C     | -N
NN-a   | 0.0478  | 0.0473 | 0.0481
NN-b   | 0.0312  | 0.0406 | 0.0426
NN-c   | 0.0205* | 0.0290 | 0.0297
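The training procedure of Section 3.2 can be sketched as a one-hidden-layer feed-forward network with sigmoid transfer, delta-rule weight updates with momentum, and inputs/outputs normalized to [0, 1]. This is a minimal illustration, not the authors’ implementation: the training data are random stand-ins and the epoch count is reduced, so the resulting RMS error will not reproduce Table 4.

```python
import numpy as np

rng = np.random.default_rng(1)

# network sizes as in the paper: 7 form-element inputs, 3 image outputs;
# 5 hidden neurons corresponds to rule (a)
n_in, n_hid, n_out = 7, 5, 3
lr, momentum = 0.1, 0.1               # the -C setting from Table 3

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.uniform(-0.5, 0.5, (n_in, n_hid))
W2 = rng.uniform(-0.5, 0.5, (n_hid, n_out))
dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)

# stand-in training data: 28 samples; inputs are type numbers scaled to [0, 1],
# outputs are image ratings scaled from the 0-100 scale to [0, 1]
X = rng.integers(1, 6, (28, n_in)) / 5.0
Y = rng.integers(0, 101, (28, n_out)) / 100.0

for epoch in range(2000):             # the paper trains for over 25,000 epochs
    H = sigmoid(X @ W1)               # hidden activations
    O = sigmoid(H @ W2)               # output activations
    delta_o = (Y - O) * O * (1 - O)   # delta rule with sigmoid derivative
    delta_h = (delta_o @ W2.T) * H * (1 - H)
    dW2 = lr * (H.T @ delta_o) + momentum * dW2   # momentum smooths updates
    dW1 = lr * (X.T @ delta_h) + momentum * dW1
    W2 += dW2
    W1 += dW1

rms = float(np.sqrt(np.mean((Y - sigmoid(sigmoid(X @ W1) @ W2)) ** 2)))
print(round(rms, 4))
```

Because the outputs are sigmoid-squashed and the targets normalized, the RMS error is always in [0, 1], which matches the scale of the errors reported in Tables 4 and 5.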
3.3 Testing NN Models To evaluate the performance of the nine NN models in terms of their predictive ability, the 7 samples in the test set are used. The first rows of Table 5 show the average assessed values of the CU, AR, and AT images of the 7 test samples given by the 150 subjects, and the following rows show the values for the three images predicted by the nine NN models trained in the previous section. The last column of Table 5 shows the RMS errors of the NN models for the test set. As indicated in Table 5, the RMS error (0.0931) of the NN-a-N model is the smallest among the nine models, suggesting that the NN-a-N model has the highest predictive consistency (an accuracy rate of 90.69%, i.e. 100% − 9.31%) for predicting the values of the CU, AR, and AT images of character toys. This suggests that the NN-a-N model is the most promising for modeling consumers’ feelings on
product images of character toys. However, the other eight models also show quite similar performance, as the differences between the RMS errors of the nine models are almost negligible. This seems to suggest that the different variables or factors (the momentum, the learning rate, or the number of neurons in the hidden layer) have no significant impact on the predictive ability of the NN models. Table 5 Predicted image values and RMS errors of the NN models for the test set
Sample No.        |     4 |     7 |    16 |    21 |    26 |    30 |    32 | RMSE
Consumer      CU  |    63 |    52 |    62 |    41 |    58 |    68 |    61 |
feelings      AR  |    52 |    66 |    74 |    50 |    71 |    59 |    49 |
              AT  |    54 |    61 |    72 |    58 |    68 |    65 |    51 |
NN-a-S        CU  | 37.94 | 73.96 | 48.50 | 38.51 | 69.56 | 51.78 | 73.43 | 0.1243
predicted     AR  | 48.78 | 63.14 | 61.18 | 53.24 | 59.54 | 62.80 | 62.88 |
              AT  | 48.57 | 69.27 | 71.77 | 76.14 | 58.30 | 74.14 | 69.23 |
NN-a-C        CU  | 41.65 | 67.84 | 53.84 | 59.87 | 67.44 | 60.26 | 69.28 | 0.0995
predicted     AR  | 48.79 | 64.30 | 66.32 | 62.26 | 61.91 | 65.64 | 60.58 |
              AT  | 49.07 | 66.94 | 72.20 | 71.39 | 66.43 | 68.22 | 65.25 |
NN-a-N        CU  | 38.08 | 76.87 | 52.09 | 56.24 | 55.04 | 66.46 | 69.29 | 0.0931*
predicted     AR  | 50.56 | 68.48 | 65.38 | 67.06 | 65.56 | 59.37 | 49.18 |
              AT  | 47.26 | 77.38 | 71.99 | 73.56 | 69.57 | 64.61 | 52.30 |
NN-b-S        CU  | 37.84 | 69.51 | 50.12 | 37.47 | 52.45 | 64.27 | 70.79 | 0.1274
predicted     AR  | 48.12 | 64.92 | 62.35 | 67.27 | 73.63 | 73.07 | 64.15 |
              AT  | 49.01 | 77.89 | 59.45 | 66.15 | 58.01 | 79.35 | 77.12 |
NN-b-C        CU  | 38.27 | 73.22 | 49.96 | 67.75 | 67.75 | 59.79 | 64.96 | 0.1091
predicted     AR  | 50.02 | 67.58 | 64.96 | 65.07 | 60.93 | 57.21 | 52.92 |
              AT  | 46.73 | 75.43 | 71.93 | 72.47 | 61.15 | 58.99 | 52.53 |
NN-b-N        CU  | 37.80 | 61.38 | 52.57 | 54.37 | 54.24 | 63.35 | 61.44 | 0.0937
predicted     AR  | 48.63 | 69.31 | 67.79 | 70.69 | 69.47 | 52.76 | 52.65 |
              AT  | 48.82 | 73.05 | 72.26 | 77.76 | 75.82 | 50.60 | 49.46 |
NN-c-S        CU  | 38.01 | 54.13 | 71.92 | 57.35 | 50.48 | 57.55 | 70.83 | 0.1069
predicted     AR  | 48.01 | 65.73 | 72.53 | 68.91 | 66.53 | 71.00 | 58.62 |
              AT  | 49.00 | 73.25 | 59.84 | 79.29 | 61.77 | 61.84 | 58.73 |
NN-c-C        CU  | 38.03 | 64.32 | 50.16 | 53.77 | 51.37 | 59.59 | 75.13 | 0.1288
predicted     AR  | 48.76 | 64.75 | 66.76 | 65.55 | 60.39 | 64.98 | 69.76 |
              AT  | 49.47 | 43.64 | 78.13 | 76.59 | 74.30 | 39.81 | 54.75 |
NN-c-N        CU  | 38.31 | 68.12 | 50.41 | 42.97 | 43.17 | 56.28 | 66.79 | 0.1065
predicted     AR  | 48.58 | 68.43 | 67.42 | 70.74 | 71.08 | 65.83 | 51.11 |
              AT  | 48.43 | 73.97 | 78.82 | 77.23 | 75.77 | 59.35 | 45.89 |
4 The Decision Support Model for New Product Forms The NN models enable us to build a design decision support database that can be used to help determine the optimal form design for best matching specific product
images. The design decision support database is generated by inputting every possible combination of form design elements to the NN models individually to generate the associated image values. The resultant character toy design decision support database consists of 4,860 (= 3×3×3×4×5×3×3) different combinations of form elements, together with their associated CU, AR, and AT image values. The product designer can specify desirable image values for a new character toy form design, and the database can then work out the optimal combination of form elements. In addition, the design support database can be incorporated into a computer-aided design (CAD) system to facilitate form design in the new character toy development process. To illustrate, Table 6 shows the optimal combination of form elements for the new character toy design with the most “cute + artistic + attractive” image (a CU value of 75, an AR value of 63, and an AT value of 70). The product designer can follow this design support information to match the desirable product images and satisfy consumers’ emotional feelings. Table 6 The optimal combination of form elements for the most “cute + artistic + attractive”
X1 Length ratio of head and body  | <1:2
X2 Width ratio of head and body   | head>body
X3 Costume style                  | one-piece
X4 Costume pattern                | mixed
X5 Headdress                      | arc-shaped
X6 Appearance of facial features  | entire features
X7 Overall appearance             | personified style
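The enumerate-and-query procedure behind Table 6 can be sketched as follows. Here `predict_images` is a hypothetical stand-in for the trained NN model (the real system would run each combination through the network), so the combination the query returns is illustrative only:

```python
import itertools, math

TYPE_COUNTS = (3, 3, 3, 4, 5, 3, 3)   # number of types per form element X1..X7

def predict_images(combo):
    """Hypothetical stand-in for the trained NN model: maps a form-element
    combination to (CU, AR, AT) image values on the 0-100 scale."""
    seed = sum(t * (i + 1) for i, t in enumerate(combo))
    return (30 + seed % 50, 30 + (seed * 7) % 50, 30 + (seed * 13) % 50)

# the design decision support database: all 3*3*3*4*5*3*3 = 4,860 combinations
database = {combo: predict_images(combo)
            for combo in itertools.product(*(range(1, n + 1) for n in TYPE_COUNTS))}

# query: the combination whose predicted images are closest to the desired
# target (CU, AR, AT) triple specified by the product designer
target = (75, 63, 70)
best = min(database, key=lambda c: math.dist(database[c], target))
print(len(database))                   # -> 4860
```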
5 Conclusion In this paper, we have demonstrated how NN models can be built to help determine the optimal product form design for matching a given set of product images, using an
experimental study on character toys. The consumer-oriented design approach has been used to build a character toy design decision support model which, in conjunction with a computer-aided design (CAD) system, helps product designers facilitate product form design in the new product development process. Although character toys are used as the experimental product, the consumer-oriented design approach presented can be applied to other consumer products with a wide variety of design form elements. Acknowledgments. This research is supported in part by the National Science Council of Taiwan, ROC under Grant No. NSC 99-2410-H-259-082.
References 1. Cross, N.: Engineering Design Methods: Strategies for Product Design. John Wiley and Sons, Chichester (2000) 2. Jonathan, C., Craig, M.V.: Creating Breakthrough Products – Innovation from Product Planning to Program Approval, pp. 1–31. Prentice Hall, New Jersey (2002) 3. Kim, J.U., Kim, W.J., Park, S.C.: Consumer perceptions on web advertisements and motivation factors to purchase in the online shopping. Computers in Human Behavior 26, 1208–1222 (2010) 4. Lai, H.-H., Lin, Y.-C., Yeh, C.-H., Wei, C.-H.: User Oriented Design for the Optimal Combination on Product Design. International Journal of Production Economics 100, 253–267 (2006) 5. Lai, H.-H., Lin, Y.-C., Yeh, C.-H.: Form Design of Product Image Using Grey Relational Analysis and Neural Network Models. Computers and Operations Research 32, 2689–2711 (2005) 6. Lin, Y.-C., Lai, H.-H., Yeh, C.-H.: Consumer-oriented product form design based on fuzzy logic: A case study of mobile phones. International Journal of Industrial Ergonomics 37, 531–543 (2007) 7. Lin, Y.-C., Lai, H.-H., Yeh, C.-H.: Consumer Oriented Design of Product Forms. In: Yin, F.-L., Wang, J., Guo, C. (eds.) ISNN 2004. LNCS, vol. 3174, pp. 898–903. Springer, Heidelberg (2004) 8. Lin, Y.-C., Lai, H.-H., Yeh, C.-H.: Neural Network Models for Product Image Design. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds.) KES 2004. LNCS (LNAI), vol. 3215, pp. 618–624. Springer, Heidelberg (2004) 9. Nagamachi, M.: Kansei engineering: A new ergonomics consumer-oriented technology for product development. International Journal of Industrial Ergonomics 15, 3–10 (1995) 10. Nagamachi, M.: Kansei engineering as a powerful consumer-oriented technology for product development. Applied Ergonomics 33, 289–294 (2002) 11. Negnevitsky, M.: Artificial Intelligence. Addison-Wesley, New York (2002) 12. Petiot, J.F., Yannou, B.: Measuring consumer perceptions for a better comprehension, specification and assessment of product semantics.
International Journal of Industrial Ergonomics 33, 507–525 (2004) 13. Walker, G.H., Stanton, N.A., Jenkins, D.P., Salmon, P.M.: From telephones to iPhones: Applying systems thinking to networked, interoperable products. Applied Ergonomics 40, 206–215 (2009) 14. Yamamoto, M., Lambert, D.R.: The impact of product aesthetics on the evaluation of industrial products. Journal of Product Innovation Management 11, 309–324 (1994)
Compromise in Scheduling Objects Procedures Basing on Ranking Lists Piech Henryk and Grzegorz Gawinowski
Abstract. In our work, the possibility of supporting the ranking of objects (tasks) is analyzed on the basis of a group of lists. These lists can be obtained both from experts and with the help of approximate and simple (in terms of complexity) algorithms. To support the analysis we can use elements of neighborhood theory [13], preferential models [5], and rough sets theory [16]. This supporting process is used for the creation of the final list of the task sequence. Usually, these problems are connected with distribution, classification, prediction, game strategies, as well as compromise-searching operations. The utilization of preference and domination models permits crisp inferences and forces the chronological location of an object. In some situations we deal with the dynamic character of filling the lists, resulting from tasks continuously arriving and continuously being assigned to executive elements. The utilization of the theory of neighborhoods permits locating objects within the range of compromise solutions, thereby getting close to the dominating proposal group. Our main task is to find the best compromise in the final object locations. We want to define the advantages and drawbacks of methods based on the mentioned theories and analyze the possibilities of their cooperation or mutual completion. Keywords: discrete optimization, ranking lists, compromise estimation.
1 Introduction There are many applications of preference theory to solving problems of decision support [5, 8, 15]. Dynamic scheduling using preferential models and rough sets theory does not introduce essential changes to the algorithms based on these theories but only adjusts the parameters of the data [14, 17]. Piech Henryk · Grzegorz Gawinowski Czestochowa University of Technology, Dabrowskiego 73 e-mail: [email protected]
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 843–852. © Springer-Verlag Berlin Heidelberg 2011 springerlink.com
P. Henryk and G. Gawinowski
Preferences and dominations [7] are used to compare sequences of assigned tasks that need to be run. This, however, requires selecting data and defining profiles which represent the tasks (objects) with respect to execution preference (final location) before we start running the process [9]. Domination in the Pareto and Lorenz sense permits settling basic relations between sequences of well-ordered objects. Preferences of the type “at least as good as”, estimated as intervals (by lower and upper bounds), permit defining a zone of uncertain solutions. In such situations, additional criteria are used for decision making (for example, the costs of reorganization [2]). For defining the location of tasks we can use elements of neighborhood theory [13] as well as cooperation, toleration and collision within a neighborhood [9]. These are named according to the researched problems. They were connected, among others, with supporting or rejecting the thesis about the location of a task in the centre of a given neighborhood. Closed neighborhoods confirm and support the decision (the thesis) about assigning the task to a specific location. The relation of tolerance has a reflexive and symmetrical character [13]. Cooperating neighborhoods intensify the strength of domination and reduce the influence of passivity or the small influence of tolerance. The cooperation relation (supporting the thesis) and the collision relation (postponing support of the thesis, which indirectly means supporting the antithesis) strengthen the inference mechanisms. Cooperation has a reversible character. This kind of dependence between relations should simplify the creation of conclusions. When we engage the theory of neighborhoods in the procedure of establishing the sequence, we increase the autonomy of the studied task groups with reference to their distribution. At the same time, the symmetry of inference increases the power of decision support [13].
The next problem is connected with dynamic scheduling and appointing objective solutions (independent of the sequence, the set of criteria, or expert opinions). Obviously, this is not always possible, but it is convenient to use interval solutions, particularly in situations when solutions are on the border of location classes according to a given criterion.
2 Compromise Estimation after Creating the Final Ranking List Compromise is formed between the ingredient judgment lists, which were built with the help of algorithms or on the basis of experts’ opinions. Several types of compromise can be created, for example:

1. minimum concessions and similar levels of them (minimum variance of concessions):

cmp_1 = \sum_{j=1}^{m} \sum_{i=1}^{n} (loc(i,j) - loc_f(i))^2 \to \min,

var\left\{ \sum_{i=1}^{n} (loc(i,j) - loc_f(i))^2 \right\} \to \min, \quad j = 1, 2, \ldots, m,    (1)

or

var\left\{ \sum_{j=1}^{m} (loc(i,j) - loc_f(i))^2 \right\} \to \min, \quad i = 1, 2, \ldots, n,

where var is the variance of the concession according to the ingredient list or to the tasks;

2. minimum distances between the centres of the neighborhoods with maximum powers (or concentrations) and the final task locations:

cmp_2 = \sum_{i=1}^{n} (centre\_max\_pow(i) - loc_f(i))^2 \to \min,

or

cmp_2 = \sum_{i=1}^{n} (centre\_max\_concentration(i) - loc_f(i))^2 \to \min,    (2)

where centre_max_pow(i) is the centre of the maximum-power neighborhood and centre_max_concentration(i) is the centre of the maximum-concentration (numbering) neighborhood;

3. minimum correction on the final list according to the Lorenz preference location:

cmp_3 = \sum_{i=1}^{n} (Lorenz\_loc(i) - loc_f(i))^2 \to \min.    (3)

Generally, we can describe compromise as follows:

cmp = \sum_{i=1}^{n} (criterion\_loc(i) - loc_f(i))^2 \to \min,    (4)

where criterion_loc(i) is the location of the i-th object suggested by the chosen criterion. We can use different criteria, or compositions of them, for the estimation of compromise. As a result of using these criteria we often obtain the same location for different objects. In this case it is necessary to use auxiliary criteria, methods or
heuristic rules. Sometimes we decide to use different criteria for compromise estimation and resign from the method based on creating final lists (Fig. 1).
Fig. 1 Distinguished criteria set for creating final list and compromise: A ∩ B = 0
In our convention (1)-(4), the best compromise corresponds to the smallest value of the parameter cmp. To compare compromises for several final lists we should keep the same criteria in set B.
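A small sketch of the concession-based estimator (1), computed for hypothetical ingredient lists (the lists and the small problem size below are ours, for illustration only):

```python
import statistics

# Concession-based estimator (1): the squared positional concession of each
# ingredient list against a candidate final list, plus the variance of the
# per-list concessions. The lists below are hypothetical.
ingredient = [                 # m = 3 ingredient lists: positions of objects 1..5
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 3, 2, 4, 5],
]
final = [1, 2, 3, 4, 5]        # candidate final list loc_f

def concession(lst):
    # sum_i (loc(i, j) - loc_f(i))^2 for one ingredient list j
    return sum((a - b) ** 2 for a, b in zip(lst, final))

cmp1 = sum(concession(lst) for lst in ingredient)
var_c = statistics.pvariance([concession(lst) for lst in ingredient])
print(cmp1)                    # -> 6
print(round(var_c, 3))         # -> 2.667
```

A final list that makes both cmp1 and the variance small concedes little overall and spreads its concessions evenly across the ingredient lists, which is exactly the compromise type 1 asks for.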
3 Sets of Criteria for Creating the Final List It is necessary to define several criteria, because the results of using a single criterion are often not unambiguous: several objects may pretend to one location on the final list. We propose several compositions of criteria:

1. sup(ϕ → ψ) → max;  centre(ϕ, i) → min
2. cnbh(ϕ, i) → max;  zone(ϕ, i) → min;  centre(ϕ, i) → min
3. sup(∗ → ψ) + sup(ϕ ← ∗) → min;  cnbh(ϕ, i) → max,    (5)

where
sup(ϕ → ψ) → max — maximal number of objects in one placement on the ingredient lists, where object ϕ_i is placed on position ψ_j;
centre(ϕ, i) → min — minimal position of the neighborhood centre; we choose the object ϕ_i from this neighborhood which is closest to the beginning of the list and located in the centre of its neighborhood;
cnbh(ϕ, i) → max — maximal neighborhood concentration; we choose the object ϕ_i with the maximal neighborhood concentration (numbering) and locate it in its centre;
zone(ϕ, i) → min — minimal neighborhood distance from the beginning of the list; we choose the object ϕ_i with the minimal neighborhood distance and locate it in its centre;
sup(∗ → ψ) + sup(ϕ ← ∗) → min — minimal number of objects pretending to position ψ_j and minimal number of positions to which object ϕ_i pretends; we choose such an object ϕ_i and locate it on position ψ_j (intuition criterion).
We often obtain the same value of the criteria estimators. In this case we should go to the next criterion in the hierarchy, considering the same object, and then search for the best location for it. A similar situation appears when the chosen location is occupied by previously located objects.
4 Methods and Examples of Creating Final Lists of Scheduled Objects For scheduling objects we can use rules from the theories of:
– neighborhoods
– preferences
– rough sets
Besides the criteria set, we can use specific methods traditionally used for the classification, categorization and ordering of objects [16]. We try to enrich every proposed method by showing an example. We described the exploitation of neighborhood theory to define the criteria set above. It is possible to combine elements of the quoted theories in different ways:
1) neighborhoods + rough sets: We can create the lower approximation P(O) [16] as the set of maximal-concentration (or maximal-power) neighborhoods, and the upper approximation as the set of all object locations. In this case the main structure (O) is defined by the sum of all neighborhoods.
2) neighborhoods + preferences: We can define a preference relation between neighborhoods (or maximal neighborhoods) using their characteristics (concentration, power).
3) neighborhoods + preferences + rough sets: From the upper approximation set we choose and remove extreme-located neighborhoods and locate the adequate objects in their neighborhood centres.
The researched distribution of objects can be exploited by rough sets theory (Pawlak theory). Using Pawlak theory [16] we can adapt the semantics to the physical sense of the terminology, e.g. the relative zone (O). In our case (ordering objects by several algorithms simultaneously) we can define the relative zone as the range of positions in which the most important neighborhoods representing all objects (the lower approximation) are included. The relative zone has a common part with the less important neighborhoods:

\bigcup_{nbh(i, max) \subseteq (O)} nbh(i, max) = P(O) \quad (lower approximation)    (6)

\bigcup_{nbh(i, * < max) \cap (O) \neq \emptyset} nbh(i, * < max) = \bar{P}(O) \quad (upper approximation)    (7)
So, in our case the relative zone (O) can be called the representative zone, and it contains objects on all positions: (O) = (1) + (2) + ... + (8). This zone is systematically cut off (from both sides) while extracting objects to the final list (tables in Fig. 2), so it has dynamic length.
Fig. 2 Stages-tables of creation final list
Example 1: 1 3 2 6 4 5 8 7 (final list – last stage)
The drawback of the method presented above is that it prefers the central location of neighborhoods over their numbering. When we use the Lorenz preference rules [6] we can simply calculate the average locations of all objects. In our example (Fig. 2) we get:
pL(1) = aver(loc(j1)) = (1 + 1 + 3 + 8)/4 = 3.25
pL(2) = aver(loc(j2)) = (1 + 2 + 4 + 7)/4 = 3.5
pL(3) = aver(loc(j3)) = (1 + 2 + 2 + 3)/4 = 2
pL(4) = aver(loc(j4)) = 4.5
pL(5) = aver(loc(j5)) = (4 + 6 + 6 + 8)/4 = 6
pL(6) = aver(loc(j6)) = (2 + 3 + 3 + 5)/4 = 3.25
pL(7) = aver(loc(j7)) = (6 + 7 + 8 + 8)/4 = 7.25
pL(8) = aver(loc(j8)) = (5 + 6 + 7 + 7)/4 = 6.25
where pL(i) = aver(loc(ji)) = (1/m) \sum_{j=1}^{m} loc(ϕ(i,j)) is the strength of the Lorenz preference characteristic.
After ordering we have the final list of rankings pL(3), pL(1), pL(6), pL(2), pL(4), pL(5), pL(8), pL(7), or in the form:
Example 2: 3 1 6 2 4 5 8 7 (final list – preference in the Lorenz sense)
The drawback of this approach is that it takes less important data (such as single object locations) into account. In order to keep only the essential information we should use neighborhoods only (not single object locations) and prepare their characteristics:
Fig. 3 Characteristics of neighborhoods for all tasks
pn(i) = \sum_{j=1}^{ln(i)} numb(nbh\,ϕ(i,j)) \cdot centre(ϕ(i,j)) \Big/ \sum_{j=1}^{ln(i)} numb(nbh\,ϕ(i,j)),    (8)

where ln(i) is the number of neighborhoods of the i-th object, numb(nbh ϕ(i,j)) is the numbering (concentration) of the j-th neighborhood of the i-th object (table in Fig. 3), and centre(ϕ(i,j)) is the centre of the j-th neighborhood of the i-th object (Fig. 3):
pn(1) = 2 · 1/2 = 1    pn(5) = 2 · 6/2 = 6
pn(2) = 2 · 1/2 = 1    pn(6) = 6 · 3/6 = 3
pn(3) = 4 · 2/4 = 2    pn(7) = 4 · 8/4 = 8
pn(4) = 4 · 4/4 = 4    pn(8) = 4 · 7/4 = 7
Example 3: 1 2 3 6 4 5 8 7 (final list – gravity points for every object)
To analyze and compare the chosen methods we propose a set of criteria from (5), for example:
850
P. Henryk and G. Gawinowski
cnbh(ϕ, i) → max
zone(ϕ, i) → min
centre(ϕ, i) → min
and with their help formulate the final list. This gives us a solution with the structure:
Example 4: 1 3 6 4 2 5 8 7 (final list – set of criteria)
In this case the sequence in which the tasks join the final list is:
1) ϕ4 → ψ4   2) ϕ8 → ψ7   3) ϕ7 → ψ8   4) ϕ3 → ψ2
5) ϕ6 → ψ3   6) ϕ1 → ψ1   7) ϕ5 → ψ6   8) ϕ2 → ψ5
With this method we use the essential data and omit single object placements and deviations.
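The Lorenz-average ordering of Example 2 and the gravity-point ordering of Example 3 can be reproduced compactly. Note the assumptions: pL(4) is set directly to the 4.5 used in the worked example (its raw location data is not fully recoverable here), and ties are broken by object number:

```python
# Lorenz ordering (Example 2): average position across the ingredient lists.
locations = {              # object -> its positions on the four ingredient lists
    1: (1, 1, 3, 8), 2: (1, 2, 4, 7), 3: (1, 2, 2, 3), 5: (4, 6, 6, 8),
    6: (2, 3, 3, 5), 7: (6, 7, 8, 8), 8: (5, 6, 7, 7),
}
pL = {i: sum(p) / len(p) for i, p in locations.items()}
pL[4] = 4.5                # pL(4) taken directly from the worked example

lorenz_list = sorted(pL, key=lambda i: pL[i])
print(lorenz_list)         # -> [3, 1, 6, 2, 4, 5, 8, 7]

# Gravity-point ordering (Example 3): neighbourhood centres weighted by
# their numbering (concentration), as in expression (8).
neighbourhoods = {         # object -> list of (numbering, centre) pairs
    1: [(2, 1)], 2: [(2, 1)], 3: [(4, 2)], 4: [(4, 4)],
    5: [(2, 6)], 6: [(6, 3)], 7: [(4, 8)], 8: [(4, 7)],
}

def pn(i):
    total = sum(n for n, _ in neighbourhoods[i])
    return sum(n * c for n, c in neighbourhoods[i]) / total

gravity_list = sorted(neighbourhoods, key=pn)
print(gravity_list)        # -> [1, 2, 3, 6, 4, 5, 8, 7]
```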
5 An Example of Exploiting Compromise to Judge a Set of Final Lists For choosing the compromise criteria we can go by the quantity of information used in the estimation process. This approach suggests exploiting the Lorenz preferences as the compromise criterion. In the next step we estimate the scale of the differences between the final lists and the list created on the basis of the Lorenz preference. According to (4), we do this for all solutions:

1) cmp = \sum_{i=1}^{n} (criterion\_loc(i) - loc_f(i))^2 = (3-1)^2 + (1-3)^2 + (6-2)^2 + (2-6)^2 + (4-4)^2 + (5-5)^2 + (8-8)^2 + (7-7)^2 = 40

3) cmp = \sum_{i=1}^{n} (criterion\_loc(i) - loc_f(i))^2 = (3-1)^2 + (1-2)^2 + (6-3)^2 + (2-6)^2 + (4-4)^2 + (5-5)^2 + (8-8)^2 + (7-7)^2 = 30

4) cmp = \sum_{i=1}^{n} (criterion\_loc(i) - loc_f(i))^2 = (3-1)^2 + (1-3)^2 + (6-6)^2 + (2-4)^2 + (4-2)^2 + (5-5)^2 + (8-8)^2 + (7-7)^2 = 16

min{cmp(1); cmp(3); cmp(4)} = min{40; 30; 16} = 16

To find the final list nearest to the compromise solution, we introduce an additional parameter defining the method code. For example, we extend the location attribute name to the form loc_{f_k}(i), where k is the code of the method used for creating the final list (corresponding to the examples above). The compromise expression stays simple and might have the following form:

cmp = \sum_{k=1}^{lm} \sum_{i=1}^{n} (criterion\_loc(i) - loc_{f_k}(i))^2 \to \min,    (9)

where lm is the number of ordering methods based on the analysis of the ingredient lists.
The best compromise according to the Lorenz criterion is found in Example 4. Obviously, when we choose a different compromise criterion, the best solution will vary. Sometimes we have a set of compromise criteria at our disposal. In this case the rule for searching for the compromise can be expressed by:

cmp = \sum_{j=1}^{lc} \sum_{k=1}^{lm} \sum_{i=1}^{n} (criterion\_loc(i) - loc_{f_k}(i))^2 \to \min,    (10)

where lc is the number of compromise criteria. If we use the same methods (criteria) for creating both the final lists and the compromise stencil list, then the components \sum_{i=1}^{n} (criterion\_loc(d) - loc_{f_d}(i))^2, where d refers to the method (or criterion) chosen for both tasks, will obviously be equal to zero and will not influence the final compromise estimator level at all.
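The combined estimator (9) can be sketched on the worked example, taking the Lorenz list of Example 2 as the compromise stencil and the final lists of Examples 1, 3 and 4 as the candidates:

```python
# Combined estimator (9): squared deviations from the compromise stencil,
# summed over the lm final lists (one per ordering method k).
def cmp_value(stencil, final):
    return sum((s - f) ** 2 for s, f in zip(stencil, final))

stencil = [3, 1, 6, 2, 4, 5, 8, 7]     # the Lorenz-preference list (Example 2)
final_lists = {                        # method code k -> its final list
    1: [1, 3, 2, 6, 4, 5, 8, 7],       # Example 1 (neighborhood stages)
    3: [1, 2, 3, 6, 4, 5, 8, 7],       # Example 3 (gravity points)
    4: [1, 3, 6, 4, 2, 5, 8, 7],       # Example 4 (set of criteria)
}
per_method = {k: cmp_value(stencil, f) for k, f in final_lists.items()}
print(per_method)                      # -> {1: 40, 3: 30, 4: 16}
print(sum(per_method.values()))        # combined estimator over all lm methods
best_k = min(per_method, key=per_method.get)
print(best_k)                          # -> 4
```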
6 Conclusions

The experiments show that combining the methods of neighborhoods, preferences and rough sets for analysing ranking lists is very convenient and allows a rich part of the information to be exploited for creating the final list and the compromise solution. The situation does not become more difficult even when we have the same set of methods at our disposal for creating both the final lists and the compromise list. In neighborhood theory we use tools for eliminating inessential information, in contrast to some variants of preference rules, whereas using preference methods we can create reference stencils. The specific character of the rough set description permits us not only to reject objects with inessential attribute values, but at the same time to dislocate objects using the current compromise decisions. Neighborhood estimators are less unambiguous, but they disregard inessential data.
References

1. Blazewicz, J., Lenstra, J.K., Rinnooy Kan, A.H.G.: Scheduling subject to resource constraints: Classification and complexity. Discrete Appl. Math. 5, 11–24 (1983)
2. Brzezińska, I., Greco, S., Slowinski, R.: Mining Pareto-optimal rules with respect to support and anti-support. Engineering Applications of Artificial Intelligence 20(5), 587–600 (2007)
3. Conway, R.W., Maxwell, W.L., Miller, L.W.: Theory of Scheduling. Addison-Wesley, Reading (1954)
4. Crupi, V., Tentori, K., Gonzalez, M.: On Bayesian confirmation measures of evidential support: Theoretical and empirical issues. Philosophy of Science
852
P. Henryk and G. Gawinowski
5. Finch, H.A.: Confirming Power of Observations Metricized for Decisions among Hypotheses. Philosophy of Science 27, 391–404 (1999)
6. Greco, S., Matarazzo, B., Slowinski, R., Stefanowski, J.: An algorithm for induction of decision rules consistent with the dominance principle. In: Rough Sets and Current Trends in Computing. LNCS (LNAI), pp. 304–313. Springer, Berlin (2005)
7. Greco, S., Matarazzo, B., Slowinski, R.: Axiomatic characterization of a general utility function and its particular cases in terms of conjoint measurement and rough-set decision rules. European J. of Operational Research (2003)
8. Greco, S., Matarazzo, B., Slowinski, R.: Extension of the rough set approach to multicriteria decision support. INFOR 38, 161–196 (2000)
9. Greco, S., Matarazzo, B., Slowinski, R.: Rough sets theory for multicriteria decision analysis. European J. of Operational Research 129, 1–47 (2001)
10. Greco, S., Pawlak, Z., Slowinski, R.: Can Bayesian confirmation measures be useful for rough set decision rules? Engineering Applications of Artificial Intelligence 17, 345–361 (2004)
11. Greco, S., Słowiński, R., Szczęch, I.: Assessing the quality of rules with a new monotonic interestingness measure Z. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2008. LNCS (LNAI), vol. 5097, pp. 556–565. Springer, Heidelberg (2008)
12. Hilderman, R., Hamilton, H.: Knowledge Discovery and Measures of Interest. Kluwer Academic Publishers, Dordrecht (2001)
13. Jaroń, J.: Systemic Prolegomena to Theoretical Cybernetics. Scient. Papers of Inst. of Techn. Cybernetics 25 (1975)
14. Kent, R.E.: Rough concept analysis: A synthesis of rough sets and formal concept analysis. Fundamenta Informaticae 27, 169–181 (1996)
15. Kleinberg, J.: Navigation in a small world. Nature 406, 845 (2000)
16. Kohler, W.H.: A preliminary evaluation of the critical path method for scheduling tasks on multiprocessor systems. IEEE Trans. Comput. 24, 1235–1238 (1975)
17. Nikodem, J.: Autonomy and Cooperation as Factors of Dependability in Wireless Sensor Networks. IEEE Computer Society P3179, 406–413 (2008)
18. Pawlak, Z., Sugeno, M.: Decision Rules, Bayes' Rule and Rough Sets. In: New Directions in Rough Sets. Springer, Berlin (1999)
19. Pawlak, Z.: Rough Sets: Present State and the Future. Foundations of Computing and Decision Sciences 18(3-4) (1993)
20. Piech, H. (ed.): Analysis of possibilities and effectiveness of combining rough set theory and neighbourhood theories for solving the dynamic scheduling problem, vol. P3674, pp. 296–302. IEEE Computer Society, Washington (2009)
21. Skowron, A.: Extracting laws from decision tables. Computational Intelligence 11(2), 371–388 (1995)
22. Słowiński, R., Brzezińska, I., Greco, S.: Application of Bayesian confirmation measures for mining rules from support-confidence Pareto-optimal set. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2006. LNCS (LNAI), vol. 4029, pp. 1018–1026. Springer, Heidelberg (2006)
23. Szwarc, W.: Permutation flow-shop theory revisited. Math. Oper. Res. 25, 557–570 (1978)
24. Syslo, M.M., Deo, N., Kowalik, J.S.: Algorytmy optymalizacji dyskretnej. PWN, Warszawa (1995)
25. Talbi, E.D., Geneste, L., Grabot, B., Previtali, R., Hostachy, P.: Application of optimization techniques to parameter set-up in scheduling. Computers in Industry 55(2), 105–124 (2004)
Decision on the Best Retrofit Scenario to Maximize Energy Efficiency in a Building Ana Campos and Rui Neves-Silva
Abstract. Building owners, or investors, and facility managers, or building technical consultants, have the difficult task of maintaining an infrastructure by selecting the most adequate investments. Nowadays, in most countries this maintenance means updating a building to current regulations regarding energy efficiency. The decision to retrofit a building involves several actors and a diverse set of criteria, covering technical, economic, social and financial aspects. This paper presents a novel approach to support investors and technical consultants in selecting the most appropriate energy-efficient retrofit scenario for a building. The proposed approach uses the actual energy consumption of the building to predict the energy profiles resulting from introducing new control strategies to increase energy efficiency. Additionally, the approach uses the Analytic Hierarchy Process combined with benefits, opportunities, costs and risks and a sensitivity analysis to support actors in selecting the best scenario in which to invest.
1 Introduction In the last decade, increased attention has been given to energy efficiency and greenhouse gas emissions. According to the International Energy Agency, global energy demand will grow 55% by 2030. In the period up to 2030, the energy supply infrastructure worldwide will require a total investment of USD 26 trillion, with about half of that in developing countries. If the world does not manage to green these investments by directing them into climate-friendly technologies, emissions will go up by 50% by 2050, instead of down by 50%, as science requires (United Nations Framework Convention on Climate Change 2011). The European Commission has prioritized climate change and energy in its new Europe 2020 strategy, establishing ambitious goals: 20% reduction of greenhouse gas emissions, meeting 20% of energy needs from renewable sources, and reducing energy consumption by 20% through increased energy efficiency. Ana Campos UNINOVA, FCT Campus, 2829-516 Caparica, Portugal *
Rui Neves-Silva DEE-FCT, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 853–862. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
854
A. Campos and R. Neves-Silva
In order to meet EU targets, the European Commission and the member states have approved legislation to be applied. One of the areas where regulations have been significant relates to buildings. The European Union now has European and national laws that have to be complied with, particularly by new buildings. However, there is also a growing concern about existing buildings, especially about the possibilities to modernize them, making them more efficient from an energy perspective. Many recent reports on energy efficiency in buildings stress that occupants' behavior is one of the most important aspects in achieving energy-efficient buildings: "The behavior of building's occupants can have as much impact on energy consumption as the efficiency of equipment" (Parker and Cummings 2008) (Vieira 2006); "a heightened energy consumption awareness is expected to stimulate behavioral changes both at household and enterprise level" (European Commission 2008); "a smart building is only as smart as the people running it" (Powell 2009). Recently, several European research projects addressing building occupants' behavior were approved, such as BeAware - Boosting Energy Awareness with mobile interfaces and real-time feedback (BeAware 2009), DEHEMS - Digital Environmental Home Energy Management System (DEHEMS 2009) and Beywatch - Building Energy Watcher (Beywatch 2009). From the perspective of the building user, all these projects focus on metering energy consumption and advising users in real time about energy consumptions that are higher than expected. The concept proposed in this paper follows a novel approach by measuring energy consumption in the building, outdoor environmental data (weather conditions, urban context, etc.) and indoor environmental data (luminance, temperature, humidity, CO2 concentration, etc.). These data are collected and processed to identify the cause-effect relations between building occupants' behavior and energy consumption in the building.
The objective is to identify the most adequate control technologies to be used in retrofitting the building to increase energy efficiency, without disturbing or reducing the comfort level of the building’s occupants. The approach supports the human actors in selecting the best retrofit scenario, considering technical, social, economic and financial criteria.
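As a minimal illustration of this kind of cause-effect analysis (a sketch of the idea only; the project's actual processing pipeline is not specified here), one can correlate an occupancy time series from the sensor network with the metered consumption. All names and data below are hypothetical:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hourly occupant counts and metered energy use (kWh) -- made-up audit data.
occupancy = [0, 0, 2, 10, 25, 30, 28, 12, 3, 0]
energy_kwh = [5, 5, 8, 20, 41, 48, 45, 22, 9, 6]

r = pearson(occupancy, energy_kwh)
# A correlation near 1 suggests consumption is driven largely by occupant
# behaviour rather than by base load, guiding which controls to retrofit.
```

In practice the real system would work on many such series (per room, per load type, per weather variable), but the principle of relating measured usage to measured consumption is the same.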
2 Concept and Objectives

When the owner of a building decides to renovate an existing infrastructure with energy efficiency in mind, would it help to know exactly where energy is being spent, i.e. the real use of the infrastructure? The work proposed here takes this assumption as true. The key hypothesis is that data gathered on how an infrastructure is being used may serve to improve the accuracy of predicting the future energy consumption impact of installing alternative sets of available technologies, including controllers. This will also serve to justify the necessary renovation investment based on a financial return-on-investment calculation. The objective of this work is to develop a reliable method to support decision-making on energy-efficient investments in building renovations. This objective is critically important to convince building owners to renovate with energy-saving, energy-generating, and energy-storing solutions.
The proposed approach gathers data on the energy consumption of an existing infrastructure, crossing it with the building's use, to define a baseline energy consumption model of the building. This baseline scenario can then be used to predict the consumption of different scenarios comprising energy-efficient technologies and control solutions. The system monitors the usage of a building, models the building's energy consumption, and uses these two elements to predict energy consumption under alternative scenarios based on available market solutions, providing recommendations for the best solution, taking into consideration the decision-makers' criteria and restrictions, as presented in Fig. 1.
Fig. 1 The proposed concept.
The exploration of this hypothesis requires solving the key problem: how to define and prepare the energy consumption software to make use of all this data and return a coherent answer? The solution proposed includes a detachable sensor network and an energy prediction decision support system. The sensor network is installed to audit the building usage and provide all the necessary information to specify an energy consumption model of the building, which can be extrapolated to cover a pre-defined period of time (usually the models are annual). The building audit data and the information about the infrastructure are then used by the system to identify potential control technologies that could be applied to improve the building. The system constructs renovation scenarios, using one or more control technologies, with the objective of increasing energy efficiency. Furthermore, the system uses the data of the baseline scenario to predict the energy consumption of the proposed solutions. The retrofit scenarios identified can have different complexity levels. One building may only need to replace the luminaires in several rooms to improve the light
provided, while another building may have to combine that with presence sensors and heat and ventilation control.
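The prediction step can be pictured as scaling a measured baseline by the savings expected from each control technology in a scenario. The sketch below is a deliberately simplified stand-in for the system's prediction models; all figures and saving fractions are invented for illustration:

```python
# Baseline: audited weekly consumption (kWh) extrapolated to a year.
audited_week_kwh = 1200.0
baseline_annual = audited_week_kwh * 52

# Hypothetical per-technology saving fractions for three retrofit scenarios.
scenarios = {
    "lighting only": [("LED luminaires", 0.30)],
    "lighting + presence": [("LED luminaires", 0.30),
                            ("presence sensors", 0.15)],
    "lighting + presence + HVAC": [("LED luminaires", 0.30),
                                   ("presence sensors", 0.15),
                                   ("HVAC control", 0.20)],
}

def predicted_annual(baseline, technologies):
    """Apply each technology's saving fraction multiplicatively."""
    consumption = baseline
    for _name, saving in technologies:
        consumption *= (1.0 - saving)
    return consumption

for name, techs in scenarios.items():
    print(f"{name}: {predicted_annual(baseline_annual, techs):.0f} kWh/year")
```

The real system would of course derive the saving fractions from the audited usage profile and the technologies' specifications rather than assume fixed percentages; the sketch only shows how a baseline plus scenario definitions yields comparable annual predictions.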
3 Decision-Making Process

The decision-making process starts with defining the objective of the renovation (e.g. invest capital, increase comfort, decrease energy costs) and ends with selecting the most appropriate renovation scenario to be implemented. The purpose of the decision-making process is to select a renovation scenario that best addresses the requirements of a specific building and the expectations of the investors. Within this process, two actors have been identified: the technical consultant, who usually manages and/or maintains the building, and the investor, who owns the building and controls the financial resources. However, from an organizational point of view these two roles can even be played by the same person. The decision-making process comprises several steps, as represented in Fig. 2. The complete process includes four decision points and additional steps.
Fig. 2 Decision-making process.
Select Renovation Solution: The technical consultant and investor identify the need to renovate a specific building and define the objective to be achieved. The technical consultant defines assessment parameters and requests an audit of the building to gather data on actual energy consumption. The audit data is extrapolated to a baseline scenario that represents the energy profile of the building before renovation. The technical consultant identifies renovation solutions based on technical knowledge of the building and the retrofit objectives. The result of this step is a collection of renovation solutions that address the retrofit objectives. Check Technical Criteria: The technical consultant examines the specification of each selected renovation solution and tries to determine whether it is compatible with the existing building infrastructure. The result of this step is the list of renovation solutions annotated with their individual applicability for the specific building. Build a scenario based on solution i? The technical consultant studies the renovation solutions and the retrofit objectives, and selects the renovation solutions that should be considered for implementation. Elaborate Renovation Scenario: The technical consultant has the expertise to build different scenarios, which are the alternatives for the final decision. It is recommended to elaborate simple and perhaps cheaper scenarios, but also more complex and more expensive ones. This divergence allows an enlarged decision space for the investor. The result of this step is a collection of renovation scenarios that comprise the renovation solutions selected and filtered in the previous steps. Check Regulations: The technical consultant studies each renovation scenario and checks it against the legislation and regulations applicable to the specific location. If one of the scenarios fails to comply with all necessary regulations, it should be reworked or ultimately disregarded.
The result of this step is a list of renovation scenarios annotated with regulatory applicability in the specific building location. Approve scenario j? This second decision point is to select the renovation scenarios that will be simulated to calculate the envisaged energy consumption. Simulate Scenarios: Each of the scenarios includes detailed technical specification data that can be used to estimate the energy consumption of the installed control technologies. This information is aggregated with the baseline scenario that represents the current building infrastructure. The result is an energy consumption profile of the building with the renovation scenario already implemented. The result of this step is a list of renovation scenarios with information about the respective energy consumption pattern in the current building. Approve simulated scenarios? This third decision point is to approve the renovation scenarios, including the simulations of the respective energy consumption for the current building. The technical consultant analyses each scenario and checks whether it fits the technical and financial objectives of the current situation. Score and order scenarios: This step elaborates a benefits, opportunities, costs and risks (BOCR) analysis of each renovation scenario and presents it to the investor to select the most appropriate scenario for implementation in the specific building. Moreover, the investor should have the option to prioritize the decision criteria and merits used to order the alternatives.
Re-define and Approve Criteria/Parameters: The investor, alone or with the support of the technical consultant, defines the decision criteria (e.g. aesthetics, comfort, corporate reputation, brand ambitions) and parameters (e.g. interest rates, energy prices, credit conditions, tax incentives). By iterating between this step, the previous one and the following one, the investor has the possibility of performing a sensitivity analysis of the scenarios being considered. The result of this step is a complete set of decision criteria and parameters to be used by the previous step when performing the cost-benefit analysis of each renovation scenario. Select one of the renovation scenarios? This is the final decision of choosing the renovation scenario to be implemented for the current building. The investor analyses the scenarios ordered in relation to the decision criteria and parameters defined. One of the scenarios should be the baseline scenario built from the auditing data, so that the investor can assess the cost of "doing nothing" to the building.
4 Benefits, Opportunities, Costs and Risks Analysis

The main objective of the approach presented in this paper is to support an investor or technical consultant in selecting the best investment scenario to increase energy efficiency in a specific building. The support given to the users takes the form of a benefits, opportunities, costs and risks (BOCR) analysis, using the Analytic Hierarchy Process (AHP). This combination is used to order the retrofit scenarios being considered and provide a clear view of their financial viability. The Analytic Hierarchy Process is a theory of measurement concerned with deriving dominance priorities from paired comparisons of homogeneous elements with respect to a common criterion or attribute (Saaty 1994). This process uses a series of one-on-one comparisons to rate a series of alternatives and arrive at the best decision. The problem here is how to identify and formulate the alternatives to be considered. AHP has been used to perform BOCR analysis with diverse discussion (Wedley et al. 2001) and successful results (Saaty 2005) (Lee et al. 2009) (Longo et al. 2009). This paper uses the latest developments in the field, re-using results from critiques made of the use of AHP and BOCR. The AHP is especially suited for complex decisions involving several actors, even from diverse backgrounds. The hierarchical representation of the problem is well suited to identifying the criteria to consider. The current decision problem is the choice of the most appropriate technological solution to improve a building's energy efficiency. In order to use AHP to perform a BOCR analysis, the approach proposes four hierarchies, where each one represents one of the merits being considered. This means that the criteria are separated into four groups, and the users will compare the criteria and alternatives using the four established hierarchies.
The authors have elaborated a list of criteria that can be used for this application, with the support of several industrial companies (Campos et al. 2010) (Campos 2010). The list is quite general and can cover different decisions. In each situation, it is possible to discard any of the criteria or add new ones. The method developed is
not in any way tied to this list. The suggested criteria include, on top of technical performance (which equals energy efficiency performance):

• Benefits
  o Energy savings
  o Comfort level
  o Use of government programs (e.g. tax incentives)
  o Occupants' satisfaction
• Opportunities
  o Flexibility for reconfiguration
  o Compliance with regulations
  o Impact on company's image
• Costs
  o Equipment costs
  o Operating costs
  o Training costs
  o Personnel costs
  o Expertise (outsourcing costs)
• Risks
  o Sensitivity to future energy prices
  o Sensitivity to future user behavior
  o Technological obsolescence
  o Efficiency degradation
  o Visual impact on building
  o Noise and vibrations impact

Each hierarchy has criteria (in one or several levels) and a lower level with the alternatives to be considered. In the current approach the alternatives are the renovation scenarios identified and simulated, as described in the previous section. The decision-making process can be carried out individually (e.g. by the investor alone) or in a group involving several actors (investor, technical consultant, facility manager, etc.). The process comprises the following steps:

• actors provide their judgments on the criteria of the four hierarchies, resulting in four matrices for each actor;
• the priorities for each criterion provided by each actor are calculated using the principal eigenvector of each matrix;
• the priorities of the several actors are combined into group priorities using a weighted arithmetic mean, resulting in four vectors (one for each of the BOCR aspects);
• the actors judge the alternatives regarding the several criteria of each hierarchy, resulting again in four matrices per actor;
• the priorities of the alternatives provided by each actor are calculated using the principal eigenvectors of each matrix and then normalized according to the ideal AHP mode, resulting in one alternative scoring 1 for each criterion;
• the priorities of the several actors are again combined using a weighted arithmetic mean, resulting in four matrices;
• the actors define the merits of each of the BOCR aspects, i.e. assign a weight;
• the alternatives are ordered using the subtractive formula to calculate the overall priorities.

The criteria are judged by the different actors using the scale proposed by Saaty, of the numbers 1-9 and their reciprocals. This scale uses 1 to denote items of equal importance, with values up to 9 representing absolute importance. The use of the ideal AHP mode is important, and has been proven to establish the priority of an alternative independently of any scale used. In this mode, the most relevant alternative always scores 1 and all the others are related to that. This achieves independence of scale, which is necessary when combining priorities, and particularly different hierarchies. The final priority of each alternative is calculated using the subtractive BOCR formula, defined as

Pi = bBi + oOi − cCi − rRi,
(1)
where b, o, c and r represent the merits (weights) of each aspect and Bi, Oi, Ci and Ri represent the priorities given to alternative i in each of the four hierarchies. This formula has provided successful results in the works referenced before, unlike others such as the additive or multiplicative formulas. The objective of the formula used is to provide a positive result for alternatives that have more positive aspects (benefits and opportunities) than negative ones (costs and risks), and a negative result for alternatives that do not reach a break-even point. One important aspect of the approach is how to identify the merits of each of the four aspects being considered, i.e. how to determine b, o, c and r. Actors use a fifth hierarchy, designated the control hierarchy, containing strategic criteria to rate the merits of benefits, opportunities, costs and risks. The authors suggest the following control criteria:

• Performance
• Sustainability
• Company's strategy
• Time to implement the decision
• Growth
The authors are studying and developing these control criteria, and each company and/or situation should specify its own. The objective is to use a hierarchy that relates these criteria with the BOCR aspects, to derive priorities. Users should consider the highest ranking alternative in each of the four aspects and rate it against the criteria identified. These ratings are combined in a matrix and used to derive the priorities.
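The priority computations described above can be sketched end to end. The code below is an illustrative reconstruction, not the project's implementation: it derives priorities from a pairwise comparison matrix by power iteration, normalizes alternative scores in the ideal AHP mode (best alternative scores 1), and combines the four merits with the subtractive formula (1). All judgment values and merit weights are invented examples:

```python
def principal_eigenvector(matrix, iters=100):
    """Approximate the principal right eigenvector by power iteration."""
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        v = [x / s for x in w]          # renormalize so entries sum to 1
    return v

def ideal_mode(priorities):
    """Ideal AHP mode: scale so the best alternative scores exactly 1."""
    top = max(priorities)
    return [p / top for p in priorities]

# Invented 2x2 pairwise judgment of two scenarios under one benefit criterion
# (Saaty scale: scenario A judged 3 times more beneficial than scenario B).
benefit_judgments = [[1.0, 3.0],
                     [1.0 / 3.0, 1.0]]
B = ideal_mode(principal_eigenvector(benefit_judgments))

# Invented aggregate priorities for the remaining merits, plus merit weights.
O, C, R = [1.0, 0.6], [0.8, 1.0], [1.0, 0.7]
b, o, c, r = 0.4, 0.2, 0.25, 0.15

# Subtractive BOCR formula (1): P_i = b*B_i + o*O_i - c*C_i - r*R_i.
P = [b * Bi + o * Oi - c * Ci - r * Ri for Bi, Oi, Ci, Ri in zip(B, O, C, R)]
best = max(range(len(P)), key=P.__getitem__)
```

For a consistent 2x2 matrix the eigenvector reduces to the normalized weights (0.75, 0.25); in ideal mode the stronger scenario scores 1, and with the merits above its overall priority is positive while the weaker scenario's is negative, illustrating the break-even interpretation of formula (1).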
Once the merits of each aspect are identified and the final priorities of the alternatives calculated, the users can perform a sensitivity analysis by adjusting the merits by about 10%.
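This sensitivity check can be sketched as re-running the subtractive formula with each merit perturbed by ±10% and observing whether the ranking of alternatives changes. The code is illustrative only, with invented scores and weights:

```python
def bocr(merits, scores):
    """Subtractive BOCR priority for one alternative."""
    b, o, c, r = merits
    B, O, C, R = scores
    return b * B + o * O - c * C - r * R

def ranking(merits, alternatives):
    """Alternatives ordered from highest to lowest overall priority."""
    return sorted(alternatives, key=lambda a: bocr(merits, alternatives[a]),
                  reverse=True)

alternatives = {                      # invented (B, O, C, R) per scenario
    "scenario A": (1.0, 0.9, 0.6, 0.5),
    "scenario B": (0.7, 1.0, 1.0, 0.4),
}
base_merits = (0.4, 0.2, 0.25, 0.15)
base_order = ranking(base_merits, alternatives)

stable = True
for i in range(4):                    # perturb each merit by +/-10%
    for factor in (0.9, 1.1):
        m = list(base_merits)
        m[i] *= factor
        if ranking(tuple(m), alternatives) != base_order:
            stable = False
# 'stable' tells the investor whether the choice is robust to the weights.
```

A ranking that survives every ±10% perturbation gives the investor confidence that the recommended scenario does not hinge on the exact merit weights chosen.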
5 Applications

The approach presented in this paper will be applied and validated in real infrastructures in the scope of the following two scenarios:

• The owner of the infrastructure requests an evaluation from a construction company of possible solutions to reduce the infrastructure's energy intensity. The construction company uses the proposed system to assess the infrastructure usage and offer possible solutions supported by the results of the tool. In the case of viable solutions, the owner accepts (or requests further iterations) and the solution is installed in the infrastructure. The end-user of the system is the construction company responsible for the installation of the solution, with the collaboration of the owner, who will decide on the investment.
• A company responsible for managing the infrastructure uses the EnPROVE tool to evaluate possible cost reduction scenarios through the installation of alternative energy-efficient control system technologies. In the case of viable solutions, the infrastructure manager decides on a solution and installs it. The end-user of the EnPROVE tool is the infrastructure manager responsible for deciding on the investment and installing the solution.
6 Conclusions and Future Work

This paper presents a novel approach to support renovation investment decisions on existing buildings, aiming at increasing energy efficiency. The work has been developed in the scope of the research project EnPROVE, which started at the beginning of 2010. The approach is based on monitoring the building usage with a wireless sensor network to build adequate energy consumption models. These models are then used to predict the impact on energy consumption of the eventual installation of several energy-efficient technologies. Finally, the decision-support model suggests the best investment alternative taking into consideration the investor's criteria and possible restrictions. The project is currently in the phase of developing and specifying algorithms, such as the ones presented in this paper. This work will be implemented as a service-oriented software system, which will be tested within the two applications described. Compared to currently available energy auditing services and prediction tools, it is foreseen that this approach will increase the cost-effectiveness of renovation investments by 15 to 30%.
Acknowledgments. Authors express their acknowledgement to the consortium of the project EnPROVE, Energy consumption prediction with building usage measurements for software-based decision support. EnPROVE is funded under the Seventh Research Framework Program of the European Union (contract FP7-248061).
References

1. BeAware (2009), http://energyawareness.eu
2. Beywatch (2009), http://www.beywatch.eu
3. Campos, A.R.: Intelligent Decision Support Systems for Collaboration in Industrial Plants. PhD Thesis (2010)
4. Campos, A.R., Marques, M., Neves-Silva, R.: A decision support system for energy-efficiency investments on building renovations. In: Energycon 2010: IEEE Energy Conference and Exhibition, Bahrain (2010)
5. DEHEMS (2009), http://www.dehems.eu
6. European Commission: Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions - Addressing the challenge of energy efficiency through information and communication technologies (2008)
7. Lee, A.H.I., Chen, H.H., Kang, H.Y.: Multi-criteria decision making on strategic selection of wind farms. Renewable Energy 34, 120–126 (2009)
8. Longo, G., Padoano, E., Rosato, P., Strami, S.: Considerations on the Application of AHP/ANP Methodologies to Decisions Concerning a Railway Infrastructure. In: International Symposium on the Analytic Hierarchy Process (2009)
9. Parker, D., Cummings, J.: Pilot Evaluation of Energy Savings from Residential Energy Demand Feedback Devices. Florida Solar Energy Center, USA (2008)
10. Powell, K.: Energy Smart Buildings. In: Fourth Annual Green Intelligent Buildings Conference, Santa Clara, USA (2009)
11. Saaty, T.: Fundamentals of Decision Making and Priority Theory with the Analytic Hierarchy Process: Vol. VI of the AHP Series. RWS Publications, USA (1994)
12. Saaty, T.: Theory and Applications of the Analytic Network Process. RWS Publications, Pittsburgh (2005)
13. United Nations Framework Convention on Climate Change: Fact Sheet: The need for strong global action on climate change (February 2011), http://unfccc.int/2860.php (accessed 2011)
14. Vieira, R.: The Energy Policy Pyramid - A Hierarchical Tool for Decision Makers. In: Fifteenth Symposium on Improving Building Systems in Hot and Humid Climates, Orlando, USA (2006)
15. Wedley, W.C., Choo, E.U., Schoner, B.: Magnitude adjustment for AHP benefit/cost ratios. European Journal of Operational Research 133, 342–351 (2001)
Developing Intelligent Agents with Distributed Computing Middleware Christos Sioutis and Derek Dominish
Abstract. Intelligent agents embody a software development paradigm that merges theories developed in Artificial Intelligence (AI) research with computer science. The power of agents comes from their intelligence and also from their communication. Current agent development methodologies and the resulting frameworks have been developed from an AI perspective. From a developer's point of view they introduce new programming concepts and provide a specialised execution environment. Considerable emphasis is placed on hiding away the underlying complexity of how agents actually operate. However, the fact is that agent systems are inherently distributed software systems, and this brings significant implications for their application and, more importantly, their integration. This has been largely underestimated by the agent community, resulting in increased development risk in large production systems. The Distributed Object Computing (DOC) development methodology, on the other hand, has been used to successfully build large-scale distributed software systems using standards-based middleware. In this context, objects encapsulate behaviour and are inherently integrated with any system utilising compatible middleware. This paper explores the possibility of leveraging the power of both approaches through a proposed Agent Architecture Framework (AAF) that implements generic agent behaviours and algorithms with DOC middleware using well-understood software design patterns.
1 Introduction The term “agent” is an overloaded term in literature, the meaning of which depends on how the concept is applied in different application domains. In this paper an agent refers to software of considerable complexity that exhibits behaviour dictated by AI algorithms which are in-turn directed by business logic. The process of design and implementation of agents is referred to as agent-oriented development. There are a number of commercial and open source software frameworks available which are specifically designed for agent-oriented development. Typical Christos Sioutis · Derek Dominish Air Operations Division, Defence Science and Technology Organisation e-mail: {Christos.Sioutis,Derek.Dominish}@dsto.defence.gov.au *
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 863–872. springerlink.com © Springer-Verlag Berlin Heidelberg 2011
applications utilise multiple agents running autonomously, with each agent executing multiple behaviours in parallel. There are also provisions for managing an agent’s knowledge and inter-agent communication. However, interfacing with the environment is usually left entirely up to the developer; this is often described as a strength of the technology: agents are able to work as part of any system that can be accessed through its underlying software architecture. The above statement also alludes to a problem observed across all agent development frameworks regardless of their maturity. Specifically, there is little emphasis placed on the fact that agents are inherently complex, distributed software systems. Moreover, there is little support to help developers deal with the integration issues of such systems. This results in a reluctance to use agents in large production systems, and increased integration risk if they are chosen. With computer networking now commonplace and the rise of multi-processor hardware architectures there has been a steady increase in interest in Distributed Object Computing (DOC). DOC developers build software components that are logically interconnected and are impervious to how communication with other components is achieved. For example, communication could be routed through system memory, via a local area network, or even the internet. This paper argues that agent-oriented development fits well within the DOC paradigm. Agents can be built using DOC techniques and operate through a middleware architecture. Furthermore, it is argued that when agents are built by extending the same middleware utilised by a target system, their integration risk is reduced considerably.
2 Agent Technology
The general concept of an agent involves an entity that is situated within an environment. The environment generates sensations triggering reasoning in the agent, causing it to take actions. These actions in turn generate new sensations, hence forming a sense-reason-act loop. This is reinforced by Wooldridge’s widely referenced definition [1]: “An agent is a computer system that is situated in some environment, and that is capable of autonomous action in this environment in order to meet its design requirements”. Wooldridge went on to define a number of properties that an agent-based system should exhibit: Autonomy, operating without the direct intervention of humans; Social ability, interacting with other agents; Reactivity, perceiving and responding to changes in the environment; Proactiveness, initiating actions to achieve long-term goals. Agent-oriented development is concerned with software development techniques that are specifically suited to agent-based systems. These are needed because generic software engineering processes fail to capture autonomous behaviour and complex agent interactions. A very good agent-oriented development methodology is described in [2].
Developing Intelligent Agents with Distributed Computing Middleware
3 Distributed Object Computing
Modern computer software systems are inherently complex, with many layered interactions and patterned idioms combining to form a structured, cohesive whole. With the advances in software technologies over the past 20 years it has become increasingly necessary to utilise architectures based on frameworks that adopt a pattern-oriented approach, not just class libraries; frameworks that are purpose-built and focused on a particular problem domain. A class library is a collection of re-usable functions and classes whose functionality is invoked from application code. A framework provides not only cohesive reusable components but also re-usable behaviour [3]. It is through the adoption of well designed and easy-to-use architectures based on pattern-oriented implementation frameworks that significant reductions in overall system complexity can be achieved. Application function and component interactions can also be defined through pattern-oriented framework mechanisms. Furthermore, following a pattern-oriented approach to both architecture and design allows a higher order of re-use and enhances application developer productivity and reliability. It can serve to facilitate and ‘guide’ developers into adopting a solid and reliable patterned approach to the software artefacts of systems development. Middleware is software that connects application components, as shown in Fig 1a. It is, in essence, those mechanistic elements of software that are “in the middle” between application components. Through these elements, components can be concurrently hosted on different operating systems, platforms and environments. Middleware commonly consists of a set of services that allows multiple processes running on one or more machines to interact. Middleware technologies have evolved significantly over the past 10 years in support of the move to coherent distributed architectures and standardisation.
One example is the Common Object Request Broker Architecture (CORBA) [4] standard published by the Object Management Group (OMG). Coupled with this standardisation is the capability to simplify the complex problem of integrating disparate distributed application components and systems within a heterogeneous system-of-systems context. In many cases the middleware itself is comprised of different layers, each responsible for providing a new level of functionality. Typically, lower layers provide generic services and capabilities which are utilised by higher layers that are increasingly more domain or application specific. Applications can be developed to integrate with a particular layer and interoperate with other applications operating on other layers. As a result, most of the work in such systems happens in the underlying layers, and the applications sitting on top contain only business logic, commonly called services. This is the essence of the Service Oriented Architecture (SOA) approach. Fig 1b illustrates how SOA differs from DOC systems. The dotted horizontal lines in the middleware indicate logical connections that can be routed in different ways. The difference in the sizes of the services alludes to the fact that services built on higher layers need less development because they leverage the functionality of the layers underneath them.
a) Distributed Computing Architecture
b) Service Oriented Architecture
Fig. 1 Connecting software components with middleware
Data distribution is an example of a domain specific layer. It involves the asynchronous dissemination of data between components. The key challenges addressed by data distribution technology are: Real-time, meaning that the right information is delivered at the right place at the right time, all the time; Dependable, thus ensuring availability, reliability, safety and integrity in spite of hardware and software failures; High-Performance, hence able to distribute very high volumes of data with very low latencies. The Data Distribution Service for Real-Time Systems (DDS) [5] is the formalisation through standardisation of traditional Publish/Subscribe (Pub/Sub) capabilities common to many application environments. These Pub/Sub capabilities are expressed through the service as an abstraction for one-to-many communication that provides anonymous, decoupled, and asynchronous communication between a publisher and its subscribers. Different implementations of the Pub/Sub abstraction standard have emerged for supporting the needs of different application domains.
4 Related Research
The DOC methodology provides standardised ways for different applications to communicate and also provides some services (e.g. Naming, Trading) which can be very useful when utilised in conjunction with agents. For example, in [6] middleware is utilised for inter-agent communication as well as for access to additional databases. CORBA-based agents operate the same as traditional CORBA components. When plugged into a large DOC system they instantly have access to other agents, information and services. Legacy systems can also be wrapped with CORBA interfaces and become available to the agents. Extending the above concept, the CORBA Component Model (CCM) is a component modelling architecture built on top of CORBA middleware. Researchers
have already identified the CCM as a possible approach for merging agents with DOC concepts. Melo [7] describes a CCM-based agent as a software component that exposes a defined interface, has receptacles for external plan components, and utilises sinks/sources for asynchronous messaging. Similarly, Otte [8] describes an agent architecture called MACRO, built on CCM, that introduces additional algorithms for agent planning and tasking in the application of sensor networks. The main advantage of this application is noted as the abstraction of the details of the underlying communication and system configuration from the agents. Specific limitations identified include the overheads imposed by the middleware (due to the limited processing capacity of the embedded systems employed for the sensor network) and limited control over how data is routed around the network [8]. A number of very capable agent frameworks have been developed to aid in implementing agent-based systems; for examples, see [9-12]. These frameworks primarily provide new constructs and algorithms that aid a developer in designing an agent-based system. Each framework, however, provides its own version of what an agent is and how behaviours are implemented, brings its own advantages and disadvantages, and is suited to different applications. This means that developers must choose and learn to use the appropriate framework for their application. After exploring a number of frameworks it quickly becomes evident that there are conceptual similarities, implemented differently. This hints at the existence of patterns that describe the mechanisms of agent behaviour in an implementation-independent way. An understanding of these patterns allows switching between frameworks while knowing the expected design elements, albeit implemented differently. Researchers have already discovered the possible advantages of using patterns for designing agents.
Weiss [13] describes a hierarchical set of patterns for agents. For example, the “Agent as a Delegate” pattern begins to describe how a user delegates tasks to an agent. An attempt is also made to classify agent patterns based on their intended purpose in [14], where a two-dimensional classification scheme is proposed with the intent that it is problem driven and logically categorises agent-oriented problems. Although not specifically mentioning patterns, related work in [15] has attempted to describe agent behaviour as a model. Key behavioural elements utilised by the popular Belief Desire Intention (BDI) reasoning model are defined as: Goals, a desired state of the world as understood by the agent; Events, notifications of a certain state of the internal or external environment of the agent; Triggers, an event or goal which invokes a plan; Plans, responses to predefined triggers that sequentially execute a set of steps; Steps, primitive actions or operations that the agent performs; Beliefs, storing the agent’s view of its internal state and external environment, which can trigger goal completion/failure events as well as influence plan selection through context filtering; Agent, a collection of the above elements designed to exhibit a required behaviour [15]. This work could be further extended by understanding how BDI relates to well known cognitive reasoning models like Boyd’s Observe Orient Decide Act (OODA) loop and Rasmussen’s decision ladder, and by employing additional constructs [16].
a) AAF building blocks
b) AAF layer in a SOA system
Fig. 2 Agent Architecture Framework (AAF)
5 Agent Architecture Framework
It can be concluded that patterns have so far been applied at a very high level in the agent arena. Examples like the agent-as-a-delegate pattern provide a use case rather than a generic solution for implementing an agent. The proposed Agent Architecture Framework (AAF) will implement generic agent behaviours and algorithms through well understood software design patterns. The aim of the framework is threefold, as shown in Fig 2a. First, it will link to specific base support libraries used to implement the required algorithms. Second, it will merge the library APIs with the workflow and conventions of a larger SOA architecture. Third, it will be built using templates in order to capture the algorithmic logic while allowing it to work against any given type. The intent is for the AAF to be used to implement agents utilising the same middleware architecture as larger SOA systems. This way agents and services will be able to exchange data and operate seamlessly; this concept is illustrated in Fig 2b.
6 Concept Demonstrator
In order to demonstrate the viability of this approach a concept demonstrator system has been developed. A simple agent program was written using the JACK Intelligent Agents system and then translated to equivalent constructs using middleware. The agent prints out (draws) shapes in the console based on given commands. Only two shapes have been implemented: squares and triangles. When executed, the program constructs an agent and signals it to print out a small number of shapes of different types and sizes.
 1  agent DrawingAgent extends Agent {
 2      #posts event DRAWEvent ev;
 3      #handles event DRAWEvent;
 4      #uses plan SquarePlan;
 5      #uses plan TrianglePlan;
 6      public DrawingAgent(String name) {
 7          super(name);
 8      }
 9      public void drawShape(String type, int size) {
10          postEvent(ev.drawShape(type, size));
11      }
12  }
Fig. 3 Code Listing of DrawShape agent using JACK
A source code extract for the JACK agent is shown in Fig 3. The agent is declared by extending the JACK Agent class and is called DrawingAgent. In lines 2 and 3 it declares that it posts and handles an event of type DRAWEvent. This event is defined in a separate source file and contains two internal variables: a string indicating the type of shape, and an integer indicating the size of the shape. In lines 4 and 5 it declares that it uses the plans SquarePlan and TrianglePlan. These are also defined in separate source files. They similarly declare that they handle DRAWEvent and contain an implementation that the agent uses to do the drawing. When the event is posted it is concurrently handled by both plans, and a special JACK relevance construct is used to limit its execution. As a result, the SquarePlan handles square events and the TrianglePlan handles triangle events. In line 6 the agent is constructed using a name. The name is not important in this application but is useful for agent discovery in a multi-agent system. In line 9 a method is defined that is called by the main program to post the draw events to the agent. Considering the above, it is easy to deduce that at its very core JACK (like most agent frameworks) exhibits a Reactor pattern [17]. That is, JACK agents respond to events that are triggered by external or internal stimuli and spawn a thread of control to handle these events. Specifically, the agent itself acts as a reactor to specific events or signals, the JACK kernel acts as the system-wide event de-multiplexer, and the plans act as event handlers. The JACK framework complements the reactor pattern with additional agent constructs and logic. DOC middleware is also very much reactive in nature. When components are “activated” they open themselves to the world via their Object Request Broker (ORB).
This means that they expose a defined interface through which many external components are able to signal a service request at any time. A concept demonstrator of the AAF has been implemented mirroring the agent constructs seen in the simple JACK DrawShape agent, using a combination of CORBA and DDS middleware to provide the reactor functionality. Only a minimal set of features has been implemented at this stage, with the intention to grow the AAF to encapsulate all
constructs and behaviour patterns encountered in JACK as well as other agent frameworks. The agent initialises a domain participant, binds a topic for the draw event and initialises an associated publisher. Similarly, each plan initialises a subscriber for the specific event. Therefore, posting an event means publishing a topic to the domain and handling an event means subscribing to its topic. DDS allows one-to-many connectivity in arbitrary configurations of subscribers and publishers. This means that when an event is published all plans that have subscribed to it are activated with an independent thread of control.
 1  class DrawingAgent : public virtual Agent {
 2      DRAW::Event ev;
 3      SquarePlan splan;
 4      TrianglePlan tplan;
 5  public:
 6      DrawingAgent(DRAW::Portal &portal, const string &name)
 7          : Agent(name), ev(portal), splan(ev), tplan(ev) {}
 8      void drawShape(const string &shape, const int &size)
 9      {
10          DRAW::Data data(shape, size);
11          ev.post(data);
12      }
13  };
Fig. 4 Code Listing of DrawShape agent using preliminary AAF
The source code for the DrawShape agent using the AAF is shown in Fig 4; when executed it behaves identically to the JACK agent. The agent is defined similarly by extending an Agent class, which internally initialises a DDS DomainParticipant. The DRAW::Event declared in line 2 is generated by a combination of C++ macros and templates that binds the DDS data and topic mechanisms to the concept of an event. The content of the data itself is defined beforehand in another file using the Interface Definition Language (IDL) syntax (part of the CORBA standard) and compiled to generate the appropriate programming structures. The SquarePlan and TrianglePlan are implemented using DDS DataReader objects. Elements have to be explicitly initialised within the DrawingAgent constructor, as shown in line 7. There is a two-step process in posting an event: the event data is first initialised separately in line 10 and subsequently posted in line 11. There is a subtle but important difference here between the JACK and AAF code. In the JACK code the postEvent method is part of the agent’s scope; this can be interpreted as “the agent posts an event”. In the AAF code, on the other hand, the post method is in the event’s scope, and this can be interpreted as “the agent asks the event to post itself”. This implies that the event can potentially encapsulate behaviour and post itself differently depending on the content and temporal properties of the data passed to it.
7 Conclusions
Agent development frameworks are very good at guiding developers in creating agents. They introduce a number of behavioural concepts (e.g. goals, beliefs) and provide support for event processing, resource management and the structure of their implementation. Great emphasis is placed on hiding the complexity of the underlying AI algorithms upon which the agents operate. Their integration within larger systems, however, remains a challenge and is risk-prone because these frameworks provide little support to help developers deal with the complexities of integration. In the context of DOC, and specifically SOA, agents can be viewed as autonomous services with specialised algorithms utilised for intelligent behaviour. The DOC middleware can provide the infrastructure upon which agents communicate with one another, as well as sense and act upon the environment. When developing agents it is possible to recognise and decompose the patterns of behaviour that agent frameworks implement. These behaviours can then be described using a combination of software design patterns. This allows the implementation of a generic architecture framework built on DOC middleware that implements the agent patterns in a generic way. A developer can subsequently utilise the same middleware employed in their SOA systems and at the same time introduce agents with very little integration risk.
References
1. Wooldridge, M.: Reasoning About Rational Agents. The MIT Press, Massachusetts (2000)
2. Padgham, L., Winikoff, M.: Developing Intelligent Agent Systems: A Practical Guide. John Wiley and Sons Ltd., West Sussex (2004)
3. Johnson, R.E., Foote, B.: Designing Reusable Classes. Journal of Object-Oriented Programming 1(2), 22–35 (1988)
4. Object Management Group: Common Object Request Broker Architecture (CORBA) Specification (2008)
5. Object Management Group: Data Distribution Service for Real-time Systems (DDS) Specification (2007)
6. Cheng, T., Guan, Z., Liu, L., Wu, B., Yang, S.: A CORBA-Based Multi-Agent System Integration Framework. In: IEEE International Conference on Engineering of Complex Computer Systems, pp. 191–198 (2004)
7. Melo, F., Choren, R., Cerqueira, R., Lucena, C., Blois, M.: Deploying Agents with the CORBA Component Model. In: Emmerich, W., Wolf, A.L. (eds.) CD 2004. LNCS, vol. 3083, pp. 234–247. Springer, Heidelberg (2004)
8. Otte, W.R., Kinnebrew, J.S., Schmidt, D.C., Biswas, G.: A flexible infrastructure for distributed deployment in adaptive sensor webs. In: Aerospace Conference, March 7–14, pp. 1–12. IEEE, Los Alamitos (2009)
9. Agent Oriented Software: JACK Intelligent Agents User Manual (2011)
10. Bellifemine, F., Caire, G., Trucco, T., Rimassa, G.: JADE programmer’s guide. CSELT, TILab and Telecom Italia (2010)
11. Laird, J.E., Congdon, C.B.: The SOAR User’s Manual. University of Michigan (2009) 12. Macal, C.M., North, M.J.: Agent-based modelling and simulation. In: Rossetti, M.D., Hill, R.R., Johansson, B., Dunkin, A., Ingalls, R.G. (eds.) Proceedings of the 2009 Winter Simulation Conference (2009) 13. Weiss, M.: Patterns for Motivating an Agent-Based Approach. In: Jeusfeld, M.A., Pastor, Ó. (eds.) ER Workshops 2003. LNCS, vol. 2814, pp. 229–240. Springer, Heidelberg (2003) 14. Oluyomi, A., Karunasekera, S., Sterling, L.: Description templates for agent-oriented patterns. Journal of Systems and Software 81(1), 20–36 (2008) 15. Jayatilleke, G.B., Padgham, L., Winikoff, M.: A model driven component based development framework for agents. International Journal of Computer Systems Science and Engineering 20(4), 273–282 (2005) 16. Sioutis, C.: Reasoning and learning for intelligent agents. University of South Australia (2006) 17. Schmidt, D.C., Stal, M., Rohnert, H., Buschmann, F.: Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects. Wiley & Sons, West Sussex (2000)
Diagnosis Support on Cardio-Vascular Signal Monitoring by Using Cluster Computing Ahmed M. Elmisery, Martín Serrano, and Dmitri Botvich
Abstract. Support for remote data processing and analysis is a necessary requirement in future healthcare systems. Likewise, interconnecting and managing medical devices, and the distributed processing of the data collected through these devices, are crucial processes for supporting personalised healthcare systems. This work introduces our research efforts to build a monitoring application, hosted on a cluster computing environment, supporting personalised healthcare systems (pHealth). The application is based on a novel distributed clustering algorithm that is used for the medical diagnosis of cardio-vascular signals. The algorithm collects different statistics from the cardiac signals and uses these statistics to build a distributed clustering model automatically. The resulting model can be used for diagnosis of cardiac signals. A cardio-vascular monitoring scenario in a cluster computing environment is presented, and experimental results are described to demonstrate the accuracy of cardio-vascular signal diagnosis. The advantages of using data analysis techniques and cluster computing in medical diagnosis are also discussed in this work.
Keywords: Personalised Health Systems, ICT enabled Personal Health, Health Monitoring, Pervasive Computing on eHealth.
1 Introduction
Ahmed M. Elmisery · Martín Serrano · Dmitri Botvich
Telecommunications Software & Systems Group, Waterford Institute of Technology, Waterford, Ireland
J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 873–883. springerlink.com © Springer-Verlag Berlin Heidelberg 2011

Trends in the next generation of healthcare systems demand applications that can allow prevention of diseases even before they are apparent, by using assisted sensors and networks (Yanmin et al. 2009; Lupu et al. 2008). Personalised healthcare systems (pHealth) (Gatzoulis, Iakovidis 2008) are one application that can achieve this objective by presenting personalised healthcare services. With personalised healthcare services people can receive more accurate diagnostics and early medical assistance. Designing these systems according to individual requirements, and based on health data collected from wearable sensors, is a challenging task. These systems demand local processing of the health data and capabilities for
distributed data analysis (Herzog et al. 2006), as well as a high-performance network infrastructure (ICTs) able to react in real time to variations in the data. Cluster computing and other distributed computing environments have demonstrated their advantages in pHealth systems by offering scalability and availability, as well as the ability to process massive amounts of data (Neves et al. 2008). However, the privacy of health data is a main requirement that must be taken into consideration when developing pHealth systems in these environments. Modern medicine can benefit from pHealth systems by building user health profiles that can offer personalised support, early assistance, accurate diagnostics and quick response when symptomatic diseases are detected during the local and remote analysis of these profiles. pHealth systems also provide procedures to support monitoring the progress of diseases as well as their therapeutic intervention. A key goal in pHealth systems is the ability to perform analysis on either data taken during normal activities or data from regular medical checks. As a consequence, people’s activities and freedom are not affected, and accurate results can be attained. Modern pHealth systems allow people to continue their activities and envisage a real-time, interactive environment for patient-doctor information exchange. A clear advantage of using these systems is the ability to offer accurate diagnostics for remote healthcare subscribers. We concentrate on distributed clustering as an analysis tool to support healthcare services. This work presents our efforts to build a framework for the management of personalised healthcare applications. The main objective of this research is to introduce an application of the distributed learning clustering (DLC) algorithm (Elmisery, Huaiguo 2010) to the diagnosis of cardiovascular signals.
The rest of the work is organised as follows: Section 2 discusses cluster computing as a processing environment to support personalised healthcare applications. Section 3 describes research results forming part of an integral cardio-vascular monitoring system within the framework for personalised healthcare applications management introduced in this work. Finally, Section 4 summarises the research advances and concludes this work.
2 Cluster Computing Environment Supporting Personalised Healthcare Applications
This research introduces a framework for personalised healthcare applications management that can manage different healthcare applications running in the same computing environment. This framework is hosted in a cluster computing environment to support massive health data analysis, distributed data storage and health communication networks; see Figure (1). Cluster computing plays an important role as a processing environment for health data, as it empowers the execution of different health applications and the exchange of data between them. The end-user (i.e. a patient or healthy person) has a main role in supplying the applications’ databases with his/her health data. This allows these applications to build accurate models for diagnosis and monitoring of health status. The end-user also has an important role in the evaluation and enhancement of these applications.
The development of user-centred systems is crucial and highlights the end-user role in healthcare research and technological development practices. Personalised healthcare applications require an active role for the end-user: he/she submits health data to the health applications and must then correctly understand the medical information the applications provide. This feature acts as a playground for developing new healthcare applications and services.
Fig. 1 Cluster computing as a support for personalised healthcare applications
We assume that patients are keen to build local knowledge in order to deal with alternative solutions to their health problems. The information obtained from end-users can help to enrich health knowledge and research activities. For example, if a drug is consumed over a mid-to-long-term period of time, it is difficult and expensive to track its side-effects in order to improve or change that drug; but if patients play the role of self-monitoring, assisted by ICTs, they can provide valuable data to assist medical professionals in this task.
3 Personalized Medical Support for Cardio-Vascular Monitoring
This section describes a related interdisciplinary application for cardio-vascular monitoring in the framework of personalised healthcare systems. In this application, we employ data clustering techniques to group different cardiovascular signals in order to assign a patient to a physiological condition using no prior knowledge about disease states. We used a new clustering algorithm called the distributed learning clustering (DLC) algorithm. DLC is based on the idea of stage clustering and offers several advantages over current clustering algorithms:
• The algorithm produces clusters with acceptable accuracy; these clusters can have different shapes, sizes and densities.
• The algorithm was designed with the goal of enabling a privacy-preserving version of the data.
• The algorithm helps the user to select proper values for its parameters, and to tune parameters for better results.
• The algorithm presents different statistics for clustering validity at each stage, and uses these statistics to enhance the resulting clusters automatically.
• The algorithm is applicable to networked environments (p2p, cluster computing or grid systems).
Figure (2) depicts the different processes inside our proposed personalised medical application that is used for supporting the diagnosis of cardiovascular signals. In order to enhance the model building process in that application, we proposed an adaptive strategy that utilises both patient cardiovascular signals and established ECG medical databases, which is more suitable for remote diagnosis. The process is as follows:
Fig. 2 Personalized Medical Application for Early Cardio-vascular Diagnostics
1. Use the MIT-BIH Arrhythmia database (Moody, Mark 1990) to build an initial clustering model.
2. Test the model on the patient.
3. Collect the new ECG data from this patient.
4. Store the records whose error values exceed a predefined threshold in a separate database.
5. Send these data to cardiologists for detailed analysis. This process is done offline.
6. Collect the cardiologists’ annotations and use these data in the model tuning process.
ECG recordings carry significant information about the overall behaviour of the cardiovascular system and the physiological condition of the patient. The ECG signal is pre-processed to remove noise and abnormal features, extract features and select
Diagnosis Support on Cardio-Vascular Signal Monitoring
certain features that will have a high influence on our DLC clustering algorithm. The relevant information is encoded in the form of a feature vector that is used as input for the DLC algorithm. The key goal for the DLC algorithm is to be able to find patterns in the ECG signals that effectively discriminate between the different conditions or categories under investigation.
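The six-step adaptive strategy above can be sketched as a simple feedback loop. The sketch below is illustrative only: the names `model_error` and `adapt_model` are not part of the published algorithm, and a trivial mean-amplitude "model" stands in for the DLC clustering model.

```python
def model_error(model_mean, record):
    """Toy error measure: distance of a record's mean amplitude from the
    model mean (a stand-in for the real clustering error)."""
    return abs(sum(record) / len(record) - model_mean)

def adapt_model(initial_records, patient_records, threshold):
    """Steps 1-6 of the adaptive strategy, with a trivial 'model' (the mean
    amplitude of the training records) standing in for the DLC model."""
    flat = [v for rec in initial_records for v in rec]
    model_mean = sum(flat) / len(flat)                            # step 1: initial model
    review_queue = [rec for rec in patient_records                # steps 2-4: test, collect,
                    if model_error(model_mean, rec) > threshold]  # store high-error records
    annotated = review_queue                                      # step 5: offline review (simulated)
    flat = [v for rec in initial_records + annotated for v in rec]
    return model_mean, review_queue, sum(flat) / len(flat)        # step 6: tuned model
```

In a deployment, step 5 would route the queued records to cardiologists instead of accepting them unchanged.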
3.1 ECG Signal Analysis
This section introduces the formalism used for data analysis (Clifford et al. 2006). First, each signal is pre-processed by a normalization step, which is necessary to standardize all the features to the same level. After that, we adjust the baseline of the ECG signal to the zero line by subtracting the median of the ECG signal (Yoon et al. 2008). ECG signals can be contaminated with several types of noise, so we need to filter the signal to remove the unwanted noise; this can be done using low-pass, high-pass and notch filters (Chavan et al. 2008). As shown in figure (3), the ECG signal consists of the P-wave, PR-interval, PR-segment, QRS complex, ST-segment, and T-wave. The QRS complex is a very important part of the signal that is useful in the diagnosis of arrhythmias. In general, a normal ECG rhythm means that there is a regular rhythm and waveform. Correct detection of QRS complexes forms the basis for most of the algorithms used in automated processing and analysis of ECG (Kors, Herpen 2001).
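The first two pre-processing stages, normalization and median baseline correction, can be sketched in a few lines (the noise-filtering stage is omitted, since the filter designs in (Chavan et al. 2008) are application specific; min-max scaling is one common reading of "standardize to the same level"):

```python
from statistics import median

def normalize(signal):
    """Scale the signal into [0, 1] so all samples sit on the same level."""
    lo, hi = min(signal), max(signal)
    return [(v - lo) / (hi - lo) for v in signal]

def remove_baseline(signal):
    """Adjust the baseline to the zero line by subtracting the median,
    as in (Yoon et al. 2008)."""
    m = median(signal)
    return [v - m for v in signal]

# A crude ECG-like trace riding on a constant baseline offset.
trace = [0.5, 0.5, 0.6, 1.5, 0.6, 0.5, 0.5]
corrected = remove_baseline(normalize(trace))
```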
Fig. 3 ECG Signal Analysis Process Using QRS Metrics (Atkielski 2006)
However, the ECG rhythm in a patient with arrhythmia will not be regular in certain QRS complexes (Dean 2006). Our QRS detection algorithm must be able to detect a large number of different QRS morphologies in order to be clinically useful, and must be able to follow sudden or gradual changes of the prevailing QRS morphology. It should also help to avoid false positives due either to artifacts or to high-amplitude T waves. On the other hand, false negatives may occur due to low-amplitude R waves.
3.2 Clustering Analysis for ECG Signal
Clustering analysis aims to group a collection of signals or cases into meaningful clusters without the need for prior information about the classification of patterns. There is no general agreement about the best clustering algorithm (Xu, Wunsch 2005); different algorithms reveal certain aspects of the data based on the objective function used. The clustering algorithm learns by discovering relevant similarity relationships between patterns. The result of applying such algorithms is groups of signals that evince recurrent QRS complexes and/or novel ST segments, where each group can be linked to a significant disease or risk. Detecting relevant relationships between signals has been addressed in the literature using different clustering algorithms. For example, the work in (Iverson et al. 2005) applied the pointwise correlation dimension to the analysis of ECG signals from patients suffering from depression. The results obtained in this study indicate that clustering analysis is able to discriminate clinically meaningful clusters with and without depression based on ECG information. The authors in (Dickhaus et al. 2001; Bakardjian 1992) cluster collected ECG data into clinically relevant groups without any prior knowledge. This emphasizes the advantage of clustering in different classification problems, especially in exploratory data analysis or when the distribution of the data is unknown. For detecting the R-peaks in the ECG signal y_k, we use an algorithm proposed in (S. et al. 1997). It starts searching for local modulus maxima at large scales, then at fine ones. This procedure reduces the effect of high-frequency noise; it also uses an adaptive time-amplitude threshold and refractory-period information, and rejects isolated and redundant maximum lines (artifacts, high-amplitude T waves or low-amplitude R waves). Detecting R-peaks starts with calculating the zero crossing of the wavelet between a positive maximum and a negative minimum, which is marked as the R-peak m_e.
Once the R-peaks are found, the RR-interval between each two consecutive heartbeats is computed by:

RR(e) = m_e − m_(e−1)    (1)

where e refers to the heartbeat sequence index. For heartbeat segmentation purposes, starting and ending points are obtained as follows:

y_k = y[m_e − 0.25·RR(e) : m_e + 0.75·RR(e)]    (2)
The length of this interval is different for each heartbeat; figure (4) illustrates the detection of the RR-interval. The length variability is removed by means of trace segmentation. Following that, feature extraction is performed using WT decomposition. The heartbeats are represented as arrays of time-varying duration, so in order to compare the heartbeat morphologies it is necessary to use a proper dissimilarity measure for the DLC algorithm. In this work, we used dynamic time warping (DTW), as in (Cuesta-Frau et al. 2007), to find an optimal alignment function between two sequences of different length. A heartbeat is considered only if its dissimilarity with the other elements in the resulting set is higher than a specific threshold. The DLC clustering can be expressed as follows:
Fig. 4 Illustration for the detection of RR-interval in ECG Signal
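The RR-interval computation of (1) and the heartbeat segmentation of (2), illustrated in Fig. 4, can be sketched directly in code (peak positions are taken to be sample indices; the later trace-segmentation and DTW steps are not shown):

```python
def rr_intervals(peaks):
    """Eq. (1): RR(e) = m_e - m_(e-1) for consecutive R-peak positions."""
    return [peaks[e] - peaks[e - 1] for e in range(1, len(peaks))]

def segment_heartbeats(signal, peaks):
    """Eq. (2): each beat spans [m_e - 0.25*RR(e), m_e + 0.75*RR(e)],
    so segment lengths differ from beat to beat."""
    rr = rr_intervals(peaks)
    beats = []
    for e in range(1, len(peaks)):
        start = int(peaks[e] - 0.25 * rr[e - 1])
        end = int(peaks[e] + 0.75 * rr[e - 1])
        beats.append(signal[start:end])
    return beats
```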
Consider Y as the set of N heartbeats. The goal of the local learning and analysis (LLA) step is to find a subset Y′ with N′ beats, where N′ ≤ N. All dissimilar heartbeats are represented in Y′ = {y′_1, …, y′_N′}, and similar ones are omitted. Then, in the distributed clustering (DC) step, the set Y′ is partitioned into a set of clusters C = {c_1, …, c_K}, where each cluster contains proportionate heartbeats. Table 2 shows the resulting heartbeats after the execution of the LLA step.

Table 1 Set of heartbeats used in the experiment

Label      Normal  Lbbb  Rbbb  PVC   Ap    P     Total
No. beats  9870    7361  6143  8450  2431  7340  41595
Table 2 Resulting heartbeats after pre-processing and LLA

Label      Normal  Lbbb  Rbbb  PVC   Ap   P    Total
No. beats  1730    1320  861   1763  935  843  7452
Table 3 Abbreviations used

Label   Meaning
Normal  Normal beat
Lbbb    Left bundle branch block beat
Rbbb    Right bundle branch block beat
PVC     Premature ventricular contraction
Ap      Atrial premature beat
P       Paced rhythm
Our first experiment was done on DLC to measure its accuracy in determining different heartbeat clusters. Figure (5) shows the relation between the merge error in the DC stage and the number of clusters. As shown in figure (5), the merge error (LET) decreases, which indicates that only equivalent heartbeat clusters are being merged. In order to evaluate the performance of our algorithm, we used two error metrics defined in (Cuesta-Frau et al. 2003). The first metric is the clustering error (CR), which is the percentage of heartbeats in a cluster that do not correspond to the class of that cluster. The second metric is the critical error (CIE), which is the number of heartbeats in a class that do not have a cluster and are therefore included in other classes' clusters.
Fig. 5 Relation between Different Clusters and Merge Error
Fig. 6 (a) The Values of CR for Different No. of Clusters. (b) The Values of CIE for Different No. of Clusters
In the second experiment, we measure the relation between different numbers of clusters and the values of the clustering error (CR) and critical error (CIE). Based on figures 6(a) and (b), we can deduce that both CR and CIE for the DLC algorithm decrease with the increase in the number of clusters until reaching the correct number of clusters. In the third experiment, we compare the results of DLC with other clustering algorithms; here we select BIRCH and k-means, and we tune the parameters in each algorithm to get the same number of clusters. Figures 6(a) and (b) contain both CR
and CIE values for each algorithm for different numbers of clusters. The results demonstrate the higher accuracy achieved using DLC compared to the other algorithms.
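Under a majority-class reading of the two definitions above, both metrics can be computed as follows. This is a sketch only; (Cuesta-Frau et al. 2003) should be consulted for the exact formulation, and the function name `cluster_errors` is illustrative:

```python
from collections import Counter

def cluster_errors(labels, clusters):
    """CR: percentage of heartbeats whose label differs from the majority
    label of their cluster.  CIE: number of heartbeats whose class is not
    the majority class of any cluster (the class 'has no cluster')."""
    by_cluster = {}
    for lab, clu in zip(labels, clusters):
        by_cluster.setdefault(clu, []).append(lab)
    majority = {clu: Counter(labs).most_common(1)[0][0]
                for clu, labs in by_cluster.items()}
    cr = 100.0 * sum(lab != majority[clu]
                     for lab, clu in zip(labels, clusters)) / len(labels)
    covered = set(majority.values())
    cie = sum(lab not in covered for lab in labels)
    return cr, cie
```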
3.3 Privacy in Clustering Cardiovascular Data
Privacy-aware users consider ECG signals sensitive information, as these signals allow health application providers to infer different mental conditions of the patients (depressed, afraid, walking or running, etc.). As a consequence, they require certain levels of privacy and anonymity in the handling of their signals. Our aim is to permit clustering of ECG signals without learning any private information about the patient. In reality, these signals do not need to be fully disclosed to the healthcare provider in order to build an accurate model. We preprocess the wavelet coefficients using the LLA step to build up sets of initial clusters in which the end-user patterns are compared with each other locally; then we take the representatives of each initial cluster as input to the distributed clustering (DC) step. These representatives are used as pattern references to associate cluster patterns with the same diseases. LLA also uses the wavelet transformation to preserve privacy for the ECG signals by decomposing the wavelet coefficients. These two steps affect both the accuracy of the results and the privacy level attained.
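The representative-selection idea behind the LLA step can be sketched as follows: a beat is kept only when it is sufficiently dissimilar from the representatives already kept, so only this reduced set (not the raw signals) leaves the end-user side. Euclidean distance stands in here for the DTW measure used above, and `lla_representatives` is an illustrative name, not the published procedure:

```python
def euclidean(a, b):
    # Stand-in dissimilarity; the paper uses dynamic time warping (DTW).
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def lla_representatives(beats, threshold, dissimilarity=euclidean):
    """Keep a beat only if it differs from every kept representative by
    more than the threshold; similar beats are omitted.  Only this reduced
    set is disclosed to the distributed clustering (DC) step."""
    reps = []
    for beat in beats:
        if all(dissimilarity(beat, r) > threshold for r in reps):
            reps.append(beat)
    return reps
```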
4 Conclusions
This work has introduced our vision for personalized health systems, using the monitoring of ECG signals as an application example. Research efforts have been conducted to promote clustering algorithms as an alternative solution for finding data similarities between cardio-vascular patterns and clusters previously diagnosed/detected. We have introduced a novel solution using the DLC algorithm to cluster morphologically similar ECG signals while enforcing privacy when matching these patterns. Experiments were performed on a set of ECG recordings from the MIT-BIH database. DLC yielded 99.9% clustering accuracy considering pathological versus normal heartbeats. Both the clustering error and the critical error percentage were 1%. We will continue investigating computing techniques to map cardiac patterns for different heart diseases and to produce reactive solutions in the communications systems.
References

Atkielski, A.: Electrocardiography. In: Wikipedia (2006)
Bakardjian, H.: Ventricular beat classifier using fractal number clustering. Medical and Biological Engineering and Computing 30(5), 495–502 (1992), doi:10.1007/bf02457828
Chavan, M.S., Agarwala, R.A., Uplane, M.D.: Interference reduction in ECG using digital FIR filters based on rectangular window. WSEAS Trans. Sig. Proc. 4(5), 340–349 (2008)
Clifford, G.D., Azuaje, F., McSharry, P.: Advanced Methods and Tools for ECG Data Analysis (2006)
Cuesta-Frau, D., Biagetti, M., Quinteiro, R., Micó-Tormos, P., Aboy, M.: Unsupervised classification of ventricular extrasystoles using bounded clustering algorithms and morphology matching. Medical and Biological Engineering and Computing 45(3), 229–239 (2007)
Cuesta-Frau, D., Pérez-Cortés, J.C., Andreu-García, G.: Clustering of electrocardiograph signals in computer-aided Holter analysis. Computer Methods and Programs in Biomedicine 72, 179–196 (2003), doi:10.1016/s0169-2607(02)00145-1
Dean, G.: How Web 2.0 is changing medicine, vol. 333(7582). British Medical Association, London, UK (2006)
Dickhaus, H., Maier, C., Bauch, M.: Heart rate variability analysis for patients with obstructive sleep apnea. In: Proceedings of the 23rd Annual International Conference of the Engineering in Medicine and Biology Society, vol. 501, pp. 507–510 (2001)
Elmisery, A.M., Huaiguo, F.: Privacy Preserving Distributed Learning Clustering of HealthCare Data Using Cryptography Protocols. In: 2010 IEEE 34th Annual Computer Software and Applications Conference Workshops (COMPSACW), July 19-23, pp. 140–145 (2010)
Gatzoulis, L., Iakovidis, I.: The Evolution of Personal Health Systems. Paper presented at the 5th pHealth Workshop on Wearable Micro and Nanosystems for Personalised Health, Valencia, Spain
Herzog, R., Konstantas, D., Bults, R., Halteren, A.V., Wac, K., Jones, V., Widya, I., Streimelweger, B.: Mobile Patient Monitoring - applications and value propositions for personal health. Paper presented at pHealth 2006, the International Workshop on Wearable Micro- and Nanosystems for Personalized Health, Luzern, Switzerland (2006)
Iverson, G., Gaetz, M., Rzempoluck, E., McLean, P., Linden, W., Remick, R.: A New Potential Marker for Abnormal Cardiac Physiology in Depression. Journal of Behavioral Medicine 28(6), 507–511 (2005), doi:10.1007/s10865-005-9022-7
Kors, J.A., Herpen, G.: The Coming of Age of Computerized ECG Processing: Can it Replace the Cardiologist in Epidemiological Studies and Clinical Trials?, pp. 1161–1165 (2001)
Lupu, E., Dulay, N., Sloman, M., Sventek, J., Heeps, S., Strowes, S., Twidle, K., Keoh, S.L., Schaeffer-Filho, A.: AMUSE: autonomic management of ubiquitous e-Health systems. Concurr. Comput.: Pract. Exper. 20(3), 277–295 (2008), doi:10.1002/cpe.v20:3
Moody, G.B., Mark, R.G.: The MIT-BIH Arrhythmia Database on CD-ROM and software for use with it. In: Proceedings of the Computers in Cardiology 1990, September 23-26, pp. 185–188 (1990)
Neves, P.A.C.S., Fonsec, J.F.P., Rodrigue, J.J.P.C.: Simulation Tools for Wireless Sensor Networks in Medicine: a Comparative Study. Paper presented at the International Joint Conference on Biomedical Engineering Systems and Technologies, Funchal, Madeira, Portugal
S., S.J., N., T.S., P., B.R.K.: Using Wavelet Transforms for ECG Characterization: an On-Line Digital Signal Processing System, vol. 16(1). Institute of Electrical and Electronics Engineers, New York (1997)
Xu, R., Wunsch, D.: Survey of Clustering Algorithms. IEEE Transactions on Neural Networks 16(3), 645–678 (2005)
Yanmin, Z., Sye Loong, K., Sloman, M., Lupu, E.C.: A lightweight policy system for body sensor networks. IEEE Transactions on Network and Service Management 6(3), 137–148 (2009)
Yoon, S.W., Min, S.D., Yun, Y.H., Lee, S., Lee, M.: Adaptive Motion Artifacts Reduction Using 3-axis Accelerometer in E-textile ECG Measurement System. J. Med. Syst. 32(2), 101–106 (2008), doi:10.1007/s10916-007-9112-x
Multiple-Instance Learning via Decision-Based Neural Networks Yeong-Yuh Xu and Chi-Huang Shih
Abstract. Multiple Instance Learning (MIL) is a variation of supervised learning, where the training set is composed of many bags, each of which contains many instances. If a bag contains at least one positive instance, it is labelled as a positive bag; otherwise, it is labelled as a negative bag. The labels of the training bags are known, but those of the training instances are unknown. In this paper, a Multiple Instance Decision Based Neural Network (MI-DBNN) is proposed for MIL, which employs a novel discriminant function to capture the nature of MIL. The experiments were performed on the MUSK1 and MUSK2 data sets. In comparison with other methods, MI-DBNN demonstrates competitive classification accuracy on the MUSK1 and MUSK2 data sets: 97.8% and 98.4%, respectively.
1 Introduction Over the last few decades, considerable concern has arisen over learning from examples in machine learning research. Supervised learning attempts to learn a concept from labelled training examples. Unsupervised learning attempts to learn the structure of the underlying sources of examples, where the training examples have no labels. In Multiple Instance Learning (MIL), the training set is composed of many bags, each of which contains many instances. If a bag contains at least one positive instance, it is labelled as a positive bag; otherwise, it is labelled as a negative bag. The labels of the training bags are known, but those of the training instances are unknown. The task is to learn the concept from the training set in order to correctly label unseen bags. MIL was first analyzed by Dietterich et al.[4]. They investigated the drug activity prediction problem, trying to predict whether a new molecule was Yeong-Yuh Xu Department of Computer Science and Information Engineering, Hungkuang University, Taichung, Taiwan e-mail: [email protected] J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 885–895. c Springer-Verlag Berlin Heidelberg 2011 springerlink.com
qualified to make some drug, through analyzing a collection of known molecules. They proposed three axis-parallel rectangle (APR) algorithms to search for the appropriate axis-parallel rectangles constructed by the conjunction of the features extracted from molecules. After Dietterich et al., numerous MIL algorithms have been developed, such as Diverse Density[8], the Bayesian-kNN and Citation-kNN algorithms[14], the EM-DD algorithm[17], etc., and successfully applied to many applications [9, 18, 3, 5, 1, 7, 15]. More works on MIL can be found in [19]. The robustness, adaptation, and ability to automatically learn from examples make neural network approaches attractive and exciting for MIL. When the notion of MIL was proposed, Dietterich et al.[4] indicated that a particularly interesting issue in this area is to design multiple-instance modifications for neural networks. Ramon and De Raedt [12] presented a neural network framework for MIL. Zhang and Zhou proposed a multi-instance neural network named BP-MIP[20, 16], which extended the popular BP [13] algorithm with a global error function defined at the level of bags instead of at the level of instances. How to construct a neural model structure is crucial for successful recognition. All the above neural networks for MIL are based on the all-class-in-one-network (ACON) structure, where all the classes are lumped into one super-network. The supernet has the burden of having to simultaneously satisfy all the teachers, so the number of hidden units tends to be large. In this paper, a Multiple Instance Decision Based Neural Network (MI-DBNN) is proposed for MIL. The proposed MI-DBNN is a probabilistic variant of the Decision Based Neural Network (DBNN) [6]. The MI-DBNN inherits the one-class-in-one-network (OCON) structure from the DBNN. For each concept to be recognized, MI-DBNN devotes one of its subnets to the representation of that particular concept.
Pandya and Macy [11] compared the performance of the ACON and OCON structures, and observed that the OCON model achieves better training and generalization accuracies. Besides, the discriminant function of the proposed MI-DBNN is in the form of a probability density, which yields high accuracy rates compared to other approaches, as discussed in Section 4. The remainder of this paper is organized as follows. In the next section, the proposed discriminant function is presented in detail. Then, in Section 3, the implementation of the proposed MI-DBNN and its learning scheme are introduced. Experimental results are presented and discussed in Section 4. Finally, Section 5 draws some conclusions and future works.
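The MIL labelling rule described in the introduction is direct to state in code. In the sketch below, hidden instance flags exist only to generate bag labels for illustration; a MIL learner never observes them:

```python
def bag_label(hidden_instance_flags):
    """A bag is positive iff at least one of its instances is positive."""
    return any(hidden_instance_flags)

# Bags are lists of instances; each instance here is a feature vector.
training_set = [
    ([[0.1, 0.2], [0.9, 0.8]], bag_label([False, True])),   # positive bag
    ([[0.2, 0.1], [0.3, 0.3]], bag_label([False, False])),  # negative bag
]
```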
2 Discriminant Function One major difference between DBNN and MI-DBNN is that MI-DBNN follows the MIL constraint. That is, the discriminant function of MI-DBNN is designed to capture the nature of MIL. Given a set of i.i.d. feature patterns x = {x(t);t = 1, 2, · · · , N} extracted from the instances in a bag B, we assume that the likelihood function p(x(t)|ωi ) for the concept ωi is a linear combination of component densities p(x(t)|ωi , Θri ) in the form
p(x(t)|ωi) = ∑_{ri=1}^{Ri} P(Θri|ωi) p(x(t)|ωi, Θri),
where Ri is the number of clusters, Θri represents the ri-th cluster, P(Θri|ωi) denotes the prior probability of the ri-th cluster, and p(x(t)|ωi, Θri) is a D-dimensional Gaussian-like distribution with uncorrelated features

p(x(t)|ωi, Θri) = exp( −(1/2) (x(t) − μri)^T Σri^{−1} (x(t) − μri) ),    (1)

where x(t) = [x1(t), x2(t), · · · , xD(t)]^T is the input pattern, μri = [μri,1, μri,2, · · · , μri,D]^T is the mean vector, and the diagonal matrix Σri = diag[σ²ri,1, σ²ri,2, · · · , σ²ri,D] is the covariance matrix. By definition, ∑_{ri=1}^{Ri} P(Θri|ωi) = 1. Given that B is a positive bag and x(n) is a feature pattern extracted from a positive instance in B, the desired value of p(x(n)|ωi) is 1, and −log(p(x(n)|ωi)) is adopted to measure the error between p(x(n)|ωi) and 1. Suppose that (h1, h2, · · · , hN) is a decreasing sequence obtained by sorting {p(x(t)|ωi); t = 1, 2, · · · , N}, and B contains k positive instances. Then, we define the similarity between B and ωi as S(x, ωi, k) = g(x, ωi, k) log(g(x, ωi, k)), where g(x, ωi, k) = (∏_{n=1}^{k} (−log hn))^{−1}. Clearly, if B contains at least one positive instance (i.e. h1 → 1), no matter what the values of h2, · · · , hk are, S(x, ωi, k) has a large value. On the other hand, if all the instances in B are negative, that is, h1, · · · , hk are far from 1, the value of S(x, ωi, k) is small. Consequently, it is S(x, ωi, k) that captures the nature of MIL. Since S(x, ωi, k) contains a division, in order to translate division into subtraction, we apply the logarithm to S(x, ωi, k). Accordingly, the discriminant function of each subnet in MI-DBNN is defined as φ(x, wi, k) = log(S(x, ωi, k)), and can be further derived as follows:
φ(x, wi, k) = log( ∑_{n=1}^{k} (−log(−log hn)) ) + ∑_{n=1}^{k} (−log(−log hn))    (2)
where wi = {μri, Σri, P(Θri|ωi), Ti}, and Ti is the output threshold of the ith subnet in MI-DBNN. Using the data set X = {xb; b = 1, 2, · · · , M}, where xb = {xb(t); t = 1, 2, · · · , Nb} are the feature vectors extracted from the instances in the bth bag, the energy function for MI-DBNN is defined as

E(X, wi) = ∑_{b=1}^{M} φ(xb, wi, Nb),    (3)
If X is a positive set, (3) should have a large value; otherwise, (3) should have a small value. In order to verify the proposed discriminant function, we created an artificial data set: five positive and five negative bags, each with 100 instances. Each instance was chosen uniformly at random from the [0, 1] × [0, 1] ⊂ R² domain. The concept was located at two 0.1 × 0.1 squares in the Cartesian plane, one with corners at
(0.15, 0.75), (0.25, 0.75), (0.15, 0.85), and (0.25, 0.85), and the other with corners at (0.75, 0.15), (0.85, 0.15), (0.75, 0.25), and (0.85, 0.25). A bag was labelled positive if at least one of its instances fell within a square, and negative if none did. Each of the squares contains at least one instance from every positive bag and no negative instances. The created data are drawn in Fig. 1, where instances in negative bags are dots, and those in positive bags are numbers.
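An artificial data set of this kind can be regenerated in a few lines (square placement follows the corners given above; the resampling needed to guarantee exactly five positive and five negative bags is omitted):

```python
import random

# Each square as (x_min, x_max, y_min, y_max), matching the corners above.
SQUARES = [(0.15, 0.25, 0.75, 0.85),
           (0.75, 0.85, 0.15, 0.25)]

def in_concept(x, y):
    """True iff the point falls inside one of the two concept squares."""
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, x1, y0, y1 in SQUARES)

def make_bag(n_instances=100, rng=random):
    """Instances are uniform on [0,1]^2; the bag is positive iff at least
    one instance falls inside a concept square."""
    instances = [(rng.random(), rng.random()) for _ in range(n_instances)]
    return instances, any(in_concept(x, y) for x, y in instances)
```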
Fig. 1 The artificial data contains five positive and five negative bags. The instances in negative and positive bags are dots and numbers, respectively. The concept was located at two 0.1 × 0.1 squares. Each square contains at least one instance from every positive bag and no negatives.
In order to highlight the advantages of finding the concept squares shown in Fig. 1, we plotted the proposed energy surface, the regular log-likelihood surface, and the corresponding contour plots with gradient vectors across the domain in Fig. 2. It is clear that picking out the global maximum (the desired concept) in Fig. 2(a) is easier than in Fig. 2(b). This phenomenon can be more clearly found if we compare the gradient vectors in Fig. 2(c) to those in Fig. 2(d).
3 Multiple-Instance Decision Based Neural Networks
The proposed MI-DBNN has a modular network structure. One subnet is designated to represent one object concept; for an m-concept MIL problem, MI-DBNN consists of m subnets. The structure of MI-DBNN is depicted in Fig. 3. To approximate the density function in (1), we apply elliptic basis functions as the basis function for each cluster,
ϕ(x(t), ωi, Θri) = −(1/2) ∑_{d=1}^{D} (xd(t) − μri,d)² / σ²ri,d .
After passing an exponential activation function, exp{ϕ (x(t), ωi , Θri )} can be viewed as the same distribution as described in (1).
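The basis-function layer, its exponential activation, and the mixture of (1) can be sketched directly; with a diagonal covariance each dimension contributes (x_d − μ_d)²/σ²_d, so no matrix algebra is needed (function names are illustrative):

```python
import math

def elliptic_basis(x, mu, sigma2):
    """phi(x, omega_i, Theta_ri): minus half the Mahalanobis distance
    under a diagonal covariance (per-dimension variances sigma2)."""
    return -0.5 * sum((xd - md) ** 2 / s2 for xd, md, s2 in zip(x, mu, sigma2))

def component_density(x, mu, sigma2):
    # exp(phi) reproduces the Gaussian-like distribution of eq. (1).
    return math.exp(elliptic_basis(x, mu, sigma2))

def mixture_likelihood(x, priors, mus, sigma2s):
    """p(x|omega_i) = sum over ri of P(Theta_ri|omega_i) p(x|omega_i, Theta_ri)."""
    return sum(p * component_density(x, mu, s2)
               for p, mu, s2 in zip(priors, mus, sigma2s))
```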
Fig. 2 Energy surfaces over the example data of Fig. 1. (a) is the proposed energy surface, and (b) is the log-likelihood surface over the example data of Fig. 1. (c) and (d) are the contour plots with gradient vectors of the proposed energy and the log-likelihood surfaces, respectively. It is clear that finding the peak within the desired concept using the proposed energy function is easier than using the regular log-likelihood function.
The training examples for each concept come from a set of bags with predefined labels (i.e., positive or negative). MI-DBNN adopts decision-based learning rules to learn the concepts. Unlike approximation neural networks, where exact target values are required, the teacher in MI-DBNN only indicates the correctness of the classification for each training bag. A detailed description of the learning phase is given in the following.
3.1 The Learning Phase As described in Section 2, (3) should be maximized if the training pattern is from the positive set; otherwise, it has to be minimized. Given a positive training set X+ and a negative training set X− , the following reinforced and antireinforced learning techniques are applied to the corresponding subset.
Fig. 3 Structure of MI-DBNN. Each subnet is designated to recognize one concept.
Reinforced learning: wi^(m+1) = wi^(m) + η ∇E(X+, wi)

Antireinforced learning: wi^(m+1) = wi^(m) − η ∇E(X−, wi)
where 0 < η ≤ 1 is a user-defined learning rate, and the gradient vectors ∇E are computed as follows:

∂E(X, wi)/∂μri,d |_{wi = wi^(m)} = [ ∑_{t=1}^{Nb} log(−log p(xb(t)|ωi)) ]^{−1} × ∑_{t=1}^{Nb} [ p^(m)(Θri|ωi, xb(t)) (xb,d(t) − μri,d^(m)) / ( log p(xb(t)|ωi) (σri,d^(m))² ) ],

∂E(X, wi)/∂σ²ri,d |_{wi = wi^(m)} = [ ∑_{t=1}^{Nb} log(−log p(xb(t)|ωi)) ]^{−1} × ∑_{t=1}^{Nb} [ p^(m)(Θri|ωi, xb(t)) (xb,d(t) − μri,d^(m))² / ( log p(xb(t)|ωi) (σri,d^(m))⁴ ) ],
where p^(m)(Θri|ωi, xb(t)) is the conditional posterior probability,

p^(m)(Θri|ωi, xb(t)) = P^(m)(Θri|ωi) p^(m)(xb(t)|ωi, Θri) / p^(m)(xb(t)|ωi).
As to the conditional prior probability P(Θri|ωi), since the EM algorithm automatically satisfies the probabilistic constraints ∑_{ri=1}^{Ri} P(Θri|ωi) = 1 and P(Θri|ωi) ≥ 0, it is applied to update the P(Θri|ωi) values and regulate the influences of the different clusters:

P^(m+1)(Θri|ωi) = (1 / (M · Nb)) ∑_{b=1}^{M} ∑_{t=1}^{Nb} p^(m)(Θri|ωi, xb(t)).    (4)
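The posterior and the EM prior update (4) can be sketched as follows, reusing a diagonal-covariance Gaussian-like component density as in (1). All data are toy values, and the function names are illustrative; as (4) is written, every bag is assumed to have the same number of instances Nb:

```python
import math

def component(x, mu, s2):
    # Gaussian-like component density with diagonal covariance, as in (1).
    return math.exp(-0.5 * sum((a - m) ** 2 / v for a, m, v in zip(x, mu, s2)))

def posterior(x, priors, mus, s2s):
    """p(Theta_ri|omega_i, x): prior-weighted component density, normalized
    by the mixture likelihood p(x|omega_i)."""
    num = [p * component(x, mu, s2) for p, mu, s2 in zip(priors, mus, s2s)]
    z = sum(num)
    return [n / z for n in num]

def update_priors(bags, priors, mus, s2s):
    """Eq. (4): average the posteriors over all bags and instances."""
    total = [0.0] * len(priors)
    count = 0
    for bag in bags:
        for x in bag:
            post = posterior(x, priors, mus, s2s)
            total = [t + p for t, p in zip(total, post)]
            count += 1
    return [t / count for t in total]
```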
Threshold Updating
The threshold value of MI-DBNN can also be learned by the reinforced and antireinforced learning rules. Since a decrement of the discriminant function φ(x, wi, k) and an increment of the threshold Ti have the same effect on the decision-making process, the direction of the reinforced and antireinforced learning for the threshold is the opposite of that for the discriminant function. For example, if an input data set x belongs to the concept ωi but φ(x, wi, k) < Ti, Ti should reduce its value. On the other hand, if x does not belong to ωi but φ(x, wi, k) > Ti, Ti should increase. The proposed adaptive learning rule to train the threshold Ti is described as follows. Define d(x, ωi) ≡ Ti − φ(x, wi, k) and a penalty function f(d(x, ωi)), which can be either a step function, a linear function, or a fuzzy-decision sigmoidal function. Then, given a positive learning parameter γ, at step j the threshold values are trained as follows:

Ti^(j+1) = Ti^(j) − γ f(d(x, ωi)), if x ∈ ωi (reinforced learning);
Ti^(j+1) = Ti^(j) + γ f(d(x, ωi)), otherwise (antireinforced learning).

In order to verify the proposed learning rules, we trained MI-DBNN to learn the concept from the artificial data created in Section 2. During the learning phase, the trajectory of the predicted positions of the desired concepts is shown in Fig. 4. Clearly, MI-DBNN successfully picked out the peak and learned the desired concept.
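One instance of the threshold rule, with a step penalty that fires only on misclassification (one of the penalty choices the text allows; the function name is illustrative), can be sketched as:

```python
def update_threshold(t_i, phi, gamma, is_positive):
    """One adaptive step for a subnet threshold T_i: if a positive pattern
    scores below T_i, lower T_i (reinforced learning); if a negative
    pattern scores above T_i, raise it (antireinforced learning)."""
    if is_positive and phi < t_i:
        return t_i - gamma
    if not is_positive and phi > t_i:
        return t_i + gamma
    return t_i  # correctly classified: step penalty is zero
```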
Fig. 4 MI-DBNN is trained to learn the concept from the artificial data of Fig. 1. The concept was located at two 0.1 × 0.1 squares. The red circles show the trajectory of the predicted positions of the desired concepts during the MI-DBNN learning phase. Apparently, MI-DBNN successfully picked out the peak and learned the desired concept.
3.2 The Recognition Phase The goal of the MI-DBNN recognition phase is to obtain instances from the input unlabelled bag, to compare them with the previously learned concept models, and to find the concepts the input unlabelled bag belongs to. As shown in Fig. 3, each subnet receives the input patterns extracted from the unlabelled bag and computes the discriminant function in (2). Then, the results of (2) are compared with the threshold Ti. Depending on the application, the unlabelled bag may be labelled with one concept or multiple concepts. The vector V in MI-DBNN is the recognition vector showing which concepts the unlabelled bag belongs to. The ith element of V is set to 1 if the output of the ith discriminant function is larger than Ti, which implies that the given bag belongs to the concept ωi. Otherwise, the ith element of V is set to 0, implying that the given bag does not belong to the concept ωi. From the recognition vector, one can recall which concepts the given bag belongs to.
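The recognition vector V is then a simple element-wise comparison (a minimal sketch; the function name is illustrative):

```python
def recognition_vector(discriminants, thresholds):
    """V_i = 1 iff the i-th subnet's discriminant output exceeds its
    threshold T_i, i.e. the bag is labelled with concept omega_i."""
    return [1 if phi > t else 0 for phi, t in zip(discriminants, thresholds)]
```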
4 Experiments

To show the ability of the proposed MI-DBNN to deal with MIL problems, the MUSK data sets available from the UCI Machine Learning Repository [10] are used in the experiments. The MUSK data sets comprise MUSK1 and MUSK2, and consist of descriptions of molecules. The target protein is a putative receptor in the human nose; a molecule is labelled positive if it binds to the receptor, that is, if it smells like a musk. The MI-DBNN is trained to learn what shape makes a molecule musky. MUSK1 has 92 molecules (bags), of which 47 are positive, with an average of 5.17 shapes (instances) per molecule. MUSK2 has 102 molecules, of which 39 are positive, with
Multiple-Instance Learning via Decision-Based Neural Networks
an average of 64.69 shapes per molecule. Each instance (in this case, a conformation) is represented by 162 rays, along with four additional features that specify the location of a unique oxygen atom common to all the molecules. As a consequence, each instance contains 166 features in total. Ten-fold cross validation is performed on each MUSK data set: MI-DBNN is trained ten times, each time with a different combination of nine partitions as the training set and the remaining one as the testing set. Table 1 summarizes the prediction accuracy of 8 MIL algorithms in the literature: GFS elim-kde APR, GFS elim-count APR, and iterated-discrim APR [4], Diverse Density [8], Citation-kNN, Bayesian-kNN [14], EM-DD [17]¹, and MILES [3]. We see from Table 1 that the proposed MI-DBNN obtains an average accuracy of 97.8% on MUSK1 and 98.4% on MUSK2, the best performance on both MUSK1 and MUSK2 data sets. The experimental results show that MI-DBNN is capable of dealing with MIL problems, and demonstrates competitive classification accuracy in comparison with the other methods.
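The ten-fold protocol described above can be sketched with the standard library; the function name is ours, and loading MUSK itself or the MI-DBNN model is not reproduced.

```python
import random

def ten_fold_splits(n_bags, seed=0):
    """Partition bag indices into ten folds; each fold serves once as the
    test set while the remaining nine folds form the training set."""
    idx = list(range(n_bags))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::10] for k in range(10)]
    splits = []
    for k in range(10):
        test = set(folds[k])
        train = [i for i in idx if i not in test]
        splits.append((train, sorted(test)))
    return splits

splits = ten_fold_splits(92)  # MUSK1 has 92 bags
```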
Table 1 Comparison of the predictive accuracy (% correct ± standard deviation) on the MUSK data sets.

Algorithm                    MUSK1        MUSK2
EM-DD [17]                   84.8         84.9
iterated-discrim APR [4]     92.4         89.2
GFS elim-kde APR [4]         91.3         80.4
GFS elim-count APR [4]       90.2         75.5
Diverse Density [8]          88.9         82.5
Citation-kNN [14]            92.4         86.3
Bayesian-kNN [14]            90.2         82.4
MILES [3]                    87.0         93.1
MI-DBNN                      97.8±1.14    98.4±1.05
5 Conclusions

We have presented a Multiple-Instance Decision-Based Neural Network (MI-DBNN) for multiple-instance learning (MIL). A novel discriminant function is proposed to capture the nature of MIL. We tested MI-DBNN on benchmark data sets taken from drug activity prediction. In comparison with other methods, MI-DBNN demonstrates competitive classification accuracy on the MUSK1 and MUSK2 data sets: 97.8% and 98.4%, respectively. Since MI-DBNN is a general algorithm that has not been optimized toward any data, applying MI-DBNN to more real-world applications such as content-based image retrieval is an
¹ The EM-DD results reported in [17] were obtained by selecting the optimal solution using the test data. The EM-DD result cited in this paper was provided by [2] using the correct algorithm.
interesting issue for future work. Furthermore, it would also be interesting to employ feature selection techniques to test whether feature selection can improve the performance of MI-DBNN.
References

1. Ali, S., Shah, M.: Human action recognition in videos using kinematic features and multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 288–303 (2010)
2. Andrews, S., Tsochantaridis, I., Hofmann, T.: Support vector machines for multiple-instance learning. In: Advances in Neural Information Processing Systems 15, pp. 561–568. MIT Press, Cambridge (2003)
3. Chen, Y., Bi, J., Wang, J.Z.: MILES: Multiple-instance learning via embedded instance selection. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(12), 1931–1947 (2006)
4. Dietterich, T.G., Lathrop, R.H., Lozano-Perez, T.: Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence 89(1-2), 31–71 (1997)
5. Gu, Z., Mei, T., Hua, X.S., Tang, J., Wu, X.: Multi-layer multi-instance learning for video concept detection. IEEE Transactions on Multimedia 10(8), 1605–1616 (2008)
6. Kung, S., Taur, J.: Decision-based hierarchical neural networks with signal/image classification applications. IEEE Transactions on Neural Networks 6(1), 170–181 (1995)
7. Mandel, M.I., Ellis, D.P.W.: Multiple-instance learning for music information retrieval. In: Proceedings of the Ninth International Conference on Music Information Retrieval (2008)
8. Maron, O., Lozano-Perez, T.: A framework for multiple-instance learning. In: Advances in Neural Information Processing Systems, pp. 570–576 (1998)
9. Maron, O., Ratan, A.L.: Multiple-instance learning for natural scene classification. In: Proceedings of the Fifteenth International Conference on Machine Learning, ICML 1998, pp. 341–349. Morgan Kaufmann Publishers Inc., San Francisco (1998)
10. Murphy, P.M., Aha, D.W.: UCI repository of machine learning databases
11. Pandya, A.S., Macy, R.B.: Pattern Recognition with Neural Networks in C++. CRC Press, Boca Raton (1995)
12. Ramon, J., Raedt, L.D.: Multi instance neural networks. In: Proc. ICML 2000 Workshop on Attribute-Value and Relational Learning (2000)
13. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation, pp. 673–695. MIT Press, Cambridge (1988)
14. Wang, J.: Solving the multiple-instance problem: A lazy learning approach. In: Proc. 17th International Conf. on Machine Learning, pp. 1119–1125. Morgan Kaufmann, San Francisco (2000)
15. Zafra, A., Gibaja, E.L., Ventura, S.: Multiple instance learning with multiple objective genetic programming for web mining. Appl. Soft Comput. 11, 93–102 (2011)
16. Zhang, M.L., Zhou, Z.H.: Improve multi-instance neural networks through feature selection. In: Neural Processing Letters, pp. 1–10 (2004)
17. Zhang, Q., Goldman, S.A.: EM-DD: An improved multiple-instance learning technique. In: Advances in Neural Information Processing Systems, pp. 1073–1080. MIT Press, Cambridge (2001)
18. Zhang, Q., Goldman, S.A., Yu, W., Fritts, J.E.: Content-based image retrieval using multiple-instance learning. In: Proceedings of the Nineteenth International Conference on Machine Learning, pp. 682–689. Morgan Kaufmann, San Francisco (2002)
19. Zhou, Z.H.: Multi-instance learning: A survey. Tech. rep., AI Lab, Department of Computer Science and Technology, Nanjing University, Nanjing, China (2004)
20. Zhou, Z.H., Zhang, M.L.: Neural networks for multi-instance learning. Tech. rep., AI Lab, Department of Computer Science and Technology, Nanjing University, Nanjing, China (2002)
Software Testing – Factor Contribution Analysis in a Decision Support Framework

Deane Larkman, Ric Jentzsch, and Masoud Mohammadian
Abstract. A decision support framework has been developed to guide software test managers in their planning and risk management for successful software testing. Total factor contribution analysis for risk management is applied to the decision support framework. Factor contribution analysis (FCA) is a tool that can be used to analyse risk before and during software testing. This paper illustrates how software test managers can apply FCA to the decision support framework to assess risk management issues, and how to interpret the results for their implications for successful software testing.
Deane Larkman · Masoud Mohammadian
Faculty of Information Sciences and Engineering, University of Canberra, ACT, Australia
e-mail: [email protected], [email protected]

Ric Jentzsch
Business Planning Associates Pty Ltd, ACT, Australia
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 897–905.
springerlink.com © Springer-Verlag Berlin Heidelberg 2011

1 Introduction

Software issues are things that are inconsistent, incomplete, inappropriate, or do not conform to the intended good practices of the software (Institute of Electrical and Electronics Engineers 1994). Software testing identifies issues that result when software does not meet its intended requirements in some way before, during, and after it is executed. Most software issues are rarely obvious: they can be simple or subtle, or both. Often it is hard to distinguish between what is an issue and what is not an issue (Patton 2006). Software test planning is an essential part of the software testing life cycle (Editorial 2010). However, it is a labour-intensive and complex activity (Ammann and Offutt 2008). Planning for successful software testing relies on the expertise and experience of the software test manager (Pinkster et al. 2004). Little to no research has been reported on the development or use of a decision support framework for software testing. Despite an intensive literature search we
are only aware of our previous work on decision support frameworks for software testing (Larkman et al. 2010a, b). Research on assessing risk management in software testing is also scarce. To assist the software test manager in their planning and risk management for successful software testing, a decision support framework has been developed.
2 Decision Support Frameworks Defined

A decision support framework (DSF) establishes a structure and organisation about a phenomenon. The phenomenon is a type of thing, event, or situation (Alter 2002). The framework can be a real or conceptual structure, or it can be an abstract logical structure (Burns and Grove 2009). Generic decision support frameworks use a variety of terms to describe their structure. The structure consists of a high-level set of requirements that provides for the inclusion of such concepts as elements, components, objects, entities, and/or factors. The ability to plug requirements into a decision support framework provides a guide to support analysis and information on achieving an overall goal or objective. A decision support framework is specific to a particular environment, application, business issue, or concept. A decision support framework is based on what is to be achieved, not how it is to be achieved; the framework therefore implicitly decouples the what from the how (Larkman et al. 2010b). At any particular point in time, a decision support framework identifies an invariant set of concepts and therefore infers a discrete boundary. The framework can be a defined approach, a set of rules, a set of policies, a set of data for the understanding of an issue or domain, a high-level definition to achieve some outcome, or a group of outcomes.
3 Decision Support Framework Used for This Study

The development of the decision support framework for software testing has been reported elsewhere (Larkman et al. 2010b). The decision support framework was developed to guide the software test manager in their planning task for successful software testing. The framework consists of:

1. a set of elements that represent major software testing categories that need to be addressed before testing begins;
2. element factors that provide details about the related element that the software test manager needs to consider when planning and assessing risk for successful software testing; and
3. a set of directional signed relationships that indicate elements' influences, directly or indirectly, on successful software testing, and that provide a basis for risk management assessment before and during software testing.
The DSF includes a goal (C0), three primary elements (C1, C2, and C3) and one secondary element (C3.1). The influences of elements are mapped by the directional signed relationships to the goal or other elements. Along each directional signed relationship is an illustrative influence weighting, expressed as a percentage. The percentages shown in Fig. 1 are for illustration and discussion purposes only. These percentages will vary by the type of software to be tested within the organisational context, and by the software test manager's experience and expertise. Influence weightings are determined by the software test manager, and they define the strength of an element's influence on achieving successful software testing. For each of the four elements there is a set of factors which define the details of the element, as shown in Fig. 1.
Fig. 1 Decision Support Framework for Software Testing
The DSF is applied in two steps. First, the software test manager assigns input (influence weightings and factor contribution percentages) to the DSF, to model their specific testing situation. Second, the software test manager evaluates and interprets the DSF model using one or more analytical techniques. Using the DSF for software testing comes with many advantages:

• First, it provides a template that the software test manager can apply to any type of software that is planned to be tested.
• Second, the DSF serves as a guide for software test managers of those things (in the form of elements and factors) they need to consider when planning for software testing.
• Third, the DSF can be used for assessment of risk management issues. In other words, when things do not go 100% according to plan (such as a resource shortage or an unexpectedly reduced testing schedule), the DSF can be analysed for the risk impact of not achieving successful software testing.
• Finally, the DSF provides a basis for software test project review, used for management reporting. The DSF can be used to compare planning estimates against actual results.
The DSF generates a specific software testing model that is used to analyse the type of software to be tested from two perspectives:

1. Static perspective; and
2. Dynamic perspective.

The static perspective is used to ensure that all software testing considerations have been thought about, and that critical path analysis and total path weight analysis have been done (Larkman et al. 2010a). The static perspective is not part of this paper and is not discussed herein. The dynamic perspective is used to more formally understand the risk management assessment based on the DSF model. The dynamic perspective includes:

• Fuzzy cognitive maps; and
• Factor contribution analysis.
Fuzzy cognitive map analysis for the DSF has been discussed elsewhere and is not part of this paper (Larkman et al. 2010a).
4 Factor Contribution Analysis (FCA)

This paper concentrates on factor contribution analysis for risk management assessment. Fig. 2 (Larkman et al. 2010b) will be used to discuss the use of FCA for the decision support framework established in Fig. 1.
Fig. 2 Decision Support Framework for Factor Contribution Analysis
4.1 Factors and Elements

Factors are a set of items that relate to a particular element. Factors provide more detailed information about an element, as shown in Fig. 1. Each factor contributes to the success of the element it is associated with. By success we mean the degree to which the element is able to fulfil its influence on achieving the goal. Each factor contributes to an element being able to influence the goal directly, or indirectly through its influence on other elements. The sum of factors for each element must equal 100%, as shown in Fig. 2. When the sum of an element's factor contributions is not equal to 100%, the influence weighting percentage(s) associated with that element will be less than their assigned percentage(s). When the factors contribute less than 100%, factor contribution analysis can be used to assess the level of risk of not achieving the goal of successful software testing.
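As a minimal illustration of this 100% constraint (the function name and the factor percentages are ours, not from the paper):

```python
def factor_shortfall(factor_contributions):
    """Factors of an element must sum to 100%; any shortfall is the
    element's factor-contribution loss that feeds into FCA."""
    return max(0.0, 100.0 - sum(factor_contributions))

# An element whose four factors contribute 30 + 25 + 25 + 10 = 90%
loss = factor_shortfall([30, 25, 25, 10])  # a 10% loss to analyse
```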
4.2 Factor Contribution Analysis (FCA) Technique

FCA looks at the changes to the total factor contribution of one or more elements when analysing the risk of not being able to achieve the goal. Individual percentages for each factor are not material to the analysis, as only the "total of 100%" is used. Of course, once the analysis has been done, the software test manager would be more concerned with which factor or factors contribute to the loss, and which ones require attention.
Some basic rules and issues need to be remembered in factor contribution analysis:

1. The influence weighting attributed to each element will be ≥ 10% and ≤ 90%;
2. Factor contribution analysis is only concerned with an element's total factor contribution and not the individual factors for that element;
3. Risk management is based only on the intent of achieving the goal, and is not applicable to the individual element;
4. If a factor contribution loss is equal to 100%, the goal cannot be reached. Thus the realistic maximum loss for a factor contribution cannot be greater than 90%;
5. No individual factor can be 100%, as that would make all the other factors for that element 0%, and the goal would not be achieved;
6. Back tracking between elements is not permitted, as factor contribution analysis would result in an endless loop (such would be the case for C2 → C3 → C2 → C3 → C2 → etc.); and
7. The decision support framework for software testing, in its current structure, has a maximum of 16 possible scenarios.
Risk management assessment begins with what happens when an element’s factor contribution fails to meet 100%. The following table shows risk management criteria, and how they are interpreted against changes in the influence weightings between primary elements and the goal. The influence weighting change criteria are based on the original influence weightings compared with the new influence weightings, which are determined from the factor contribution change. Table 1 Risk Management Criteria
Risk Category Low Low-Medium Medium Medium-High High
Influence Weighting Change Criteria: Percentage of Original Weightings ≥ 90% ≥ 80% to < 90% ≥ 70% to < 80% ≥ 60% to < 70% < 60%
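Table 1's banding is easy to mechanise; a short sketch (the function name is ours):

```python
def risk_category(new_total, original_total):
    """Map the new total influence weighting on the goal, as a
    percentage of the original total, to a risk band per Table 1."""
    pct = 100.0 * new_total / original_total
    if pct >= 90:
        return "Low"
    if pct >= 80:
        return "Low-Medium"
    if pct >= 70:
        return "Medium"
    if pct >= 60:
        return "Medium-High"
    return "High"

# First worked example of Section 5: 172 of an original 205 → 83.9%
band = risk_category(172, 205)
```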
4.3 Element Total Factor Contribution Effects

When C1 (test management) total factor contribution falls below 100%, the following influence weightings are affected:

C1 → C0
C1 → C2 → C0
C1 → C2 → C3 → C0
When C2 (test information) factor contribution falls below 100%, the following influence weightings are affected:

C2 → C0
C2 → C3 → C0
When C3 (test environment) factor contribution falls below 100%, the following influence weightings are affected:

C3 → C0
C3 → C2 → C0
When C3.1 (technical support) factor contribution falls below 100%, the following influence weightings are affected:

C3.1 → C3 → C0
C3.1 → C3 → C2 → C0
4.4 Combined Elements Factor Contribution Effects

This is the case when two, three or all four elements' factor contributions fall below 100%. For example, what if the factor contributions of C1 (test management) and C3.1 (technical support) fall below 100%? Based on the DSF (Larkman et al. 2010b) shown in Fig. 2, and using what has been discussed thus far, the following influences are analysed:

C1 → C0
C1 → C2 → C0
C1 → C2 → C3 → C0
C3.1 → C3 → C0
C3.1 → C3 → C2 → C0

In other words, the goal (successful software testing) will be affected by the changes to C1 and C3.1. The effect of C1 is from C1 to C0, C2 to C0 via C1, and C3 to C0 via C2. The effect of the change to C3.1 adds to the effect of the change to C1. The effect of C3.1 is from C3 to C0 and C2 to C0 via C3.
5 Illustrated Examples

An example: what happens if the factor contributions of C1 and C3.1 are reduced by 10%? The new influence weightings on the goal are:

C1 → C0 – 72% (80 minus reduction: 10% of 80)
C2 → C0 – 56% (70 minus reduction: 10 + 10 = 20% of 70)
C3 → C0 – 44% (55 minus reduction: 10 + 10 = 20% of 55)
C1 is only affected by changes to its own factor contributions. However, C1 affects both the C2 and C3 influence weightings on the goal. C3.1 affects both the C3 and C2 influence weightings on achieving the goal, but not C1. The total effect on achieving the goal has been reduced. If the sum of the new influence weightings on the goal (172) is divided by the sum of the influence weightings on the goal before the change in factor contribution (205), the result is 83.9%. The loss in factor contribution on C1 and C3.1 shows that the influence weighting loss poses a LOW-MEDIUM risk of not achieving the intended goal (see Table 1). As another example, what if the following occurs:

Table 2 Factor Contribution Analysis #2
Element   Factor Contribution   Goal Weight   New Influence Weighting
          Reduced to            Reduced by    on the Goal
C1        95%                   5%            76.0
C2        75%                   60%           28.0
C3        90%                   60%           22.0
C3.1      80%                   n/a           (via C3 and C2)
TOTAL                                         126.0
The calculations of the new influence weightings on the goal are: C1 → C0 (80 minus reduction: 5% of 80), C2 → C0 (70 minus reduction: 5 + 25 + 10 + 20 = 60% of 70) and C3 → C0 (55 minus reduction: 5 + 25 + 10 + 20 = 60% of 55). The risk of not achieving the goal has increased substantially. If the sum of the new influence weightings (126.0) is divided by the sum of the influence weightings on the goal before the change in factor contribution (205), the result is 61.5%. Using Table 1, the across-the-board loss in factor contribution poses a MEDIUM-HIGH risk of not achieving the intended goal.
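Both worked examples can be reproduced with a short script. The propagation rule below — C1's weighting is reduced only by its own loss, while C2 and C3 are each reduced by the sum of all four elements' losses — is our reading of the influence paths in Sections 4.3 and 4.4, and the weightings 80/70/55 are the illustrative values of Fig. 2.

```python
ORIGINAL = {"C1": 80.0, "C2": 70.0, "C3": 55.0}  # illustrative weightings on goal C0

def fca(losses):
    """losses: % factor-contribution loss for each of C1, C2, C3, C3.1.
    Returns the new influence weightings on the goal and the new total
    as a percentage of the original total (compare against Table 1)."""
    total_loss = min(sum(losses.values()), 100.0)
    new = {
        "C1": ORIGINAL["C1"] * (100.0 - losses["C1"]) / 100.0,
        # C2 and C3 absorb every element's loss via the paths of Fig. 2
        "C2": ORIGINAL["C2"] * (100.0 - total_loss) / 100.0,
        "C3": ORIGINAL["C3"] * (100.0 - total_loss) / 100.0,
    }
    pct = 100.0 * sum(new.values()) / sum(ORIGINAL.values())
    return new, round(pct, 1)

# Example 1: C1 and C3.1 each lose 10%  → 172/205 = 83.9% (LOW-MEDIUM)
new1, pct1 = fca({"C1": 10, "C2": 0, "C3": 0, "C3.1": 10})
# Example 2 (Table 2): losses of 5, 25, 10 and 20% → 126/205 = 61.5% (MEDIUM-HIGH)
new2, pct2 = fca({"C1": 5, "C2": 25, "C3": 10, "C3.1": 20})
```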
6 Conclusion

Factor contribution analysis (FCA) is a tool that software test managers can use in risk management assessment, based on the DSF model for their particular software testing situation. This paper has demonstrated the use of FCA when planning for successful software testing, and how to analyse the risk when the factor contributions of one or more elements fall below 100%. It was shown how to interpret the FCA results and understand their meaning for the risk of not achieving successful software testing. The DSF is an important contribution to the tool set needed by modern software test managers.
References

[1] Alter, S.: Information systems: Foundation of e-business, 4th edn. Prentice Hall, Upper Saddle River (2002)
[2] Ammann, P., Offutt, J.: Introduction to software testing. Cambridge University Press, New York (2008)
[3] Burns, N., Grove, S.K.: The practice of nursing research: Appraisal, synthesis, and generation of evidence, 6th edn. Elsevier Saunders, St. Louis (2009)
[4] Editorial: Software testing life cycle (2010), http://editorial.co.in/software/software-testing-life-cycle.php (accessed January 20, 2011)
[5] Institute of Electrical and Electronics Engineers: IEEE standard classification for software anomalies (1994), doi:10.1109/IEEESTD.1994.121429
[6] Larkman, D., Mohammadian, M., Balachandran, B., Jentzsch, R.: Fuzzy cognitive map for software testing using artificial intelligence techniques. In: Papadopoulos, H., Andreou, A.S., Bramer, M. (eds.) AIAI 2010. IFIP Advances in Information and Communication Technology, vol. 339, pp. 328–335. Springer, Heidelberg (2010a), doi:10.1007/978-3-642-16239-8_43
[7] Larkman, D., Mohammadian, M., Balachandran, B., Jentzsch, R.: General application of a decision support framework for software testing using artificial intelligence techniques. In: Phillips-Wren, G., Jain, L.C., Nakamatsu, K., Howlett, R.J. (eds.) Second KES International Symposium IDT 2010, July 28-30, pp. 53–63. Springer, Heidelberg (2010b), doi:10.1007/978-3-642-14616-9_5
[8] Patton, R.: Software testing, 2nd edn. Sams Publishing, Indiana (2006)
[9] Pinkster, I., van de Burgt, B., Janssen, D., van Veenendaal, E.: Successful test management: An integral approach. Springer, Berlin (2004)
Sustainability of the Built Environment – Development of an Intelligent Decision System to Support Management of Energy-Related Obsolescence

T.E. Butt and K.G. Jones
Abstract. From the built environment perspective, well over half of what has been built and is being built will be around for many decades to come. In the UK, for instance, approximately 70% of the buildings built before 2010 will still exist in the 2050s. The existing built environment (both infrastructure and buildings) suffers obsolescence in many ways and of various types. Obsolescence is and will increasingly be induced in the existing built environment not only by conventional factors (such as ageing and wear and tear) but also by climate change related factors such as global warming / heat waves, wetter and colder winters, hotter and drier summers, and more frequent and more intense flooding and storms. There are complexities and variations of characteristics from one built environment to another in terms of obsolescence. Whatever the type, shape, size, nature and location of a built environment scenario, energy is involved in it one way or another. Existing energy-related systems in built environments are going to become obsolescent due to both the climate change and non-climate change related factors listed above as examples. Furthermore, energy in the built environment exists at three different stages of the 'pipeline': the generation end, distribution, and the consumption end. Accommodating the aforesaid complexities and variations of characteristics from one built environment to another in terms of obsolescence due specifically to energy-related systems, this paper presents an intelligent decision making tool in the form of a conceptual but holistic framework for energy-related obsolescence management. At this stage of the research study, the tool provides a conceptual platform where the various stages and facets of assessment and management of energy-related obsolescence are assembled in the form of a sequential and algorithmic system.

T.E. Butt · K.G. Jones
Sustainable Built Environments Research Group (SBERG), University of Greenwich, Avery Hill Campus, Bexley Road, Eltham, London SE9 2PQ, England, UK
Tel.: +44(0)7817 139170
e-mail: [email protected]

J. Watada et al. (Eds.): Intelligent Decision Technologies, SIST 10, pp. 907–919.
springerlink.com © Springer-Verlag Berlin Heidelberg 2011
Keywords: Intelligent decision technology; intelligent decision making; sustainability; sustainable development; multi-agent system; conceptual framework; obsolescence; built environment; climate change.
1 Background

The term built environment means human-made surroundings that provide a setting for human activity, ranging in scale from personal shelter to neighbourhoods and large-scale civic surroundings. Thus, whatever is human-made or human-influenced constitutes the built environment. The built environment consists of two main parts: buildings and infrastructure. The built environment density in an urban environment is greater than in a rural environment. The biophysical properties of the urban environment are distinctive, with a large building mass (350 kg.m-2 in dense residential areas) and associated heat storage capacity, reduced greenspace cover (with its evaporative cooling and rainwater interception and infiltration functions) and extensive surface sealing (around 70% in high density settlement and city centres), which promotes rapid runoff of precipitation (Handley, 2010). Climate change amplifies this distinctive behaviour by strengthening the urban heat island (Gill et al., 2004). As a general rule, the greater the density of a built environment, the greater the potential for obsolescence, irrespective of other reasons and drivers. For instance, London is one of the most urbanised parts of the UK built environment in terms of a range of elements such as geographical size, value, economy, human population, diversity, ecology and heritage. Furthermore, London is the capital of the UK and located near the North Sea, stretching around an estuary with the River Thames running through it, which adds further significance and sensitivity to the city in a hydrological context, e.g. increased potential of pluvial, fluvial, tidal and coastal floods. In view of these wide-ranging elements together, London's share of the total obsolescence that will take place across the UK built environment over time is most probably larger than that of anywhere else in the UK, and probably one of the largest shares throughout the world (Butt et al., 2010a; 2010b).

Any constituent (such as a building or infrastructure) of the built environment grows obsolete or suffers increasing obsolescence over time. Moreover, what is being built now shall predominantly be around as a substantial part of our built environment for decades to come, and is bound to suffer various degrees of obsolescence in different ways (Butt et al., 2010a; 2010b). In order to render our built environment more sustainable, obsolescence needs to be combated. A host of factors play a role, either alone or collectively, in causing obsolescence. These factors are not only conventional, such as general wear and tear, fatigue, corrosion, oxidation, evaporation, rusting, leaking of gas / water or any other fluid such as coolant, breakage, age, etc. They are also non-conventional, indeed contemporary, such as changes in existing or the advent of new environmental legislation; social forces / pressure groups; the arrival of new technology; enrichment of knowledge (e.g. asbestos is no longer allowed to be used in the built environment); fluctuation in demand; inflation of currency; etc.
In addition to the aforesaid factors that cause obsolescence, a new driver which is increasingly being recognised is climate change (see Section 2 for details). By the 2050s the UK is expected to experience: an increase in average summer mean temperatures (predicted to rise by up to 3.5°C) and in the frequency of heat-waves / very hot days; and increases in winter precipitation (of up to 20%) and possibly more frequent severe storms (Hulme et al., 2002). 70% of UK buildings that will exist in 2050 have already been built. Due to climate change factors (examples of which are indicated above), these existing built assets of the UK are already suffering, and will increasingly suffer, from various types of obsolescence (Butt et al., 2010a; 2010b). Thus, if the sustainable built environment is to accommodate climate change and the investment in these buildings (approximately £129 billion in 2007 in the UK alone (UK Status online, 2007)) is to be protected, action needs to be taken now to assess and reduce likely obsolescence of the existing UK built environment, and to plan adaptation and mitigation interventions that continue to support the quality of life and well-being of UK citizens. Failure to act now will mean that the costs of tackling climate change associated obsolescence in future will be much higher (CBI, 2007). The situation in other countries around the globe is not dissimilar, although there may be some variation in the nature and quantity of climate change, and in the way climate change impacts manifest themselves in relation to the resources and governance of a given country. Thus, managing the sustainability of the existing built environment against obsolescence is of paramount importance to preserve our built assets from the local through sub-regional, regional, provincial, national and continental to the international and global level.
2 Climate Change Induced Obsolescence

Irrespective of whether a given obsolescence is internal or external, financial or functional, if it is due to impacts of climate change it is referred to by the authors as climate change induced obsolescence. Climate change associated obsolescence can be direct or indirect, as described below.
2.1 Directly Induced Climate Change Obsolescence Obsolescence that is caused by direct impact of climate change factors is termed as directly induced climate change obsolescence. For instance: •
•
Current air conditioning systems in our built environment may not be as effective due to global warming / heat-waves which are a resultant of climate change. Thus global warming / heat-waves may bring about obsolescence in a given building’s air conditioning system as a direct impact. These heat-waves can also have direct adverse affects on structure or fabric of buildings. Due to ever higher levels of greenhouse gas emissions in the atmosphere, the interaction of poor air quality with facade of a given building can induce obsolescence in terms of reducing refurbishment cycle of the building facade.
910
T.E. Butt and K.G. Jones
• Similarly, as water levels rise as a result of climate change, estimated flood levels are rising. This implies that the current height of electrical cables, power points and appliances above the ground in a given built environment scenario may no longer be sufficient, rendering them obsolete in the face of the estimated higher flood levels, should such flooding occur.
2.2 Indirectly Induced Climate Change Obsolescence

Obsolescence that results from the impacts of climate change factors in an indirect manner is referred to as indirectly induced climate change obsolescence. For example:

• Whatever the degree, one of the causes of climate change acceleration is anthropogenic activity such as greenhouse gas (GHG) emissions, including carbon dioxide, whose concentration in the global atmosphere is higher than ever before. This has contributed to shaping environmental legislation such as the European Union (EU) Directive on the Energy Performance of Buildings (2002/91/EC) (EC, 2010; EU, 2002); the EU Climate and Energy objectives; and the legally binding carbon reduction targets set by the Climate Change Act 2008, 2010 (DECC, 2010a; 2010b). Such environmental legislation has begun to cause indirectly induced climate change obsolescence in existing buildings, which, as they stand today, are unable to meet these environmental requirements.

• Similarly, the advent of Carbon Capture and Storage (CCS) technology, in line with carbon-cut demands and targets, is on the verge of introducing a substantial amount of obsolescence to existing fossil fuel power plants operating without CCS. This is yet another case of indirectly induced climate change obsolescence.
3 Energy-Associated Obsolescence in the Built Environment

In the built environment, energy is generated, distributed and consumed in different amounts and in various direct and indirect ways. The built environment simply cannot function without energy, whether for lighting or heating a building, powering transport, or running energy generation and distribution infrastructures. On the other hand, among various other factors, energy-related obsolescence also corresponds to a building's ability to benefit from improvements in energy efficiency (technical and operational) and from the provision of low carbon energy solutions. While significant improvements have been made over the years to the energy efficiency of building fabric, demand for power has increased by 24% since 1990 and is predicted to grow by 53% by 2030 (DTI, 2010; BIFM, 2007). If the UK government is to have any chance of achieving its 80% reduction in CO2 emissions by 2050, as legally bound by the Climate Change Act 2008, 2010 (DECC, 2010a; 2010b), then the impact of energy-related obsolescence needs to be factored into energy generation, distribution and utilisation policy. Given that most of the current built environment will still exist many decades from now, and considering the mitigation and adaptation demands that climate change will place on buildings, the challenge will be to find non-fabric ways of addressing energy obsolescence, through new approaches to integrated building services systems (e.g. heating, cooling, lighting, information technology) and business operations (e.g. remote working, hot desking) that do not place unbearable refurbishment costs on a building's owner. Failure to develop an integrated approach to energy-related asset management will result in ad hoc solutions being retrofitted to existing buildings (e.g. room-mounted air conditioning systems) that, whilst addressing the business imperative, do not address the wider climate change agenda. A review of the literature to date (e.g. Allehaux and Tessier, 2002; Jones and Sharp, 2007; Acclimatise, 2009) reveals a lack of knowledge, models and holistic approaches towards integrated and intelligent asset management against energy-related obsolescence (Butt et al., 2010a; 2010b; Kiaie, 2010). This paper presents a conceptual but holistic framework for an intelligent decision system to support the management of energy-related obsolescence. The framework categorises wide-ranging scenarios of energy and built environments into appropriate groups, and presents the stages of energy-related obsolescence management in a sequential, logical and algorithmic order.
4 Development of the Holistic Framework of the Intelligent Decision System

This section presents the development of the holistic framework of the intelligent decision system (shown in Figure 1) for the assessment and management of energy-related obsolescence in built environments. All items in the framework fall under technical, non-technical, physical and / or non-physical aspects. The contributing proportions of these four aspects will vary from one built environment scenario to another, depending upon characteristics such as the nature, size, scope and type of the scenario under consideration; this is further explained with examples in the next paragraph. Similarly, the framework encapsulates the dimensions of sustainability, i.e. social, economic and environmental, which become most prominently evident at the cost-benefit analysis stage of the framework (Section 4.2.1). Examples of technical aspects are heating systems and limitations of technologies (such as conventional, non-energy-saving light bulbs), whereas the behaviour of occupants and maintenance staff of a commercial building, or the energy-use patterns of the dwellers of a house, are examples of non-technical aspects. Examples of physical aspects are the fabric of buildings, building facades and furniture, whereas energy, ventilation and maintenance or refurbishment schedules are examples of non-physical aspects. Among the various items in the holistic decision framework (Figure 1), some would be physical, some
non-physical, some technical, some non-technical, and some could be combinations of any of these four aspects, depending on the characteristics of the given built environment scenario. For instance, natural lighting is non-physical but may have aspects associated with the physical design of the building, or with non-physical phenomena such as the summer or winter sun. Similarly, environmental legislation regarding carbon cuts (e.g. the Climate Change Act 2008; 2010 (DECC, 2010a; 2010b; HM, 2010)) is a non-technical entity but may acquire technical aspects when carbon-cut technologies (e.g. carbon capture and storage) are employed at a fossil-fuel power generation plant. Also, the heating system of a building is a physical commodity, but energy consumption efficiency in terms of users' behaviour is a non-physical aspect, and the employed heating technology is a technical matter. Management systems (such as a maintenance schedule, an environmental management system e.g. ISO 14000, or a quality management system e.g. ISO 9000) are non-physical matters but may have technical as well as non-technical aspects associated with them.
Fig. 1 Framework of a holistic intelligent decision system for energy-related obsolescence management
Energy-associated obsolescence management can be described as a process of analysis, evaluation and control of obsolescence that is induced, or is likely to be induced, in the energy-related systems of a given built environment scenario. Figure 1 depicts the holistic conceptual framework as the basis of the intelligent decision support system for energy-related obsolescence management. The framework covers all three phases of the 'pipeline' of energy in the built environment, i.e. generation, distribution and consumption; that is, it can be applied to
any of the three phases of energy in the built environment. The framework is divided into two main parts: Obsolescence Assessment (OA) and Obsolescence Reduction (OR). The output of OA is the input to OR, and this is how the former provides the foundation for the latter: the more robust the foundation yielded by OA, the more effective the OR is likely to be.
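The OA-to-OR hand-off described above can be pictured as a simple two-stage pipeline. The sketch below is purely illustrative: all function, class and field names are invented for this example and are not part of the framework as published.

```python
from dataclasses import dataclass, field

@dataclass
class Assessment:
    """Hypothetical result of the OA phase: identified components and their
    (as yet unclassified) obsolescence nature."""
    components: list = field(default_factory=list)
    nature: dict = field(default_factory=dict)

def obsolescence_assessment(built_environment: dict) -> Assessment:
    # Baseline study plus identification/categorisation, heavily simplified:
    # just pull out the scenario's components and leave them unclassified.
    components = built_environment.get("components", [])
    nature = {c: "unclassified" for c in components}
    return Assessment(components, nature)

def obsolescence_reduction(assessment: Assessment) -> list:
    # OR builds on OA's output; here it simply carries every assessed
    # component forward for evaluation and control.
    return list(assessment.components)

scenario = {"components": ["heating system", "lighting", "facade"]}
plan = obsolescence_reduction(obsolescence_assessment(scenario))
print(plan)  # ['heating system', 'lighting', 'facade']
```

The point of the sketch is only the data dependency: OR consumes exactly what OA produces, so a weak assessment necessarily weakens the reduction stage.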
4.1 Obsolescence Assessment (OA)

OA consists of two sub-parts: the baseline study, and identification and categorisation. In the baseline study section of the framework, an obsolescence assessor gathers relevant information by carrying out a desk study of documents such as drawings, engineering designs and historic data on various aspects (e.g. flooding). This information can be reinforced by investigative site visits and the gathering of anecdotal evidence. The baseline study can also play a vital screening role before scoping is carried out at a later stage of the framework. Based on the information collected, the assessor can identify and categorise various items under two headings: built environment constituents (BEC) and obsolescence nature (Figure 1).

4.1.1 Built Environment Constituents (BEC)

In the BEC module, the scope or boundaries of a given built environment can be established: for instance, whether it is a building or infrastructure; the type of building (e.g. commercial, domestic, industrial); and the type of infrastructure, e.g. transport (and, if transport, whether a road network or the buses themselves; railway networks or the trains themselves), or an energy generation or distribution network. At this point it can also be established whether it is an existing or a future development. The stage of a development can be further broken down into planning, construction, in-operation and / or decommissioning. Sometimes it may be an extension of an existing building that is being planned, constructed or decommissioned; it is therefore better to identify whether planning, construction, in-operation and / or decommissioning apply in full or in part. After this, the assessor can break the given built environment down into all its constituent components, which can be further categorised into energy-related and non-energy-related groups of commodities.
The energy-related group is divided further into energy-utilising items (such as heating, cooling and lighting) and energy-embodied constituents, which would cover almost all constituents of a given built environment scenario. The only items (if any) which do not fall into this group would belong to the non-energy-related category (Figure 1). These identified and categorised components can then be examined from other perspectives for further investigation, such as operational, non-operational, physical, non-physical, technical, non-technical, socio-technical, managerial, non-managerial, fabric and non-fabric.
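The BEC categorisation above can be sketched as a small classification routine. This is a hypothetical illustration only; the set of energy-utilising items and the example category assignments are invented, not taken from the paper.

```python
# Illustrative (assumed) set of energy-utilising items named in the text.
ENERGY_UTILISING = {"heating", "cooling", "lighting"}

def categorise_components(components, non_energy=()):
    """Split a scenario's components into the three BEC groups described
    in the text. `non_energy` lists any components known to be
    non-energy-related; everything else is assumed to embody energy."""
    groups = {
        "energy-utilising": [],
        "energy-embodied": [],
        "non-energy-related": [],
    }
    for c in components:
        if c in non_energy:
            groups["non-energy-related"].append(c)
        elif c in ENERGY_UTILISING:
            groups["energy-utilising"].append(c)
        else:
            # Almost all remaining constituents (materials, fabric, etc.)
            # embody energy, so they default to the energy-embodied group.
            groups["energy-embodied"].append(c)
    return groups

result = categorise_components(["heating", "facade", "lighting"])
print(result["energy-utilising"])  # ['heating', 'lighting']
print(result["energy-embodied"])   # ['facade']
```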
4.1.2 Obsolescence Nature

The next main stage in the framework is identification and categorisation of the obsolescence nature of the specified components (Figure 1). The nature of the obsolescence can be characterised as follows. Financial obsolescence means loss in value, whereas functional obsolescence is loss of usefulness, effectiveness, efficiency or productivity; financial obsolescence is also termed social or economic obsolescence, and functional obsolescence is also termed technical obsolescence (Cooper, 2004; Montgomery Law, 2010; Leeper Appraisal Services, 2010; Richmond Virginia Real Estate, 2003; Nky Condo Rentals, 2010; SMA Financing, 2009). Irrespective of whether obsolescence is in value or function or both, internal obsolescence in a building component or built asset is due to factors that exist within the component or asset: for instance, general wear and tear, fatigue, corrosion, oxidation, evaporation, rusting, leaking of gas, water or another fluid such as coolant, breakage, age, etc. External obsolescence, by contrast, is a temporary or permanent impairment in the value or usefulness of a built asset due to factors outside the system, such as a change in existing environmental legislation or the advent of new legislation; social forces / pressure groups; the arrival of new technology; improvement or enhancement of knowledge; fluctuation in demand; currency inflation; etc. (Landmark Properties, 2009; Salt Lake County, 2004; ESD Appraisal Services, 2010; Drew Mortgage, 2006). Permanent obsolescence is irreversible: for instance, building materials containing asbestos have become permanently obsolete due to its health impacts. On the contrary, factors which are not permanent, such as temporary civil unrest in a society, loss of power for days, or flooding, can cause a temporary obsolescence.
Furthermore, irrespective of whether obsolescence is internal or external and financial or functional, obsolescence can also be categorised in a climate change context, as explained in Section 2.
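The independent dimensions of obsolescence nature described above (value vs function, internal vs external, permanent vs temporary) can be encoded as orthogonal enumerations. The sketch below is an assumed encoding for illustration; the example classification of an asbestos case is likewise illustrative, not a determination made by the paper.

```python
from enum import Enum

class Value(Enum):
    FINANCIAL = "financial"    # loss in value (also termed social/economic)
    FUNCTIONAL = "functional"  # loss of usefulness (also termed technical)

class Origin(Enum):
    INTERNAL = "internal"      # wear and tear, fatigue, corrosion, age, ...
    EXTERNAL = "external"      # new legislation, new technology, demand shifts, ...

class Duration(Enum):
    PERMANENT = "permanent"    # irreversible, e.g. asbestos-containing materials
    TEMPORARY = "temporary"    # e.g. flooding, loss of power for days

# Illustrative classification: asbestos-containing material treated here as a
# functional, internal, permanent obsolescence.
asbestos_case = (Value.FUNCTIONAL, Origin.INTERNAL, Duration.PERMANENT)
print(asbestos_case[2].value)  # permanent
```

Because the dimensions are independent, any component can carry one label from each enumeration, which is exactly what makes the categorisation stage tractable.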
4.2 Obsolescence Reduction (OR)

As stated earlier, the second main part of sustainable obsolescence management is OR. Although the whole obsolescence management framework may be iterated a number of times depending on the characteristics of a given built environment scenario, this part of the framework certainly needs repeating until the most sustainable yet realistically achievable solution, or set of solutions, has been derived and implemented. Obsolescence Assessment (OA) predominantly concerns the gathering and categorisation of data and information about the given built environment, which does not need much iteration. The main reason for repeating the OR more frequently is that its various modules are mutually dependent, with information having to be passed back and forth between them a number of times. For instance, information flows back and forth between the cost-benefit analysis and stakeholder participation modules because of the various sustainability aspects involved and the variation in interests among different stakeholders (Figure 2). This iteration aspect becomes clearer in
the discussion below, where the various modules of the OR part are described in more detail. OR is divided into two sub-parts, Obsolescence Evaluation (OE) and Obsolescence Control (OC), described in turn below.

4.2.1 Obsolescence Evaluation (OE)

The first unit in the OE section is the 'selection of component(s)' module. Based on the information collated earlier in the OA part of the obsolescence management process, the components of the built environment scenario that are of interest for obsolescence assessment and management can be identified and categorised in this module. To assist the prioritisation and selection of components, this module categorises the identified components into three groups, based on a rationale built around various sustainability aspects:

1. components which have become obsolescent;
2. components which are nearing the end of their life; and
3. components which have sufficient life remaining.
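The three-way grouping above can be sketched as a simple life-stage classifier. The threshold separating "nearing end of life" from "sufficient life" is an invented assumption for illustration; the paper does not prescribe a numeric cut-off.

```python
def group_by_life(components, nearing_threshold_years=5):
    """Group components by remaining life, mirroring the three groups in
    the text. `components` maps component name -> remaining life in years;
    the 5-year threshold is an assumed, illustrative value."""
    groups = {
        "obsolescent": [],
        "nearing end of life": [],
        "sufficient life": [],
    }
    for name, years in components.items():
        if years <= 0:
            groups["obsolescent"].append(name)
        elif years <= nearing_threshold_years:
            groups["nearing end of life"].append(name)
        else:
            groups["sufficient life"].append(name)
    return groups

g = group_by_life({"boiler": 0, "lighting": 3, "facade": 20})
print(g["obsolescent"])          # ['boiler']
print(g["nearing end of life"])  # ['lighting']
```

In practice the first two groups would be passed forward for the impact analysis described next, since they are the ones demanding attention.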
A module is allocated for establishing the positive and negative impacts both of taking action and of not taking action to fix the obsolescence problems, particularly those from the first two groups above. This can help to further prioritise which components need more urgent attention than others. Following this, there is another module in the framework where all possible options to control obsolescence can be identified and their characteristics (both advantages and limitations) established. These options could be technical (e.g. some innovative technology); non-technical (such as a new managerial system to influence energy consumption behaviour); or combinations of technical and non-technical facets in varying proportions. Accordingly, three categories of options are introduced in the decision support framework:

1. technological,
2. behaviour / people, and
3. socio-technical.

The first category of options is further classified into fabric and non-fabric technologies; the technical aspect of the socio-technical category could likewise be fabric- or non-fabric-related, as Figure 1 shows. The information thus established on the various options can later feed into the 'cost-benefit analysis' module, which is divided into three sub-modules to address the three principal dimensions of the sustainable development philosophy: social, environmental and economic. The social and environmental sub-modules each cover sustainability aspects in two categories: financial costs and non-financial costs. For the social sub-module, examples of financial costs are fines that a company may face for not complying with legislative requirements such as health and safety regulations, and compensation that might have to be paid to a relevant stakeholder, e.g. an employee who suffers a health problem or an accident at work, or an external customer.
Whereas adverse impact on company image, quality of service or product of
the company, and poor corporate social responsibility, are examples of non-financial aspects. Similarly, for the environmental sub-module, let us consider a case in which a spillage of a toxic substance takes place due to an improper or obsolete component. This can cause financial costs such as the cost of fixing the environmental damage and compensation to its victims, whereas bad publicity and extraordinarily high pressure from government bodies (e.g. the Environment Agency) as well as voluntary environmental pressure groups are examples of non-financial environmental costs. For the economic sub-module there are three categories:

1. capital cost,
2. running cost, and
3. payback time.
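The payback-time category lends itself to a simple worked example. The figures and the simple-payback formula below are illustrative assumptions for this sketch, not values or a method prescribed by the paper.

```python
def payback_years(capital_cost, annual_saving):
    """Simple payback period: years needed for the annual running-cost
    saving of an option to recover its capital cost."""
    if annual_saving <= 0:
        raise ValueError("option yields no annual saving; payback undefined")
    return capital_cost / annual_saving

# Assumed example: a retrofit costing 10,000 that cuts running costs
# by 2,500 per year pays for itself in 4 years.
print(payback_years(10_000, 2_500))  # 4.0
```

Comparing payback periods across candidate options is one straightforward way the economic sub-module can rank them alongside capital and running costs.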
In the first two categories above, the costs of refitting (i.e. maintenance) and / or retrofitting (i.e. refurbishment) of the selected components are analysed. The payback time also plays a vital role in decision making. The financial costs of the environmental and social aspects can also be fed into the economic sub-module to draw a bigger picture of total costs; thus, the economic sub-module is shown connected to the financial costs of the social and environmental sub-modules (Figure 1). The cost-benefit analysis can be reinforced by consulting a diverse spectrum of (internal and external) stakeholders, ranging from technical to non-technical. Not every stakeholder needs to be consulted for every obsolescence scenario, but only those appropriate and relevant to the characteristics of the scenario; thus, in the 'stakeholder participation' module of the framework, appropriate stakeholders should be identified prior to consultation. Information from the 'other evaluations' module, regarding e.g. feasibility reports or the company's policy and mission statement, can also be fed into the cost-benefit analysis module to render it more holistic. Eventually, in the decision making module, a decision is made on the selection of an option, or set of options, to reduce the impacts of the obsolescence in the built environment scenario.

4.2.2 Obsolescence Control (OC)

In the OC section of the obsolescence management framework, the selected obsolescence control option(s) is/are designed and planned. If there are any unexpected implications, these can be discussed again with the appropriate stakeholders and re-checked via the cost-benefit analysis module. If problems remain, another option or set of options can be selected and then designed and planned again. Such iterations can continue until a satisfactory option or set of options has been designed and planned, following which the option(s) can be implemented.
During implementation, monitoring needs to take place to detect any discrepancies. The frequency of monitoring can be incorporated into the system at the earlier design and planning stages. If any discrepancies are observed, corrective actions need to be taken to control the implementation process; such corrective actions against discrepancies can also be defined during the design and planning stage.
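The OC iteration just described (design and plan an option, re-check it with stakeholders and cost-benefit analysis, and repeat until a satisfactory plan emerges) can be sketched as a simple loop. All names and the acceptance check below are hypothetical placeholders, not part of the published framework.

```python
def obsolescence_control(options, is_satisfactory, max_iterations=10):
    """Iterate over candidate control options until one passes the
    (placeholder) satisfaction check, mirroring the OC loop in the text.
    Returns the accepted plan, or None if no option is acceptable."""
    for option in options[:max_iterations]:
        plan = {"option": option, "designed": True}  # design and plan
        if is_satisfactory(plan):  # stand-in for stakeholder / cost-benefit re-check
            return plan            # implement (monitoring would follow)
    return None

plan = obsolescence_control(
    ["room AC retrofit", "integrated services upgrade"],
    # Assumed acceptance rule for the example: prefer integrated solutions.
    is_satisfactory=lambda p: "integrated" in p["option"],
)
print(plan["option"])  # integrated services upgrade
```

Monitoring and corrective action would sit after the return point in a fuller model, feeding discrepancies back into the same loop.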
Sustainability of the Built Environment
917
5 Concluding Remarks

This paper establishes the link between energy-associated obsolescence and the various factors that cause it, ranging from conventional drivers to contemporary challenges such as climate change. An intelligent support system is developed in the form of a holistic, conceptual framework for the sustainable assessment and management of energy-related obsolescence, which has not previously existed in the reported literature. The framework assembles and categorises all appropriate modules and sub-modules from start to end under one umbrella, and places them in a logical, sequential and algorithmic order to support the obsolescence assessment and management process. The framework encapsulates wide-ranging built environment scenarios, whether fully or partly at the planning, construction, in-operation and / or decommissioning stage. The presence of energy in the built environment is considered not only at the consumption end of the 'pipeline' but also at the generation and distribution ends. Physical, non-physical, technical and non-technical aspects are all included, rendering the framework useful for a diverse range of stakeholders, from experts to non-technical users. This research work is a step towards making obsolescence management possible in a holistic and sustainable manner. It can also streamline current practices of obsolescence management, which are available only in a non-integrated and piecemeal fashion and to a limited extent. The framework may attract debate and interest from both practitioners and researchers for further study and research, and could later be converted into a computer-aided system. However, in its current shape it can still be used effectively as a decision making tool to select an option or set of options to control obsolescence.
The framework can thereby assist in rendering our existing built environment (which will be around for many decades to come) more sustainable against obsolescence drivers ranging from conventional factors to ones as modern as climate change.
References

1. Acclimatise, Building Business Resilience to Inevitable Climate Change, Carbon Disclosure Project Report 2008, Global Oil and Gas, Acclimatise and Climate Risk Management Limited, Oxford (2009)
2. Allehaux, D., Tessier, P.: Evaluation of the functional obsolescence of building services in European office buildings. Energy and Buildings 34, 127–133 (2002)
3. BIFM (British Institute of Facilities Management), Position Paper: Energy, Executive Summary, BIFM (January 10, 2010)
4. Butt, T.E., Giddings, B., Cooper, J.C., Umeadi, B.B.N., Jones, K.G.: Advent of Climate Change and Energy Related Obsolescence in the Built Environment. In: International Conference on Sustainability in Energy and Buildings, Brighton, UK, May 6-7 (2010a)
5. Butt, T.E., Umeadi, B.B.N., Jones, K.G.: Sustainable Development and Climate Change Induced Obsolescence in the Built Environment. In: International Sustainable Development Research Conference, Hong Kong, China, May 30-June 1 (2010b)
6. CBI (Confederation of British Industry), Climate Change: Everyone's business, CBI (2007)
7. Cooper, T.: Inadequate Life? Evidence of Consumer Attitudes to Product Obsolescence. Journal of Consumer Policy 27, 421–449 (2004)
8. DECC (Department of Energy and Climate Change), Climate Change Act 2008, http://www.decc.gov.uk/en/content/cms/legislation/cc_act_08/cc_act_08.aspx (viewed April 2010b)
9. DECC (Department of Energy and Climate Change), Legislation, http://www.decc.gov.uk/en/content/cms/legislation/legislation.aspx (viewed April 2010a)
10. Drew Mortgage Inc., Online Mortgage Dictionary (2006), https://www.drewmortgage.com/abargoot/dictionary/dictionary_e.html (viewed March 2010)
11. DTI (Department of Trade and Industry), Energy – Its impacts on environment and Society – Chapter 3, http://www.dti.gov.uk/files/file20300.pdf (viewed May 2010)
12. EC [European Commission – Energy (ManagEnergy)], COM 2002/91/EC: Directive on the Energy Performance of Buildings, http://www.managenergy.net/products/R210.htm, ManagEnergy (last modified March 9, 2010)
13. ESD Appraisal Services, External Obsolescence, http://www.edsappraisalservices.com/Glossary_and_Terms (viewed March 2010)
14. EU (European Union), Directive 2002/91/EC of the European Parliament and of the Council of 16 December 2002 on the energy performance of buildings. OJ L1/65 (4-1-2003) (2002)
15. Gill, S., Pauleit, S., Ennos, A.R., Lindley, S.J., Handley, J.F., Gwilliam, J., Ueberjahn-Tritta, A.: Literature review: Impacts of climate change on urban environment. CURE, University of Manchester (2004) (available online)
16. Handley, J.F.: Adaptation strategies for climate change in the urban environment (ASCCUE), Narrative report for GR/S19233/01, http://www.sed.manchester.ac.uk/research/cure/downloads/asccue-epsrc-report.pdf (viewed March 2010)
17. HM Government, Climate Change: Taking Action (Delivering the Low Carbon Transition Plan and preparing for a changing climate), Crown Copyright (2010)
18. Hulme, et al.: Climate change scenarios for the United Kingdom: The UKCIP 2002 Scientific Report, Tyndall Centre for Climate Change Research, School of Environmental Sciences, University of East Anglia, p. 120 (2002)
19. Jones, K.G., Sharp, M.: A new performance-based process model for built asset maintenance. Facilities 25(13/14), 525–535 (2007)
20. Kiaie, M., Umeadi, B.B.N., Butt, T.E., Jones, K.G.: Challenges to Sustainable Development: how facility managers can apply intelligent monitoring to maintenance. In: Conference on Sustainable Development and Scientific Research, Integration and Knowledge-Oriented Pars, Pars Special Economic Energy Zone, Assaluyeh, Iran, January 6-8 (2010)
21. Landmark Properties – Commercial Real Estate, Real Estate Dictionary, http://www.allaboutskyscrapers.com/dictionary/e.htm (viewed March 2010)
22. Leeper Appraisal Services, California Appraisal / Appraisers, http://www.leeperappraisal.com/appraiser_jargon.htm (viewed March 2010)
23. Montgomery Law, Family Law Glossary, http://www.montylaw.com/family-law-glossaryo.php (viewed March 2010)
24. Nky Condo Rentals – Rodney Gillum (2010), http://www.nkycondorentals.com/index.cfm/fuseaction/terms.list/letter/O/contentid/511AF257-5436-4595-81BBAA7704C1AC40 (viewed March 2010)
25. Richmond Virginia Real Estate, Real Estate Dictionary (2003), http://www.therichmondsite.com/Blogs/Dictionary_O.html (viewed March 2010)
26. Salt Lake County, Tax Administration (2004), http://www.taxadmin.slco.org/boeGlossary/boeGlossaryE.html (viewed March 2010)
27. SMA Financing, Real Estate Glossary (2009), http://www.smafinancing.com/glossary.htm (viewed March 2010)
28. UK Status online, Gross Fixed Capital Formation at Chained Volume Measure (2007)
Author Index
Abderrahim, Siam 409 Ahmed, Sabbir 789 Arai, Yuta 557 Arbaiy, Nureize 103 Asakura, Koichi 367 Bae, Hyerim 469, 519 Bae, Joonsoo 629 Balachandran, Bala M. 429, 529 Baohui, Ji 683 Barbu, Marian 155 Bhandari, Gokul 743 ´ Bog´ardi-M´esz¨oly, Agnes 65 Bormane, D.S. 809 Botvich, Dmitri 821, 873 Botzheim, J´anos 165, 273 Brodsky, Alexander 223 Bunciu, Elena M. 95 Butt, T.E. 907 Campos, Ana 853 Caraman, Sergiu 155 Chang, Betty 399, 647 Chang, Jieh-Ren 399, 647 Chang-jun, Han 671 Chang, Shih-Yu 567 Cheng, Yu-Kuang 389 Chen, You-Shyang 343, 389, 449, 479 Chiang, Bo-Yu 567 Chiba, Takuya 367 Chiros¸c˘a, Alina 155 Christodoulou, Spyros 113 Chuang, Huan-Ming 501, 605
Dempe, Stephan 255, 265 Dominish, Derek 863 Dumitras¸cu, George 155 Ekel, Petr Ya 459 Elmisery, Ahmed M.
821, 873
F¨oldesi, P´eter 65, 165, 273 Fukagawa, Daiji 719 Gawinowski, Grzegorz 617, 843 Gobbin, R. 429, 529 Grivokostopoulou, Foteini 135 Hai-cheng, Li 699 Hanaue, Koichi 547 Hashizume, Ayako 329 Hatano, Kenji 707, 719 Hatzilygeroudis, Ioannis 135 Henryk, Piech 617, 843 Hsieh, Ming-Yuan 343, 439, 449, 597 Hu, Xiangpei 13, 37 Huang, Chi-Yo 123, 355, 567 Huang, Minfang 13 Huang, Xu 489 Iino, Takashi 511, 537 Ikeda, Kento 719 Imazu, Yoshihiro 799 Imoto, Seiya 799 Irimia, Danut C. 283 Itoh, Akihiro 213 Itoi, Ryota 589, 637 Iyetomi, Hiroshi 511, 537, 557
922
Author Index
Jain, Dreama 753 Jentzsch, Ric 897 Jheng, Yow-Hao 399 Jiang, Yiping 145 Jiang, Zhongqiang 37 Jian-hui, Liu 671 Jones, K.G. 907
McDonald, Tom 743 Ming, Huang 683 Miwa, Kanna 213 Miyano, Satoru 799 Mohammadian, Masoud
Kalashnikov, Vyacheslav V. 255, 265 Kalashnykova, Nataliya I. 255, 265 Kanamaru, Masanori 547 Kang, Young Ki 629 Karacapilidis, Nikos 113 Katayama, Kotoe 799 Kent, Robert D. 731, 743, 789 Kido, Takemasa 637 Kinoshita, Eizo 213, 247, 319 Ko, Yu-Chien 23 Kobayashi, Takashi 719 Kobti, Ziad 743, 753, 789 Kojiri, Tomoko 577 Kolekar, Sucheta V. 809 Kovas, Konstantinos 135 Kung, Chaang-Yung 389, 479, 597 Kung, Chang-Yung 439
Ohya, Takao 247 Okunishi, Kouichi 557 Ont, O. 763 Ozaki, Toshimasa 213
Lai, Chien-Jung 343, 389, 439, 479 Lai, Kin Keung 75 Larkman, Deane 897 Lee, Huey-Ming 185 Leitch, Kellie 763 Li, Jianming 37 Lim, Sungmook 377, 519 Lin, Chia-Li 295 Lin, Chien-Ku 501, 605 Lin, Chyuan-Yuh 501, 605 Lin, Lily 185 Lin, Yang-Cheng 85, 833 Lin, Yi-Fan 355 Lin, Yu-Hua 439 Lo, Chi-Hsiang 399 Lo, Mei-Chen 47, 175, 213 Luo, Juan 223 Lv, Renping 37 Ma, Chao 3 Ma, Min-Yuan 85 Ma, Peng 307 Matsuura, Keiko 799 McCarrell, Jason 743
Neves-Silva, Rui
529, 897
853
Park, Jaehun 519 Park, Jennifer J. 763 Parreiras, Roberta O. 459 Pei, Hung-Mei 647 Peng, Li 691 P´erez-Vald´es, Gerardo A. 255, 265 Perikos, Isidoros 135 Petre, Emil 191, 201 Poboroniuc, Marian S. 283 Poign´e, Axel 113 Popescu, Dan 191 Popescu, Dorin 283 Popovici, Ioana Florina 237 Preney, Paul D. 743 Pugovkin, Aleksey V. 47 Qian, Wang
661
Ramdane, Maamri 409 Roman, Monica 191 R¨ovid, Andr´as 65 Ruan, Junhu 3 R¨uping, Stefan 113 Sajjad, Farhan 743 Sanjeevi, Sriram G. 809 Sato, Yuji 57 Selis¸teanu, Dan 201, 283 S¸endrescu, Dorin 191, 201 Serrano, Mart´ın 873 Shah, Pritam Gajkumar 489 Shang, Hongyan 3 Sharma, Dharmendra 429, 489 Shell, Jeremy 763 Shih, Chi-Huang 885 Shiizuka, Hisao 329
Author Index Shimizu, Nobuo 769 Sioutis, Christos 863 Snowdon, Anne W. 731, 743, 753, 763 Solehati, Nita 629 Sugiura, Shin 319 S¨ule, Edit 273 Tamura, Koya 707 Tanaka-Yamawaki, Mieko 589, 637 Terada, Yoshikazu 779 Tokunaga, Hideaki 799 Tsai, Chien-Tzu 47 Tseng, Chun-Chieh 567 Tzagarakis, Manolis 113 Tzeng, Gwo-Hshiung 23, 47, 123, 175, 213, 355, 567 Tzeng, Wei-Chang 123 Wang, Haiyan 307 Wang, Xuping 3 Watada, Junzo 103 Watanabe, Kenji 799 Watanabe, Toyohide 367, 547, 577
923 Watanabe, Yuki 577 Wei, Chun-Chun 85, 833 Wu, Wen-Ming 389, 449, 479, 597 Wu, Ya-Ling 343, 439, 449, 597 Xu, Liang 683 Xu, Yeong-Yuh 885 Yadohisa, Hiroshi 707, 779 Yahya, Bernardo N. 469 Yamada, Takayoshi 419 Yamaguchi, Rui 799 Yamamoto, Hidehiko 419 Yan, Zhou 691 Yang, Min-Hsien 47 Yang, Xin 589 Yoshikawa, Takeo 511 Yuan, Ming-Cheng 123 Zahid, Atif Hasan 731 Zhang, Lihua 13 Zhao, Lindu 145 Zhi, Qi 699 Zhou, Shifei 75
Index
acidogenesis-methanization 95
ad-hoc network 367
adapted queueing algorithm 65
adaptive control 201
adaptive learning styles 809
additive demand 307
agent-based modeling 237
agents 409
AHP 175, 213, 319
anaerobic process 95
analytical network process 389, 479
analytic hierarchy process 175
analytic network process 47
ANP 47, 213, 319, 390, 479
assemblage 396
assembly line 419
attribute coding 399
background music 439
Bayesian network 691
bee algorithm 683
benchmarking 519
bilevel programming model 255, 265
bioengineering 283
biometric identification 459
budget allocation 57
building 853
building facilities 647
built environment 907
business management 47
business process 469
CAI 597
calculating similarity 720
CCM 319
cell production 419
climate change 908
cloud based banking services 123
cloud computing 123
clustering 821
collaborative filtering model 661
combinational disruption 3
communication 529
components 409
compute unified device architecture 37
conceptual framework 908
Condorcet method 449
consignment contract 307
consistency estimation 617
continuous auditing 731
correlation 589, 637
CUDA 37
data-intensive collaboration 113
data envelopment analysis 519
DCR 175
decentralized supply chain 307
decision analysis 213
decision-based neural network 885
decision-making 237
decision-making trial and evaluation laboratory 47, 295
decision method 13
decision support 705
decision support systems 743
DEMATEL 47, 295
DEMATEL based network process 355
design change requirement 175
difficulty estimation 135
dimension reduction 255, 265
disaster area 367
disruption management 3, 13
distance learning 459
distributed computing middleware 863
distribution-valued data 779, 799
distribution scheduling 145
DNP 355
dominance-based rough set approach 23
dominant AHP 247, 319
domination 617
DRSA 23
dynamic fuzzy control 489
e-commerce 429
e-learning 809
eclipse 429, 529
ECOAccountancy 389
eigenvalue distribution 589
eigenvalues 637
elaboration likelihood model 605
EMD 75
emergency response 145
emotion 753
emotional behavior 237
energy efficiency 853
English writing 597
evidence fusion theory 699
exponential function 145
factor contribution analysis 843, 897
failure diagnosis 185, 691
feature selection 449
finite size effect 557
first order logic 135
fractals 237
fragrance form design 85
functional data analysis 769
fuzzy control 155
fuzzy inference 185
fuzzy preference 459, 897
fuzzy random variable 103
GA 419, 568
genetic algorithm 568, 683
global 459, 466
global investment 343
gold market forecasting 75
GPU 37
Index Harker’s method 247 Harker method 213 health care 731 health informatics 743 health monitoring 873 health information technology 763 heuristic 469 hierarchical clustering 769 HIT 763 hospital medication administration 753 human and machine reasoning 117 hybrid uncertainty 103 hydro-electrical simulation system 691 hyperlink analysis 707 ICT enabled personal health 873 imperfect matrix 213 improved bee algorithm 683 improvement strategy 295 information sharing 367 information system continuance 605 information technology 123 intelligent angent 831 intelligent decision guidance system 223 intelligent decision making 871 intelligent decision technology 908 inter-organizational business process 629 interaction 529 intermodal freight transport 13 interpretation 165 interval-pitch conversion 547 interval observer 95 investment in safety measures 57 IPTV networks 821 IS continuance 501 IS success model 501, 605 iterative majorization 779 JADE
429, 529
K-L transformation 459
K-means clustering 519
Kano’s quality model 165
kansei/affective engineering 329
kansei communication 329
kansei engineering 85
kansei information 103
kansei value creation 329
lactic acid production 201
language model 707
LCG 589
lead user method 355
least squared support vector machine 568
linear programming 37
localization algorithm 671
logarithmic least square method 247
log mining analysis 809
loss aversion 165, 273
LS-SVM 568
macroeconomic model 343
market structure 511
MAS 409
mathematical programming 23
maximizing marginal loss saving 145
MCDM 47, 123, 175, 356
mediated activity 529
membership grade 185
methodology 459
minimum bounding rectangle 367
model predictive control 191
multi-period inventory model 377
MP 23
MT 589
multiple-choice cloze questions 577
multi-agent 821
multi-agent system 908
multi-attribute decision making 103
multi-issue negotiation 429
multiple-instance learning 885
multiple criteria decision making 47, 123, 175, 356
multiscale community analysis 537
music composition 547
music tempo 439
national competitiveness 23
natural language formalization 135
nested partition method 3
neural networks 191, 201, 283
neuroprosthesis control 283
nonlinear systems 191
nonverbal communication 329
notation-support method 547
obsolescence 871
online shopping 292
optimal channel profit 302
optimization 388
organization 396
organizational performance measurement 459
pairwise comparison matrix 247
parallel algorithm 37
Pareto-efficient 429
particle swarm optimization 399
percent complete 57
performance arts 647
performance evaluation 175
personalised health systems 873
personal medicine 873
pervasive computing on ehealth 873
PGR 449
piecewise surface regression 223
playing method 439
predictive data mining 789
preference 617
primary health care 743
principal component 637
privacy 821
process chain 65
product design 833
production-pricing decision 307
production network 537
production system 419
profit growth rate 449
proximity 469
PSO 399
psychological attachment 501
query support 743
random correlation matrix 557
randomness 589
ranking lists 617
real-time assurance 731
recommender system 821
reference process model 469
rescue strategies 3
retrofit scenario 853
RMT-PCA 637
RMT-test 589
robotics 283
rough set theory 399, 449, 647
RSSI 671
RST 399, 449
SCM 479
scoring functions 429
security system 489
SEM 356
semantics 743
semiconductor 175
senior people 329
service oriented architecture 731
service performance 295
short term wind prediction 568
SIA-NRM 295
smart space 671
software agents 429, 529
software test 843, 897
stock market 637
structural analysis 720
structural equation modeling 356
subjectivity 529
supply chain 273
supply chain management 479
support vector machines 568
sustainability 908
sustainable development 908
SVM 568
symbolic data analysis 769, 779
synergy 113
tablet PC 356
tablet personal computer 356
TAM 356
technology acceptance model 356
technology assessment 47
time-space network 145
time utility 273
TOEFL 577
TOEIC 577
transaction management 629
tree structured data 720
trend 637
VIKOR 47
virtual factory 419
virtual university 459
visual analogue scale 779
VlseKriterijumska optimizacija i kompromisno resenje 47
VRPTW 3
wafer fab 175
wastewater treatment bioprocesses 191
wastewater treatment process 155
water turbine model 699
web 459, 597, 707, 809
web-based instruction 597
web search 707
web usage mining 809
weighted link 511
wind power 568
wind speed forecasting 568
wireless networks 489
XML 720